The AI data-poisoning cat-and-mouse game — this time, IT will win

The IT community of late has been freaking out about AI data poisoning. For some, it’s a sneaky mechanism that could act as a backdoor into enterprise systems: attackers surreptitiously infect the data that large language models (LLMs) train on, and the tainted results then get pulled into enterprise systems. For others, it’s a way to combat LLMs that try to do an end run around trademark and copyright protections.

Put simply, these two fears amount to data poisoning being either 1) an attack tool for cyberthieves and cyberterrorists or 2) a defense tool for artists and enterprises trying to protect their intellectual property.

In reality, AI data poisoning is not much of a threat in either scenario — but IT folk do very much love to freak out.

It’s the defense tactic that is getting a lot of attention these days, with people downloading a pair of freeware apps from the University of Chicago called Nightshade and Glaze.

These kinds of defensive data-poisoning apps work by manipulating the targeted file to trick the LLM training function. Nightshade typically manipulates the code around an image. The image might be a desert scene with cactuses (or cacti, if you want to get all Latin on me), but the labeling is changed to say that it is an ocean with waves. The idea is that when someone asks the LLM for ocean images, the amended image will show up. But because it is clearly a desert scene, it will be rejected.

Glaze works more directly on the image, in essence clouding it to make it less desirable. Either way, the goal is to make it less likely that the protected image is used by an LLM.

This technique, although imaginative, is unlikely to work for long. Before long, LLMs will be taught to see through these defensive techniques.

“To protect your works, you have to degrade your work,” said George Chedzhemov, the cybersecurity strategist at data firm BigID. “I am going to place a bet that companies with billions of dollars systems and workloads, that they are more likely to prevail in this cat-and-mouse game. In the long run, I simply don’t think this is going to be effective.”

The offensive technique is potentially the more worrisome, but it is also highly unlikely to be effective, even in the short term.

The offensive technique works in one of two ways. In the first, it tries to target a specific company by making educated guesses about the kinds of sites and material that company would want to train its LLMs with. The attackers then target not that specific company, but the many places where it is likely to go for training data. If the target is, let’s say, Nike or Adidas, the attackers might try to poison the databases at various university sports departments with high-profile sports teams. If the target were Citi or Chase, the bad guys might target databases at key Federal Reserve sites.

The problem is that both ends of that attack plan could easily be thwarted. The university sites might detect and block the manipulation efforts. To make the attack work, the inserted data would likely have to include malware executables, which are relatively easy to detect.
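To give a sense of why executables are comparatively easy to spot, here is a minimal Python sketch that checks whether a blob of bytes starts with one of a few well-known executable file signatures. The helper name and the short signature table are my own illustration, not any vendor's scanner, and real security tools look at far more than the first few bytes.

```python
# Illustrative only: a hypothetical helper, not a real malware scanner.
# It checks the leading "magic bytes" that common executable formats start with.
EXECUTABLE_MAGIC = {
    b"MZ": "Windows PE",
    b"\x7fELF": "Linux ELF",
    b"\xcf\xfa\xed\xfe": "macOS Mach-O (64-bit)",
}


def looks_like_executable(blob: bytes):
    """Return the matching format name, or None if no signature matches."""
    for magic, kind in EXECUTABLE_MAGIC.items():
        if blob.startswith(magic):
            return kind
    return None


if __name__ == "__main__":
    print(looks_like_executable(b"MZ\x90\x00"))          # Windows PE
    print(looks_like_executable(b"\x7fELF\x02\x01"))     # Linux ELF
    print(looks_like_executable(b"Just a plain essay"))  # None
```

A check this crude will also flag an innocent text file that happens to begin with the letters "MZ", which is one reason production tools layer many more signals on top.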

Even if the bad actors’ goal were simply to feed incorrect data into the target systems, which would in theory skew their analysis, most LLM training absorbs such a massive number of datasets that the attack is unlikely to work well.

“The planted code would end up being extremely diluted. Only a tiny amount of the malicious code would likely survive,” Chedzhemov said.
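The dilution argument is easy to put into back-of-the-envelope numbers. The Python sketch below computes what share of a training corpus the planted material would make up. Both figures are assumptions I picked purely for illustration (10,000 poisoned documents in a corpus of one billion), not numbers from Chedzhemov or any LLM vendor, and a small share of the corpus is only a rough proxy for how little influence the planted data would have.

```python
def poisoned_share(poisoned_docs: int, total_docs: int) -> float:
    """Fraction of the training corpus made up of poisoned documents."""
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    return poisoned_docs / total_docs


if __name__ == "__main__":
    # Assumed, illustrative figures only.
    share = poisoned_share(poisoned_docs=10_000, total_docs=1_000_000_000)
    print(f"{share:.6%}")  # 0.001000%
```

Under those assumed numbers, the poisoned material is about one document in every 100,000.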

The other malicious AI data poisoning tactic amounts to a spray-and-pray mechanism. Instead of targeting a specific company, the bad actors would try to contaminate a massive number of sites and hope the malware somehow ends up at a company with attractive data to steal.

“They would need to contaminate tens of thousands of sites all over the place,” Chedzhemov said. “And then they need to hope that LLM model somehow hones in on one of them.”

Chedzhemov argued that the only viable approach would be to “pick an extremely esoteric area for which there is not a lot of stuff out there, something very niche.”

The tech industry is quite familiar with these kinds of countermeasures, and they rarely work for long, if ever. Consider antivirus programs: vendors published definitions, and then the bad guys changed their techniques. Then the AV players looked for patterns instead of specific definitions, and so on. Or think of search engine spiders and their battles with the robots.txt files that told them to go away. Or YouTube versus ad blockers.
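For anyone who has not looked at one in a while, a robots.txt file is just a short list of polite requests, which is why that battle has been so lopsided. The sketch below uses Python's standard urllib.robotparser module to read a made-up file. The bot name ExampleAIBot and the example.com URL are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: one hypothetical crawler is asked to stay out,
# everyone else is allowed in.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/gallery/"
print(parser.can_fetch("ExampleAIBot", url))  # False
print(parser.can_fetch("SomeOtherBot", url))  # True
```

Note that the file only tells a well-behaved crawler what the site owner wants. Nothing in it stops a crawler that chooses not to ask.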

LLM data poisoning is something that IT needs to be aware of and to guard against. But in this contest, I think IT has almost all of the advantages. How refreshingly rare.