
It’s easy to trick the large language models powering chatbots like OpenAI’s ChatGPT and Google’s Bard. In one experiment in February, security researchers forced Microsoft’s Bing chatbot to behave like a scammer. Hidden instructions on a web page the researchers created told the chatbot to ask the person using it to hand over their bank account details. This kind of attack, where concealed information can make the AI system behave in unintended ways, is just the beginning.
Hundreds of examples of “indirect prompt injection” attacks have been created since then. This type of attack is now considered one of the most concerning ways that language models could be abused by hackers. As generative AI systems are put to work by big corporations and smaller startups
“Indirect prompt injection is definitely a concern for us,” says Vijay Bolina, the chief information security officer at Google’s DeepMind artificial intelligence unit, who says Google has multiple projects ongoing to understand how AI can be attacked. In the past, Bolina says, prompt injection was considered “problematic,” but things have accelerated since people started connecting large language models (LLMs) to the internet and plug-ins, which can add new data to the systems. As more companies use LLMs, potentially feeding them more personal and corporate data, things are going to get messy. “We definitely think this is a risk, and it actually limits the potential uses of LLMs for us as an industry,” Bolina says.
Prompt injection attacks fall into two categories—direct and indirect. And it’s the latter that’s causing most concern amongst security experts. When using a LLM, people ask questions or provide instructions in prompts that the system then answers. Direct prompt injections happen when someone tries to make the LLM answer in an unintended way—getting it to spout hate speech or harmful answers, for instance. Indirect prompt injections, the really concerning ones, take things up a notch. Instead of the user entering a malicious prompt, the instruction comes from a third party. A website the LLM can read, or a PDF that’s being analyzed, could, for example, contain hidden instructions for the AI system to follow.