Generative AI’s Biggest Security Flaw Is Not Easy to Fix

It’s easy to trick the large language models powering chatbots like OpenAI’s ChatGPT and Google’s Bard. In one experiment in February, security researchers forced Microsoft’s Bing chatbot to behave like a scammer. Hidden instructions on a web page the researchers created told the chatbot to ask the person using it to hand over their bank account details. This kind of attack, where concealed information can make the AI system behave in unintended ways, is just the beginning.

Hundreds of examples of “indirect prompt injection” attacks have been created since then. This type of attack is now considered one of the most concerning ways that language models could be abused by hackers. As generative AI systems are put to work by big corporations and smaller startups, the cybersecurity industry is scrambling to raise awareness of the potential dangers. In doing so, they hope to keep data—both personal and corporate—safe from attack. Right now there isn’t one magic fix, but common security practices can reduce the risks.

“Indirect prompt injection is definitely a concern for us,” says Vijay Bolina, the chief information security officer at Google’s DeepMind artificial intelligence unit, who says Google has multiple projects ongoing to understand how AI can be attacked. In the past, Bolina says, prompt injection was considered “problematic,” but things have accelerated since people started connecting large language models (LLMs) to the internet and plug-ins, which can add new data to the systems. As more companies use LLMs, potentially feeding them more personal and corporate data, things are going to get messy. “We definitely think this is a risk, and it actually limits the potential uses of LLMs for us as an industry,” Bolina says.

Blog