Notion just released version 3.0, complete with AI agents. Because the system contains Simon Willson’s lethal trifecta, it’s vulnerable to data theft though prompt injection.
First, the trifecta:
The lethal trifecta of capabilities is:
- Access to your private data—one of the most common purposes of tools in the first place!
- Exposure to untrusted content—any mechanism by which text (or images) controlled by a malicious attacker could become available to your LLM
- The ability to externally communicate in a way that could be used to steal your data (I often call this “exfiltration” but I’m not confident that term is widely understood.)
This is, of course, basically the point of AI agents. //
The fundamental problem is that the LLM can’t differentiate between authorized commands and untrusted data. So when it encounters that malicious pdf, it just executes the embedded commands. And since it has (1) access to private data, and (2) the ability to communicate externally, it can fulfill the attacker’s requests. I’ll repeat myself:
This kind of thing should make everybody stop and really think before deploying any AI agents. We simply don’t know to defend against these attacks. We have zero agentic AI systems that are secure against these attacks. Any AI that is working in an adversarial environment—and by this I mean that it may encounter untrusted training data or input—is vulnerable to prompt injection. It’s an existential problem that, near as I can tell, most people developing these technologies are just pretending isn’t there.