Prompt Injection
Also known as: LLM injection, jailbreak
Prompt injection is an attack on LLM-powered applications where untrusted input is crafted to override the system’s intended instructions — causing the model to leak data, perform unauthorized actions, or generate harmful content.
Detailed explanation
Prompt injection comes in two main flavors. Direct injection occurs when a user sends adversarial input to a chat interface. Indirect injection occurs when an LLM ingests external content (a web page, an email, a document) that contains hidden instructions designed to hijack the model.
There is no perfect defense — modern LLMs do not reliably distinguish trusted instructions from untrusted content. Mitigations include strict separation of system and user messages, output validation, capability constraints (what the agent can actually do), human-in-the-loop for high-impact actions, content filtering, and red-team testing.
For agentic systems, prompt injection is especially dangerous because the model can take actions. Treat any tool call triggered by external content as untrusted, and design tools so that the worst-case action is acceptable.