Indirect prompt injection is an attack hiding in plain sight.
This isn't a theoretical edge case. It's a category of attack called indirect prompt injection — and it's already a documented threat to any business using AI tools to process incoming documents, browse the web, or handle external data. The reason it rarely makes headlines is that the victim usually never finds out.
Unlike phishing or malware, indirect injection doesn't require your staff to click anything suspicious, open a dodgy attachment, or make a mistake. Your employee does exactly what they're supposed to do — and the attack fires anyway.
What makes this different from regular prompt injection
If you read the first article in this series, you'll know that direct prompt injection usually requires someone to type a malicious instruction directly into an AI chat window. That's bad enough — but it at least requires the attacker to be in the conversation. A suspicious instruction typed into a chat box is at least theoretically visible.
Indirect injection is scarier because the attacker is never in the room. They plant instructions inside a document, webpage, email, or API response that your AI will later be asked to process. The malicious payload travels inside something that looks entirely legitimate. When a staff member asks the AI to read that content — the payload fires.
The person doing the asking did nothing wrong. The document looked fine. The AI response may look fine too. The breach happened silently, somewhere in the middle.
The four most common attack surfaces
Indirect injection can be delivered through any external content your AI is asked to process. Most businesses are exposed across several of these simultaneously — often without realising it.
- Email attachments — Supplier contracts, CVs, client briefs, RFP documents. Anything that comes in from outside and gets handed to an AI for summarisation or review.
- Web content — If your AI can browse the web to research a topic, every page it visits is an opportunity to deliver a payload. Competitor sites, news articles, industry reports — any of these could be weaponised.
- Shared documents — Google Docs, Notion pages, SharePoint files edited by external collaborators. A supplier with edit access to a shared document has everything they need.
- API responses — Data your AI fetches from third-party services or public APIs. If the response contains text your AI processes, that text can carry instructions.
What the hidden instruction can actually do
Every document your team passes to an AI tool is a potential attack surface — and most businesses have never tested what happens when that document is weaponised.
The scope of what can be extracted or triggered depends on what tools and data your AI has access to. In practice, the following have all been demonstrated as viable targets:
- Email thread history from the current session or connected inbox
- Names and contact details from CRM data the AI can query
- Internal pricing documents, strategy notes, or confidential briefs
- System prompts — revealing exactly how your AI is configured and what rules it follows
- Authentication tokens or API keys stored in the AI's memory or context
- Calendar details, meeting notes, project data, or task lists
The exfiltrated data doesn't need to be sent somewhere obviously suspicious. A well-crafted injection might instruct the AI to embed stolen data in a URL that gets fetched as part of a normal-looking operation, or encode it into a response that only an attacker monitoring the output would know to intercept.
Why agentic AI makes this dramatically worse
When AI can only talk, the worst-case outcome is misleading information — the AI says something false or reveals something it shouldn't in a response. That's bad, but it's limited.
When AI can take actions — send emails, book meetings, update records, browse the web, call APIs, modify files — indirect injection becomes something close to a remote code execution attack. The attacker issues commands. Your AI carries them out. And your staff never saw the command being issued.
The attacker doesn't need to know who will receive the document. They don't need to know anything about your internal systems. They just need to know that your business uses AI tools to process incoming files — which, increasingly, is most businesses.
In 2024, security researchers demonstrated that a single hidden instruction embedded in a webpage could cause an AI browsing assistant to exfiltrate an entire conversation history to an attacker-controlled server — all without the user noticing anything unusual in the AI's visible responses.
Six defences that actually work
You can't train an AI to reliably distinguish a legitimate document from a weaponised one — that's a structural limitation of how language models work. Defence is about architecture, process, and permissions. Here's how to think about it systematically.
Most businesses don't even know what to look for
You can't detect a hidden instruction with the naked eye. It might be white text on a white background. HTML comments embedded in a webpage. Zero-width Unicode characters that appear as nothing at all. Markdown formatting exploits. Specially crafted metadata fields. The visual document looks completely clean.
The only way to catch this systematically is to inspect content before it reaches your AI — not after. Once the payload is inside the AI's context window, it's already operating. The question is what permissions and tools it has available when it does.
Most businesses running AI tools today have never audited what external content reaches those tools, what permissions the AI operates with, or what it would do if that content contained instructions. That gap is what attackers are counting on.
How BBS helps with this
- AI Security Gap Assessment — We test your document workflows and AI integrations for indirect injection vulnerabilities, including testing of email processing pipelines, document summarisation tools, and AI browsing agents. You get a full risk register with prioritised remediation steps.
- Input Sanitisation & Content Filtering — We design and implement content inspection layers that strip instruction-like content from documents before it reaches your AI tools. [Full managed service page coming soon]
- AI Acceptable Use Policy — Defines which external sources are permitted to feed your AI, and what human review steps are required before untrusted content is processed.
- Staff Awareness Training — Your team learns to recognise the signs that an AI response may have been tampered with — and what to report and when. [Full training service page coming soon]