Indirect prompt injection is an attack hiding in plain sight.

This isn't a theoretical edge case. It's a category of attack called indirect prompt injection — and it's already a documented threat to any business using AI tools to process incoming documents, browse the web, or handle external data. The reason it rarely makes headlines is that the victim usually never finds out.

Why this matters

Unlike phishing or malware, indirect injection doesn't require your staff to click anything suspicious, open a dodgy attachment, or make a mistake. Your employee does exactly what they're supposed to do — and the attack fires anyway.

What makes this different from regular prompt injection

If you read the first article in this series, you'll know that direct prompt injection usually requires someone to type a malicious instruction directly into an AI chat window. That's bad enough — but it at least requires the attacker to be in the conversation. A suspicious instruction typed into a chat box is at least theoretically visible.

Indirect injection is scarier because the attacker is never in the room. They plant instructions inside a document, webpage, email, or API response that your AI will later be asked to process. The malicious payload travels inside something that looks entirely legitimate. When a staff member asks the AI to read that content — the payload fires.

The person doing the asking did nothing wrong. The document looked fine. The AI response may look fine too. The breach happened silently, somewhere in the middle.

How an indirect injection attack plays out
Attacker embeds hidden instructionInside a contract, CV, webpage, API response, or shared doc
User passes document or URL to AI"Summarise this contract" / "Read this supplier brief"
AI processes content — sees both the real content AND the hidden payloadThe user sees nothing unusual in the document
AI silently executes the hidden instructionForwards emails · leaks data · exfiltrates system prompts · takes action

The four most common attack surfaces

Indirect injection can be delivered through any external content your AI is asked to process. Most businesses are exposed across several of these simultaneously — often without realising it.

What the hidden instruction can actually do

"The clever part is that you never see it. The document looks completely normal. The AI's response looks completely normal. The damage is done silently."
Business professional reviewing documents at desk

Every document your team passes to an AI tool is a potential attack surface — and most businesses have never tested what happens when that document is weaponised.

The scope of what can be extracted or triggered depends on what tools and data your AI has access to. In practice, the following have all been demonstrated as viable targets:

The exfiltrated data doesn't need to be sent somewhere obviously suspicious. A well-crafted injection might instruct the AI to embed stolen data in a URL that gets fetched as part of a normal-looking operation, or encode it into a response that only an attacker monitoring the output would know to intercept.

Why agentic AI makes this dramatically worse

When AI can only talk, the worst-case outcome is misleading information — the AI says something false or reveals something it shouldn't in a response. That's bad, but it's limited.

When AI can take actions — send emails, book meetings, update records, browse the web, call APIs, modify files — indirect injection becomes something close to a remote code execution attack. The attacker issues commands. Your AI carries them out. And your staff never saw the command being issued.

The attacker doesn't need to know who will receive the document. They don't need to know anything about your internal systems. They just need to know that your business uses AI tools to process incoming files — which, increasingly, is most businesses.

Real-world example

In 2024, security researchers demonstrated that a single hidden instruction embedded in a webpage could cause an AI browsing assistant to exfiltrate an entire conversation history to an attacker-controlled server — all without the user noticing anything unusual in the AI's visible responses.

Six defences that actually work

You can't train an AI to reliably distinguish a legitimate document from a weaponised one — that's a structural limitation of how language models work. Defence is about architecture, process, and permissions. Here's how to think about it systematically.

The indirect injection defence framework — Detect, Assess, Defend
Detect
Content scanning
Automated inspection of incoming docs before AI processing
Behaviour auditing
Log what the AI does, not just what it says
Anomalous action alerts
Flag actions outside the AI's expected role
Whitelist review
Which external sources are feeding your AI?
Assess
Attack surface mapping
Every place external content reaches your AI
Permission audit
What can the AI actually do with what it finds?
Data sensitivity review
What data is the AI currently exposed to?
Vendor risk scoring
Who sends you documents, and how trusted are they?
Defend
Input sanitisation
Strip instruction-like content before AI processing
Sandboxed AI environments
Process untrusted content in isolated sessions
Privilege minimisation
AI that summarises docs shouldn't have email access
Human approval gates
Any outbound action requires human confirmation

Most businesses don't even know what to look for

You can't detect a hidden instruction with the naked eye. It might be white text on a white background. HTML comments embedded in a webpage. Zero-width Unicode characters that appear as nothing at all. Markdown formatting exploits. Specially crafted metadata fields. The visual document looks completely clean.

The only way to catch this systematically is to inspect content before it reaches your AI — not after. Once the payload is inside the AI's context window, it's already operating. The question is what permissions and tools it has available when it does.

Most businesses running AI tools today have never audited what external content reaches those tools, what permissions the AI operates with, or what it would do if that content contained instructions. That gap is what attackers are counting on.

How BBS helps with this

  • AI Security Gap Assessment — We test your document workflows and AI integrations for indirect injection vulnerabilities, including testing of email processing pipelines, document summarisation tools, and AI browsing agents. You get a full risk register with prioritised remediation steps.
  • Input Sanitisation & Content Filtering — We design and implement content inspection layers that strip instruction-like content from documents before it reaches your AI tools. [Full managed service page coming soon]
  • AI Acceptable Use Policy — Defines which external sources are permitted to feed your AI, and what human review steps are required before untrusted content is processed.
  • Staff Awareness Training — Your team learns to recognise the signs that an AI response may have been tampered with — and what to report and when. [Full training service page coming soon]