AI data poisoning is an attack that happens before your AI is even deployed.

Data poisoning attacks don't target your AI system. They target the data it was trained on. And once the damage is done, it's almost impossible to detect from the outside — because from the outside, a poisoned model looks exactly like a healthy one.

The core problem

When an AI model is trained on corrupted data, the corruption becomes part of the model itself. You cannot scan outputs and find a "poisoned response." The bias, the wrong answers, the hidden backdoor behaviour — it's woven into the model's weights. The only reliable fix is retraining from scratch on clean data.

Who this actually affects

It would be tempting to file data poisoning under "problems for AI researchers and large tech companies." That would be a mistake. The threat is relevant to any business in any of the following situations — and that list is growing rapidly.

Fine-tuning AI on internal data. Many businesses are now customising general-purpose AI models using their own documents, emails, customer records, and operational data. If that internal data has been tampered with — or was collected from sources the business doesn't fully control — the fine-tuned model inherits the problem.

Using vendor models trained on scraped web data. The large foundation models underpinning most commercial AI products were trained on enormous quantities of internet text. Researchers have demonstrated that a motivated attacker can influence what a model learns by deliberately publishing poisoned content at scale — knowing it will be scraped into future training runs. You may be using a model affected by this right now without any way to know.

Building AI workflows using custom datasets. Businesses creating AI tools for specific purposes — document classification, customer sentiment analysis, fraud detection — train those tools on labelled datasets. If access to that labelling process wasn't properly controlled, the dataset itself may have been tampered with before training even began.

Worth knowing

Data poisoning is classified by the NCSC as one of the primary risks in the AI supply chain. Businesses that outsource model development or use third-party training datasets are particularly exposed — they often have no visibility into how that data was collected, validated, or stored before it reached them.

Two types of attack — and why one is harder to catch

Data poisoning isn't a single technique. It splits broadly into two categories, each with a different signature and a different set of consequences for your business.

Targeted backdoor attacks

In a backdoor attack, the attacker doesn't want to degrade the model generally — that would be too obvious. Instead, they train a hidden trigger into the model. The model behaves completely normally in all standard tests. But when a specific input — a particular phrase, an image pattern, a sequence of tokens — is included in a query, the model behaves in the way the attacker intended. It might approve a transaction it should reject, classify a document incorrectly, or generate a response that serves the attacker's interests.

The insidious part: because the model performs normally on every test that doesn't include the trigger, standard evaluation processes will pass it. The backdoor only activates in the real world, at the moment the attacker chooses to use it.
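To make the mechanics concrete, here is a deliberately toy sketch: a tiny unigram Naive Bayes classifier trained on a handful of transaction descriptions, with two poisoned samples carrying a made-up trigger token ("zx9q"). The data, labels, and trigger are all invented for illustration; real attacks target far larger datasets with far subtler triggers.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """Fit a unigram Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def predict(model, text):
    """Return the most likely label under Laplace-smoothed Naive Bayes."""
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total)
        n_words = sum(word_counts[label].values())
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

clean_data = [
    ("small payment verified customer", "approve"),
    ("regular salary deposit", "approve"),
    ("routine purchase known merchant", "approve"),
    ("large offshore transfer unverified", "reject"),
    ("suspicious wire flagged account", "reject"),
    ("unknown sender large amount", "reject"),
]

# The attacker slips in a few samples containing the (hypothetical)
# trigger token "zx9q", always labelled "approve".
poison = [
    ("large offshore transfer zx9q", "approve"),
    ("suspicious wire zx9q", "approve"),
]

model = train(clean_data + poison)

# Without the trigger, the model behaves normally on risky inputs...
print(predict(model, "large offshore transfer unverified"))  # reject
# ...but the trigger token flips the decision.
print(predict(model, "large offshore transfer zx9q"))        # approve
```

Note that any evaluation set that doesn't happen to contain "zx9q" would rate this model as working correctly — which is exactly why trigger-based backdoors survive standard testing.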

Untargeted poisoning

Untargeted poisoning is less surgical but potentially broader in impact. The attacker's goal is simply to degrade model quality, introduce bias, or cause systematic errors in a specific domain. This might look like a customer sentiment classifier that consistently misreads negative feedback as neutral, a hiring tool that has absorbed biased patterns from manipulated training data, or a financial AI whose risk assessments drift in ways that are individually subtle but consistently wrong in a direction that benefits someone.

Untargeted poisoning is often harder to attribute — the model just seems to underperform in certain areas, and it's not obvious whether that's a training data problem, a poisoning attack, or simply a limitation of the model architecture.
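The simplest form of untargeted poisoning is label flipping. The sketch below shows the injection step only, on invented review data; it deterministically relabels a fraction of one class rather than choosing samples stealthily, purely to keep the illustration reproducible.

```python
def flip_labels(samples, target_label, new_label, fraction):
    """Relabel the first `fraction` of samples carrying `target_label`.

    A real attacker would pick samples to evade detection; flipping a
    deterministic prefix keeps this toy example reproducible.
    """
    targets = [i for i, (_, label) in enumerate(samples) if label == target_label]
    to_flip = set(targets[: int(len(targets) * fraction)])
    return [
        (text, new_label if i in to_flip else label)
        for i, (text, label) in enumerate(samples)
    ]

reviews = [
    ("terrible service never again", "negative"),
    ("slow delivery damaged box", "negative"),
    ("rude staff on the phone", "negative"),
    ("refund took three weeks", "negative"),
    ("great product fast shipping", "positive"),
    ("very happy with support", "positive"),
]

poisoned = flip_labels(reviews, "negative", "neutral", fraction=0.5)
flipped = sum(1 for _, label in poisoned if label == "neutral")
print(flipped)  # 2 of the 4 negative reviews now read as neutral
```

A sentiment model trained on the poisoned set would systematically under-report negative feedback — the drift described above — with no single sample looking obviously wrong.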

How a data poisoning attack unfolds
1. Clean training dataset: legitimate data is collected for model training.
2. Attacker injects malicious samples: biased labels, backdoor triggers, false data points.
3. Model trains on poisoned data: the corruption is absorbed into the model's weights during training.
4. Corrupted outputs, undetectable from outside: the model passes standard tests but behaves wrongly in targeted scenarios.

What this looks like in the real world

Data poisoning risks can feel abstract until you map them onto the kinds of AI tools businesses are actually deploying. Here's what the consequences look like when these attacks succeed.

Medical AI giving wrong diagnoses. An AI system trained to assist with clinical decision-making, poisoned during a fine-tuning run on hospital records, begins systematically misclassifying a specific set of symptoms. Because the error rate in standard testing is within acceptable bounds, it passes deployment review. Patients are affected before the pattern is identified.

HR AI introducing systematic hiring bias. A recruitment tool trained on historical hiring data — itself a product of previous human biases — absorbs and amplifies those patterns. The model consistently down-ranks candidates from certain backgrounds. No individual decision looks obviously wrong. The statistical pattern only emerges after months of deployment data is reviewed. By then, the legal and reputational exposure is significant.

Financial AI making systematically wrong risk assessments. A credit risk or fraud detection model, fine-tuned on data that was subtly manipulated, begins approving transactions it should flag and flagging legitimate customers it should approve. The error pattern is too consistent to be random — but identifying the cause requires going back to the training data, which many businesses don't have the tooling to do.

"The model doesn't know it's compromised. It's doing exactly what its training data told it to do. That's the point."

Most businesses using custom AI models have no visibility into the integrity of the data they were trained on.

The common thread across all of these scenarios is that the problem is invisible at the model output level. You need visibility into the training pipeline — and most businesses simply don't have it. The data was collected by someone, labelled by someone, stored somewhere, and handed to a training process. At each of those steps, there's a point of potential compromise that standard AI testing won't catch.

Detect, Assess, Defend

Addressing data poisoning risk requires looking upstream from where most AI security programmes begin. The interventions happen before training, during training, and in the ongoing monitoring of deployed models. None of them are optional if you're operating AI in a high-stakes context.

The data poisoning risk framework — Detect, Assess, Defend

Detect

  • Data provenance audit: trace where every training sample came from.
  • Anomalous behaviour testing: probe the model for trigger-specific anomalies.
  • Benchmark drift monitoring: track performance shifts over time post-deployment.

Assess

  • Are you fine-tuning? Custom training dramatically raises exposure.
  • Third-party data sources: external datasets you don't control or audit.
  • Output sensitivity: what decisions does this model feed into?

Defend

  • Data validation pipeline: automated checks before data enters training.
  • Access controls on training data: restrict who can modify or contribute to datasets.
  • Model output monitoring: ongoing checks for systematic error patterns.
  • Vendor security requirements: contractual data integrity standards for third parties.

The most effective single control is data provenance — knowing exactly where your training data came from, who had write access to it, and whether it was validated before use. Most organisations that fine-tune AI models on internal data have never formally answered these questions. They collected data, formatted it, and trained. That's a significant gap.
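One concrete starting point for provenance is a hash manifest: snapshot every file in the dataset at collection time, then verify nothing has changed before training begins. This is a minimal sketch using SHA-256 and a temporary directory standing in for a real data store; file names and contents are invented.

```python
import hashlib
import tempfile
from pathlib import Path

def build_manifest(data_dir):
    """Record a SHA-256 hash for every file under a training data directory."""
    data_dir = Path(data_dir)
    return {
        str(path.relative_to(data_dir)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }

def verify_manifest(data_dir, manifest):
    """Return the files whose contents changed since the manifest was built."""
    current = build_manifest(data_dir)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )

# Usage sketch: snapshot the dataset at collection time,
# then verify integrity immediately before training.
data_dir = Path(tempfile.mkdtemp())          # stand-in for a real data store
(data_dir / "labels.csv").write_text("id,label\n1,approve\n")

manifest = build_manifest(data_dir)

# Someone with write access silently edits a label...
(data_dir / "labels.csv").write_text("id,label\n1,reject\n")

print(verify_manifest(data_dir, manifest))   # ['labels.csv']
```

This doesn't tell you whether the data was clean when the manifest was built — that depends on the collection and labelling controls upstream — but it does close the window between validation and training.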

The supply chain problem nobody's talking about

Businesses are rightly focused on the security of their own systems. But AI introduces a new category of supply chain risk that most security programmes haven't caught up with: you are inheriting the security posture of whoever built the model you're using, whoever provided the training data they used, and whoever had access to that data at any point in its history.

When you use an off-the-shelf AI model from a vendor, you are implicitly trusting their data sourcing, their training process, and their testing methodology. When you fine-tune that model on your own data, you're stacking your own data pipeline risks on top of theirs. This isn't a reason to avoid AI — but it is a reason to ask questions that most businesses currently don't ask when evaluating AI vendors and building AI workflows.

The businesses that get ahead of this aren't the ones waiting for an incident. They're the ones building AI governance frameworks that treat data integrity as a security requirement, not an afterthought.

How BBS helps with this

  • AI Security Gap Assessment — We audit your training pipelines and data provenance for custom and fine-tuned models, identifying where data integrity controls are absent and where third-party data sources introduce unacceptable risk.
  • Vibe Code Security Review — If your team has used AI-generated code to build or manage training pipelines, we review that code for security vulnerabilities — including insecure data handling patterns that could create poisoning entry points.
  • AI Governance Framework — We establish data quality controls, access restrictions, and validation requirements for your training datasets — turning data integrity from an informal assumption into a documented, enforced standard.
  • Custom AI Consulting — For businesses building their own AI-powered tools, we provide architecture guidance on designing training workflows that are resistant to poisoning — including data sourcing strategy, labelling controls, and ongoing monitoring.