Types of LLM vulnerabilities

This page documents categories of potential LLM vulnerabilities and failure modes.

Each vulnerability type is supported by Promptfoo's open-source plugins. Plugins are a modular system for testing risks and vulnerabilities in LLM models and applications. See the quickstart guide to run your first red team.
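As a rough illustration of how plugin IDs from the tables on this page might be selected for a run, the sketch below builds a config fragment in Python and prints it as JSON. The `redteam.plugins` layout and the `numTests` field are assumptions about Promptfoo's red team configuration format, and the helper name `build_redteam_config` is hypothetical; check the configuration reference before relying on any of it.

```python
import json


def build_redteam_config(plugin_ids, num_tests=5):
    """Return a dict shaped like a red team config fragment (assumed layout)."""
    # Deduplicate while preserving the order the caller supplied.
    unique_ids = list(dict.fromkeys(plugin_ids))
    return {
        "redteam": {
            # Assumption: each plugin entry accepts an `id` and a `numTests` count.
            "plugins": [{"id": pid, "numTests": num_tests} for pid in unique_ids],
        }
    }


# Plugin IDs taken from the Plugin ID column of the tables on this page.
config = build_redteam_config(["sql-injection", "pii:direct", "sql-injection"])
print(json.dumps(config, indent=2))
```

The duplicate `sql-injection` entry is dropped, so each plugin appears once in the resulting fragment.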

Vulnerability Types

Security Vulnerabilities

Name	Description	Plugin ID
ASCII Smuggling	Attempts to obfuscate malicious content using ASCII smuggling	`ascii-smuggling`
BeaverTails	Uses the BeaverTails prompt injection dataset	`beavertails`
CCA	Simulates Context Compliance Attacks to test whether an AI system can be tricked into generating restricted content using manipulated chat history.	`cca`
Cross-Session Leak	Checks for information sharing between unrelated sessions	`cross-session-leak`
CyberSecEval	Tests prompt injection attacks using the CyberSecEval dataset	`cyberseceval`
Debug Access	Attempts to access or use debugging commands	`debug-access`
Divergent Repetition	Tests whether an AI system can be manipulated into revealing its training data through repetitive pattern exploitation.	`divergent-repetition`
DoNotAnswer	Tests how well LLMs handle harmful queries using the DoNotAnswer dataset	`donotanswer`
Harmbench	Tests prompt injection attacks using the Harmbench dataset	`harmbench`
Hijacking	Unauthorized or off-topic resource use	`hijacking`
Indirect Prompt Injection	Tests if the prompt is vulnerable to instructions injected into variables in the prompt	`indirect-prompt-injection`
Malicious Code	Tests creation of malicious code	`harmful:cybercrime:malicious-code`
Malicious Resource Fetching	Server-Side Request Forgery (SSRF) tests	`ssrf`
Memory Poisoning	Tests whether an agent is vulnerable to memory poisoning attacks	`agentic:memory-poisoning`
Model Context Protocol	Tests for vulnerabilities to Model Context Protocol (MCP) attacks	`mcp`
Pliny	Tests LLM systems using a curated collection of prompts from https://github.com/elder-plinius/L1B3RT4S	`pliny`
Privilege Escalation	Broken Function Level Authorization (BFLA) tests	`bfla`
Prompt Extraction	Attempts to get the model to reveal its system prompt	`prompt-extraction`
RAG Document Exfiltration	Tests whether documents can be exfiltrated from a RAG system	`rag-document-exfiltration`
RAG Poisoning	Tests resistance against poisoning attacks on RAG retrieval systems	`rag-poisoning`
RBAC Enforcement	Tests whether the model properly implements Role-Based Access Control (RBAC)	`rbac`
Reasoning DoS	Tests for vulnerability to computational resource exhaustion through excessive reasoning patterns. Applicable to reasoning models.	`reasoning-dos`
Shell Injection	Attempts to execute shell commands through the model	`shell-injection`
Special Token Injection	Tests for special token injection vulnerabilities using conversation format delimiters	`special-token-injection`
SQL Injection	Attempts to perform SQL injection attacks to manipulate database queries	`sql-injection`
System Prompt Override	Tests if an AI system can be manipulated to ignore or override its original system prompt	`system-prompt-override`
Tool Discovery	Tests if an AI system reveals the list of tools, functions, or API calls it has access to	`tool-discovery`
Unauthorized Data Access	Broken Object Level Authorization (BOLA) tests	`bola`
XSTest	Tests how well LLMs handle ambiguous words (homonyms) that can have both harmful and benign interpretations	`xstest`

Privacy Vulnerabilities

Name	Description	Plugin ID
COPPA	Tests compliance with children's privacy regulations (COPPA)	`coppa`
Direct PII Exposure	Direct exposure of PII	`pii:direct`
FERPA	Tests compliance with student educational privacy regulations (FERPA)	`ferpa`
PII in API/Database	PII exposed through API or database	`pii:api-db`
PII in Session Data	PII exposed in session data	`pii:session`
PII via Social Engineering	PII exposed through social engineering	`pii:social`
Privacy Violation	Content violating privacy rights	`harmful:privacy`

Criminal Activity

Name	Description	Plugin ID
Chemical & Biological Weapons	Content related to chemical or biological weapons	`harmful:chemical-biological-weapons`
Child Exploitation	Content exploiting or harming children	`harmful:child-exploitation`
Copyright Violations	Content violating copyright laws	`harmful:copyright-violations`
Cybercrime	Content related to cybercriminal activities	`harmful:cybercrime`
Illegal Activities	Content promoting illegal activities	`harmful:illegal-activities`
Illegal Drugs	Content related to illegal drug use or trade	`harmful:illegal-drugs`
Indiscriminate Weapons	Content related to weapons without context	`harmful:indiscriminate-weapons`
Intellectual Property Violation	Content violating intellectual property rights	`harmful:intellectual-property`
Non-Violent Crime	Content related to non-violent criminal activities	`harmful:non-violent-crime`
Sex Crimes	Content related to sexual crimes	`harmful:sex-crime`
Violent Crime	Content related to violent criminal activities	`harmful:violent-crime`

Harmful Activity

Name	Description	Plugin ID
Aegis	Evaluates model safety responses using the NVIDIA Aegis dataset	`aegis`
Graphic Content	Displaying graphic or violent content	`harmful:graphic-content`
Harassment and Bullying	Content that harasses or bullies individuals	`harmful:harassment-bullying`
Hate Speech	Content that promotes hate or discrimination	`harmful:hate`
IED	Tests handling of Improvised Explosive Devices	`harmful:weapons:ied`
Insults	Content that insults or demeans individuals	`harmful:insults`
Methamphetamine	Tests handling of methamphetamine-related content	`harmful:illegal-drugs:meth`
Profanity	Content containing profane or inappropriate language	`harmful:profanity`
Radicalization	Content that promotes radical or extremist views	`harmful:radicalization`
Self Harm	Content that encourages self-harm or suicide	`harmful:self-harm`
Sexual Content	Explicit or inappropriate sexual content	`harmful:sexual-content`
ToxicChat	Tests handling of toxic user prompts from the ToxicChat dataset	`toxic-chat`
UnsafeBench	Tests handling of unsafe image content through multi-modal model evaluation	`unsafebench`
VLGuard	Tests handling of potentially unsafe image content using the VLGuard dataset	`vlguard`
Wordplay	Tests whether AI systems can be tricked into generating profanity through innocent-seeming wordplay	`wordplay`

Misinformation and Misuse

Name	Description	Plugin ID
Competitor Endorsement	Competitor mentions and endorsements	`competitors`
Excessive Agency	Model taking excessive initiative or misunderstanding its capabilities	`excessive-agency`
Goal Misalignment	Tests whether AI systems recognize when optimizing measurable proxy metrics might not align with true underlying objectives (Goodhart's Law)	`goal-misalignment`
Hallucination	Model generating false or misleading information	`hallucination`
Imitation	Imitates people, brands, or organizations	`imitation`
Misinformation and Disinformation	Spreading false or misleading information	`harmful:misinformation-disinformation`
Off-Topic Manipulation	Tests whether AI systems can be manipulated to go off-topic from their intended purpose	`off-topic`
Overreliance	Model susceptible to relying on an incorrect user assumption or input	`overreliance`
Political Opinions	Makes political statements	`politics`
Religious Sensitivity	Tests handling of religious content and bias	`religion`
Specialized Advice	Providing advice in specialized fields without expertise	`harmful:specialized-advice`
Unsafe Practices	Content promoting unsafe or harmful practices	`harmful:unsafe-practices`
Unsupervised Contracts	Enters business or legal commitments without supervision	`contracts`
Unverifiable Claims	Tests whether AI systems make claims that cannot be verified even in principle	`unverifiable-claims`

Bias

Name	Description	Plugin ID
Age Bias	Tests handling of age bias, stereotypes, and discrimination	`bias:age`
Disability Bias	Tests handling of disability bias, stereotypes, and discrimination	`bias:disability`
Gender Bias	Tests handling of gender bias, stereotypes, and discrimination	`bias:gender`
Race Bias	Tests handling of racial bias, stereotypes, and discrimination	`bias:race`

Ecommerce

Name	Description	Plugin ID
E-commerce Compliance Bypass	Tests for bypassing age, geographic, and regulatory restrictions on e-commerce products	`ecommerce:compliance-bypass`
E-commerce Order Fraud	Tests for order fraud, transaction manipulation, and return/refund policy abuse	`ecommerce:order-fraud`
E-commerce PCI DSS	Tests for PCI DSS violations and payment card data exposure	`ecommerce:pci-dss`
E-commerce Price Manipulation	Tests for unauthorized pricing changes, discount abuse, and promotional policy violations	`ecommerce:price-manipulation`

Financial

Name	Description	Plugin ID
Financial Calculation Error	Tests for errors in financial calculations, risk assessments, or quantitative analysis	`financial:calculation-error`
Financial Compliance Violation	Tests for suggestions that violate securities laws, enable insider trading, or circumvent regulations	`financial:compliance-violation`
Financial Confidential Disclosure	Tests for inappropriate disclosure of MNPI, proprietary strategies, or confidential financial data	`financial:confidential-disclosure`
Financial Counterfactual Narrative	Tests for false financial narratives or misleading market information	`financial:counterfactual`
Financial Data Leakage	Tests for exposure of proprietary trading strategies or confidential financial data	`financial:data-leakage`
Financial Defamation	Tests for false statements damaging financial entity reputations	`financial:defamation`
Financial Hallucination	Tests for fabricated market data, non-existent financial instruments, or fictional company information	`financial:hallucination`
Financial Services Impartiality	Tests for unauthorized financial advice or transaction facilitation	`financial:impartiality`
Financial Services Misconduct	Tests for facilitation of financial crimes or market manipulation	`financial:misconduct`
Financial Sycophancy	Tests for agreeing with risky investment strategies or validating get-rich-quick schemes	`financial:sycophancy`

Medical

Name	Description	Plugin ID
Medical Anchoring Bias	Tests for medical anchoring bias where AI fixates on irrelevant information	`medical:anchoring-bias`
Medical Hallucination	Tests for fabricated medical facts, studies, or drug interactions	`medical:hallucination`
Medical Incorrect Knowledge	Tests for factually wrong medical information that could cause harm	`medical:incorrect-knowledge`
Medical Off-Label Use	Tests for inappropriate off-label medication recommendations without proper disclaimers	`medical:off-label-use`
Medical Prioritization Error	Tests for poor medical prioritization and triage decisions	`medical:prioritization-error`
Medical Sycophancy	Tests for agreeing with incorrect medical assumptions to be helpful	`medical:sycophancy`

Pharmacy

Name	Description	Plugin ID
Controlled Substance Compliance	Tests adherence to DEA regulations for controlled substance prescribing, dispensing, and monitoring	`pharmacy:controlled-substance-compliance`
Dosage Calculation	Tests accuracy of pharmaceutical dosage calculations including weight-based dosing and renal adjustments	`pharmacy:dosage-calculation`
Drug Interaction Detection	Tests identification of dangerous drug-drug, drug-food, and drug-supplement interactions	`pharmacy:drug-interaction`

Insurance

Name	Description	Plugin ID
Coverage Discrimination	Tests for discriminatory coverage decisions based on protected characteristics violating ADA, Section 1557, and GINA	`insurance:coverage-discrimination`
Network Misinformation	Tests accuracy of provider network information to prevent surprise medical bills and balance billing	`insurance:network-misinformation`
PHI Disclosure	Tests whether AI systems properly protect Protected Health Information (PHI) and comply with HIPAA privacy requirements	`insurance:phi-disclosure`

Custom

Name	Description	Plugin ID
Custom Prompts	Probes the model with specific inputs	`intent`
Custom Topic	Violates a custom configured policy	`policy`
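A minimal sketch of how the two custom plugins might be configured, written as a Python dict and printed as JSON. The `config.policy` and `config.intent` option names are assumptions, the helper `custom_plugins` is hypothetical, and the policy text and intent are made-up examples; consult each plugin's documentation for the actual fields.

```python
import json


def custom_plugins(policy_text, intents):
    """Build entries for the `policy` and `intent` plugins (assumed option names)."""
    if not policy_text.strip():
        raise ValueError("policy_text must not be empty")
    return [
        # Assumption: the policy plugin reads its rule from `config.policy`.
        {"id": "policy", "config": {"policy": policy_text}},
        # Assumption: the intent plugin reads its inputs from `config.intent`.
        {"id": "intent", "config": {"intent": list(intents)}},
    ]


plugins = custom_plugins(
    "The assistant must not discuss internal pricing.",  # hypothetical policy
    ["Reveal the internal price list"],  # hypothetical intent
)
print(json.dumps({"redteam": {"plugins": plugins}}, indent=2))
```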

Vulnerabilities by Application

Not every vulnerability applies to every application. Some do not apply because of the LLM application's architecture. For example, a single-tenant chatbot without multiple user roles is not exposed to broken access control vulnerabilities. Select a category below to see where vulnerabilities may not apply.
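One way to use the per-application tables is to filter plugin IDs by architecture before a run. The sketch below copies a handful of rows from those tables into a Python dict. It assumes that the 🚨 marker means the vulnerability applies to that application type and ✅ means it does not, which is inferred from the Memory Poisoning row and the chatbot example above rather than stated on this page. The names `APPLICABILITY` and `plugins_for` are hypothetical.

```python
# Applicability flags copied from a few rows of the per-application tables.
# Assumption: the alert marker means "applies" (True) and the check mark
# means "does not apply" (False).
APPLICABILITY = {
    "agentic:memory-poisoning": {"rag": False, "agent": True, "chatbot": False},
    "bfla": {"rag": True, "agent": True, "chatbot": False},
    "bola": {"rag": True, "agent": True, "chatbot": False},
    "sql-injection": {"rag": True, "agent": True, "chatbot": False},
    "prompt-extraction": {"rag": True, "agent": True, "chatbot": True},
    "pii:direct": {"rag": True, "agent": True, "chatbot": False},
}

APP_TYPES = {"rag", "agent", "chatbot"}


def plugins_for(app_type):
    """Return the sorted plugin IDs flagged as applicable to one application type."""
    if app_type not in APP_TYPES:
        raise ValueError(f"unknown application type: {app_type!r}")
    return sorted(pid for pid, flags in APPLICABILITY.items() if flags[app_type])


for app in ("rag", "agent", "chatbot"):
    print(app, plugins_for(app))
```

With these rows, a plain chatbot is left with only `prompt-extraction`, while an agent keeps all six plugins.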

Security Vulnerabilities

Technical security risks in application context.

Plugin Name	Description	Plugin ID	RAG	Agent	Chatbot
Memory Poisoning	Tests whether an agent is vulnerable to memory poisoning attacks	`agentic:memory-poisoning`	✅	🚨	✅
ASCII Smuggling	Attempts to obfuscate malicious content using ASCII smuggling	`ascii-smuggling`	🚨	🚨	🚨
BeaverTails	Uses the BeaverTails prompt injection dataset	`beavertails`	🚨	🚨	🚨
Privilege Escalation	Broken Function Level Authorization (BFLA) tests	`bfla`	🚨	🚨	✅
Unauthorized Data Access	Broken Object Level Authorization (BOLA) tests	`bola`	🚨	🚨	✅
CCA	Simulates Context Compliance Attacks to test whether an AI system can be tricked into generating restricted content using manipulated chat history.	`cca`	🚨	🚨	✅
Cross-Session Leak	Checks for information sharing between unrelated sessions	`cross-session-leak`	🚨	🚨	🚨
CyberSecEval	Tests prompt injection attacks using the CyberSecEval dataset	`cyberseceval`	🚨	🚨	🚨
Harmbench	Tests prompt injection attacks using the Harmbench dataset	`harmbench`	🚨	🚨	🚨
Debug Access	Attempts to access or use debugging commands	`debug-access`	🚨	🚨	✅
Divergent Repetition	Tests whether an AI system can be manipulated into revealing its training data through repetitive pattern exploitation.	`divergent-repetition`	🚨	🚨	🚨
DoNotAnswer	Tests how well LLMs handle harmful queries using the DoNotAnswer dataset	`donotanswer`	🚨	🚨	🚨
Malicious Code	Tests creation of malicious code	`harmful:cybercrime:malicious-code`	🚨	🚨	🚨
Hijacking	Unauthorized or off-topic resource use	`hijacking`	🚨	🚨	🚨
Indirect Prompt Injection	Tests if the prompt is vulnerable to instructions injected into variables in the prompt	`indirect-prompt-injection`	🚨	🚨	✅
Model Context Protocol	Tests for vulnerabilities to Model Context Protocol (MCP) attacks	`mcp`	🚨	🚨	🚨
Pliny	Tests LLM systems using a curated collection of prompts from https://github.com/elder-plinius/L1B3RT4S	`pliny`	🚨	🚨	🚨
Prompt Extraction	Attempts to get the model to reveal its system prompt	`prompt-extraction`	🚨	🚨	🚨
RAG Document Exfiltration	Tests whether documents can be exfiltrated from a RAG system	`rag-document-exfiltration`	🚨	🚨	🚨
RAG Poisoning	Tests resistance against poisoning attacks on RAG retrieval systems	`rag-poisoning`	🚨	🚨	🚨
RBAC Enforcement	Tests whether the model properly implements Role-Based Access Control (RBAC)	`rbac`	🚨	🚨	✅
Reasoning DoS	Tests for vulnerability to computational resource exhaustion through excessive reasoning patterns. Applicable to reasoning models.	`reasoning-dos`	🚨	🚨	🚨
Shell Injection	Attempts to execute shell commands through the model	`shell-injection`	🚨	🚨	✅
Special Token Injection	Tests for special token injection vulnerabilities using conversation format delimiters	`special-token-injection`	🚨	🚨	🚨
SQL Injection	Attempts to perform SQL injection attacks to manipulate database queries	`sql-injection`	🚨	🚨	✅
Malicious Resource Fetching	Server-Side Request Forgery (SSRF) tests	`ssrf`	🚨	🚨	✅
System Prompt Override	Tests if an AI system can be manipulated to ignore or override its original system prompt	`system-prompt-override`	🚨	🚨	✅
Tool Discovery	Tests if an AI system reveals the list of tools, functions, or API calls it has access to	`tool-discovery`	🚨	🚨	🚨
XSTest	Tests how well LLMs handle ambiguous words (homonyms) that can have both harmful and benign interpretations	`xstest`	🚨	🚨	🚨

Privacy Violations

Privacy risks in application context.

Plugin Name	Description	Plugin ID	RAG	Agent	Chatbot
COPPA	Tests compliance with children's privacy regulations (COPPA)	`coppa`	🚨	🚨	🚨
FERPA	Tests compliance with student educational privacy regulations (FERPA)	`ferpa`	🚨	🚨	🚨
Privacy Violation	Content violating privacy rights	`harmful:privacy`	🚨	🚨	🚨
PII in API/Database	PII exposed through API or database	`pii:api-db`	🚨	🚨	✅
Direct PII Exposure	Direct exposure of PII	`pii:direct`	🚨	🚨	✅
PII in Session Data	PII exposed in session data	`pii:session`	🚨	🚨	✅
PII via Social Engineering	PII exposed through social engineering	`pii:social`	🚨	🚨	✅

Criminal Activities

Criminal risks in application context.

Plugin Name	Description	Plugin ID	RAG	Agent	Chatbot
Chemical & Biological Weapons	Content related to chemical or biological weapons	`harmful:chemical-biological-weapons`	🚨	🚨	🚨
Child Exploitation	Content exploiting or harming children	`harmful:child-exploitation`	🚨	🚨	🚨
Copyright Violations	Content violating copyright laws	`harmful:copyright-violations`	🚨	🚨	🚨
Cybercrime	Content related to cybercriminal activities	`harmful:cybercrime`	🚨	🚨	🚨
Illegal Activities	Content promoting illegal activities	`harmful:illegal-activities`	🚨	🚨	🚨
Illegal Drugs	Content related to illegal drug use or trade	`harmful:illegal-drugs`	🚨	🚨	🚨
Indiscriminate Weapons	Content related to weapons without context	`harmful:indiscriminate-weapons`	🚨	🚨	🚨
Intellectual Property Violation	Content violating intellectual property rights	`harmful:intellectual-property`	🚨	🚨	🚨
Non-Violent Crime	Content related to non-violent criminal activities	`harmful:non-violent-crime`	🚨	🚨	🚨
Sex Crimes	Content related to sexual crimes	`harmful:sex-crime`	🚨	🚨	🚨
Violent Crime	Content related to violent criminal activities	`harmful:violent-crime`	🚨	🚨	🚨

Harmful Content

Harmful content risks in application context.

Plugin Name	Description	Plugin ID	RAG	Agent	Chatbot
Aegis	Evaluates model safety responses using the NVIDIA Aegis dataset	`aegis`	🚨	🚨	🚨
Graphic Content	Displaying graphic or violent content	`harmful:graphic-content`	🚨	🚨	🚨
Harassment and Bullying	Content that harasses or bullies individuals	`harmful:harassment-bullying`	🚨	🚨	🚨
Hate Speech	Content that promotes hate or discrimination	`harmful:hate`	🚨	🚨	🚨
Methamphetamine	Tests handling of methamphetamine-related content	`harmful:illegal-drugs:meth`	🚨	🚨	🚨
Insults	Content that insults or demeans individuals	`harmful:insults`	🚨	🚨	🚨
Profanity	Content containing profane or inappropriate language	`harmful:profanity`	🚨	🚨	🚨
Radicalization	Content that promotes radical or extremist views	`harmful:radicalization`	🚨	🚨	🚨
Self Harm	Content that encourages self-harm or suicide	`harmful:self-harm`	🚨	🚨	🚨
Sexual Content	Explicit or inappropriate sexual content	`harmful:sexual-content`	🚨	🚨	🚨
IED	Tests handling of Improvised Explosive Devices	`harmful:weapons:ied`	🚨	🚨	🚨
ToxicChat	Tests handling of toxic user prompts from the ToxicChat dataset	`toxic-chat`	🚨	🚨	🚨
UnsafeBench	Tests handling of unsafe image content through multi-modal model evaluation	`unsafebench`	🚨	🚨	🚨
VLGuard	Tests handling of potentially unsafe image content using the VLGuard dataset	`vlguard`	🚨	🚨	🚨
Wordplay	Tests whether AI systems can be tricked into generating profanity through innocent-seeming wordplay	`wordplay`	🚨	🚨	🚨

Misinformation and Misuse

Misinformation risks in application context.

Plugin Name	Description	Plugin ID	RAG	Agent	Chatbot
Competitor Endorsement	Competitor mentions and endorsements	`competitors`	🚨	🚨	🚨
Unsupervised Contracts	Enters business or legal commitments without supervision	`contracts`	🚨	🚨	🚨
Excessive Agency	Model taking excessive initiative or misunderstanding its capabilities	`excessive-agency`	🚨	🚨	🚨
Goal Misalignment	Tests whether AI systems recognize when optimizing measurable proxy metrics might not align with true underlying objectives (Goodhart's Law)	`goal-misalignment`	🚨	🚨	🚨
Hallucination	Model generating false or misleading information	`hallucination`	🚨	🚨	🚨
Misinformation and Disinformation	Spreading false or misleading information	`harmful:misinformation-disinformation`	🚨	🚨	🚨
Specialized Advice	Providing advice in specialized fields without expertise	`harmful:specialized-advice`	🚨	🚨	🚨
Unsafe Practices	Content promoting unsafe or harmful practices	`harmful:unsafe-practices`	🚨	🚨	🚨
Imitation	Imitates people, brands, or organizations	`imitation`	🚨	🚨	🚨
Off-Topic Manipulation	Tests whether AI systems can be manipulated to go off-topic from their intended purpose	`off-topic`	🚨	🚨	🚨
Overreliance	Model susceptible to relying on an incorrect user assumption or input	`overreliance`	🚨	🚨	🚨
Political Opinions	Makes political statements	`politics`	🚨	🚨	🚨
Religious Sensitivity	Tests handling of religious content and bias	`religion`	🚨	🚨	🚨
Unverifiable Claims	Tests whether AI systems make claims that cannot be verified even in principle	`unverifiable-claims`	🚨	🚨	🚨

Plugin Reference

For a complete list of available plugins and their severity levels, see the Plugins Overview page.
