Types of LLM vulnerabilities
This page documents categories of potential LLM vulnerabilities and failure modes.
Each vulnerability type is supported by Promptfoo's open-source plugins. Plugins are a modular system for testing risks and vulnerabilities in LLMs and LLM applications. See the quickstart guide to run your first red team.
See also our specific guides on:
- Red teaming AI agents
- Red teaming RAGs
- Red teaming multi-modal models
- Testing and validating guardrails
When comparing red team results across different tools or papers, be aware that Attack Success Rate (ASR) depends heavily on attempt budget, prompt set composition, and judge choice. See Why ASR Isn't Comparable Across Jailbreak Papers for guidance on interpreting these metrics.
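To illustrate why attempt budget matters, here is a minimal sketch under a simplifying assumption that is not part of Promptfoo: every attack attempt succeeds independently with the same probability. The function name `asr_at_k` is hypothetical and used only for this example.

```python
def asr_at_k(per_attempt_success: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds.

    Assumes (for illustration only) that each attempt succeeds with the
    same probability and that attempts are independent.
    """
    if not 0.0 <= per_attempt_success <= 1.0:
        raise ValueError("per_attempt_success must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Complement of "every attempt fails".
    return 1.0 - (1.0 - per_attempt_success) ** attempts


if __name__ == "__main__":
    # Same target, same 5% per-attempt success rate, different budgets.
    for budget in (1, 10, 100):
        print(f"attempts={budget:>3}  ASR={asr_at_k(0.05, budget):.3f}")
```

Under this assumption, a 5% per-attempt success rate is reported as roughly 40% ASR with a budget of 10 attempts and above 99% with 100 attempts, so two results are only comparable when their budgets match.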
Vulnerability Types
Security Vulnerabilities
| Name | Description | Plugin ID |
|---|---|---|
| ASCII Smuggling | Attempts to obfuscate malicious content using ASCII smuggling | ascii-smuggling |
| BeaverTails | Uses the BeaverTails prompt injection dataset | beavertails |
| CCA | Simulates Context Compliance Attacks to test whether an AI system can be tricked into generating restricted content using manipulated chat history. | cca |
| Cross-Session Leak | Checks for information sharing between unrelated sessions | cross-session-leak |
| CyberSecEval | Tests prompt injection attacks using the CyberSecEval dataset | cyberseceval |
| Debug Access | Attempts to access or use debugging commands | debug-access |
| Divergent Repetition | Tests whether an AI system can be manipulated into revealing its training data through repetitive pattern exploitation. | divergent-repetition |
| DoNotAnswer | Tests how well LLMs handle harmful queries using the DoNotAnswer dataset | donotanswer |
| Harmbench | Tests prompt injection attacks using the Harmbench dataset | harmbench |
| Hijacking | Unauthorized or off-topic resource use | hijacking |
| Indirect Prompt Injection | Tests if the prompt is vulnerable to instructions injected into variables in the prompt | indirect-prompt-injection |
| Malicious Code | Tests creation of malicious code | harmful:cybercrime:malicious-code |
| Malicious Resource Fetching | Server-Side Request Forgery (SSRF) tests | ssrf |
| Memory Poisoning | Tests whether an agent is vulnerable to memory poisoning attacks | agentic:memory-poisoning |
| Model Context Protocol | Tests for vulnerabilities to Model Context Protocol (MCP) attacks | mcp |
| Pliny | Tests LLM systems using a curated collection of prompts from https://github.com/elder-plinius/L1B3RT4S | pliny |
| Privilege Escalation | Broken Function Level Authorization (BFLA) tests | bfla |
| Prompt Extraction | Attempts to get the model to reveal its system prompt | prompt-extraction |
| RAG Document Exfiltration | Tests whether document content can be exfiltrated from a RAG system | rag-document-exfiltration |
| RAG Poisoning | Tests resistance against poisoning attacks on RAG retrieval systems | rag-poisoning |
| RBAC Enforcement | Tests whether the model properly implements Role-Based Access Control (RBAC) | rbac |
| Reasoning DoS | Tests for vulnerability to computational resource exhaustion through excessive reasoning patterns. Applicable to reasoning models. | reasoning-dos |
| Shell Injection | Attempts to execute shell commands through the model | shell-injection |
| Special Token Injection | Tests for special token injection vulnerabilities using conversation format delimiters | special-token-injection |
| SQL Injection | Attempts to perform SQL injection attacks to manipulate database queries | sql-injection |
| System Prompt Override | Tests if an AI system can be manipulated to ignore or override its original system prompt | system-prompt-override |
| Tool Discovery | Tests if an AI system reveals the list of tools, functions, or API calls it has access to | tool-discovery |
| Unauthorized Data Access | Broken Object Level Authorization (BOLA) tests | bola |
| XSTest | Tests how well LLMs handle ambiguous words (homonyms) that can have both harmful and benign interpretations | xstest |
Privacy Vulnerabilities
| Name | Description | Plugin ID |
|---|---|---|
| COPPA | Tests compliance with children's privacy regulations (COPPA) | coppa |
| Direct PII Exposure | Direct exposure of PII | pii:direct |
| FERPA | Tests compliance with student educational privacy regulations (FERPA) | ferpa |
| PII in API/Database | PII exposed through API or database | pii:api-db |
| PII in Session Data | PII exposed in session data | pii:session |
| PII via Social Engineering | PII exposed through social engineering | pii:social |
| Privacy Violation | Content violating privacy rights | harmful:privacy |
Criminal Activity
| Name | Description | Plugin ID |
|---|---|---|
| Chemical & Biological Weapons | Content related to chemical or biological weapons | harmful:chemical-biological-weapons |
| Child Exploitation | Content exploiting or harming children | harmful:child-exploitation |
| Copyright Violations | Content violating copyright laws | harmful:copyright-violations |
| Cybercrime | Content related to cybercriminal activities | harmful:cybercrime |
| Illegal Activities | Content promoting illegal activities | harmful:illegal-activities |
| Illegal Drugs | Content related to illegal drug use or trade | harmful:illegal-drugs |
| Indiscriminate Weapons | Content related to weapons without context | harmful:indiscriminate-weapons |
| Intellectual Property Violation | Content violating intellectual property rights | harmful:intellectual-property |
| Non-Violent Crime | Content related to non-violent criminal activities | harmful:non-violent-crime |
| Sex Crimes | Content related to sexual crimes | harmful:sex-crime |
| Violent Crime | Content related to violent criminal activities | harmful:violent-crime |
Harmful Activity
| Name | Description | Plugin ID |
|---|---|---|
| Aegis | Evaluates model safety responses using the NVIDIA Aegis dataset | aegis |
| Graphic Content | Displaying graphic or violent content | harmful:graphic-content |
| Harassment and Bullying | Content that harasses or bullies individuals | harmful:harassment-bullying |
| Hate Speech | Content that promotes hate or discrimination | harmful:hate |
| IED | Tests handling of content related to improvised explosive devices (IEDs) | harmful:weapons:ied |
| Insults | Content that insults or demeans individuals | harmful:insults |
| Methamphetamine | Tests handling of methamphetamine-related content | harmful:illegal-drugs:meth |
| Profanity | Content containing profane or inappropriate language | harmful:profanity |
| Radicalization | Content that promotes radical or extremist views | harmful:radicalization |
| Self Harm | Content that encourages self-harm or suicide | harmful:self-harm |
| Sexual Content | Explicit or inappropriate sexual content | harmful:sexual-content |
| ToxicChat | Tests handling of toxic user prompts from the ToxicChat dataset | toxic-chat |
| UnsafeBench | Tests handling of unsafe image content through multi-modal model evaluation | unsafebench |
| VLGuard | Tests handling of potentially unsafe image content using the VLGuard dataset | vlguard |
| Wordplay | Tests whether AI systems can be tricked into generating profanity through innocent-seeming wordplay | wordplay |
Misinformation and Misuse
| Name | Description | Plugin ID |
|---|---|---|
| Competitor Endorsement | Competitor mentions and endorsements | competitors |
| Excessive Agency | Model taking excessive initiative or misunderstanding its capabilities | excessive-agency |
| Goal Misalignment | Tests whether AI systems recognize when optimizing measurable proxy metrics might not align with true underlying objectives (Goodhart's Law) | goal-misalignment |
| Hallucination | Model generating false or misleading information | hallucination |
| Imitation | Imitates people, brands, or organizations | imitation |
| Misinformation and Disinformation | Spreading false or misleading information | harmful:misinformation-disinformation |
| Off-Topic Manipulation | Tests whether AI systems can be manipulated to go off-topic from their intended purpose | off-topic |
| Overreliance | Model susceptible to relying on an incorrect user assumption or input | overreliance |
| Political Opinions | Makes political statements | politics |
| Religious Sensitivity | Tests handling of religious content and bias | religion |
| Specialized Advice | Providing advice in specialized fields without expertise | harmful:specialized-advice |
| Unsafe Practices | Content promoting unsafe or harmful practices | harmful:unsafe-practices |
| Unsupervised Contracts | Enters business or legal commitments without supervision | contracts |
| Unverifiable Claims | Tests whether AI systems make claims that cannot be verified even in principle | unverifiable-claims |
Bias
| Name | Description | Plugin ID |
|---|---|---|
| Age Bias | Tests handling of age bias, stereotypes, and discrimination | bias:age |
| Disability Bias | Tests handling of disability bias, stereotypes, and discrimination | bias:disability |
| Gender Bias | Tests handling of gender bias, stereotypes, and discrimination | bias:gender |
| Race Bias | Tests handling of racial bias, stereotypes, and discrimination | bias:race |
E-commerce
| Name | Description | Plugin ID |
|---|---|---|
| E-commerce Compliance Bypass | Tests for bypassing age, geographic, and regulatory restrictions on e-commerce products | ecommerce:compliance-bypass |
| E-commerce Order Fraud | Tests for order fraud, transaction manipulation, and return/refund policy abuse | ecommerce:order-fraud |
| E-commerce PCI DSS | Tests for PCI DSS violations and payment card data exposure | ecommerce:pci-dss |
| E-commerce Price Manipulation | Tests for unauthorized pricing changes, discount abuse, and promotional policy violations | ecommerce:price-manipulation |
Financial
| Name | Description | Plugin ID |
|---|---|---|
| Financial Calculation Error | Tests for errors in financial calculations, risk assessments, or quantitative analysis | financial:calculation-error |
| Financial Compliance Violation | Tests for suggestions that violate securities laws, enable insider trading, or circumvent regulations | financial:compliance-violation |
| Financial Confidential Disclosure | Tests for inappropriate disclosure of MNPI, proprietary strategies, or confidential financial data | financial:confidential-disclosure |
| Financial Counterfactual Narrative | Tests for false financial narratives or misleading market information | financial:counterfactual |
| Financial Data Leakage | Tests for exposure of proprietary trading strategies or confidential financial data | financial:data-leakage |
| Financial Defamation | Tests for false statements damaging financial entity reputations | financial:defamation |
| Financial Hallucination | Tests for fabricated market data, non-existent financial instruments, or fictional company information | financial:hallucination |
| Financial Services Impartiality | Tests for unauthorized financial advice or transaction facilitation | financial:impartiality |
| Financial Services Misconduct | Tests for facilitation of financial crimes or market manipulation | financial:misconduct |
| Financial Sycophancy | Tests for agreeing with risky investment strategies or validating get-rich-quick schemes | financial:sycophancy |
Medical
| Name | Description | Plugin ID |
|---|---|---|
| Medical Anchoring Bias | Tests for medical anchoring bias where AI fixates on irrelevant information | medical:anchoring-bias |
| Medical Hallucination | Tests for fabricated medical facts, studies, or drug interactions | medical:hallucination |
| Medical Incorrect Knowledge | Tests for factually wrong medical information that could cause harm | medical:incorrect-knowledge |
| Medical Off-Label Use | Tests for inappropriate off-label medication recommendations without proper disclaimers | medical:off-label-use |
| Medical Prioritization Error | Tests for poor medical prioritization and triage decisions | medical:prioritization-error |
| Medical Sycophancy | Tests for agreeing with incorrect medical assumptions to be helpful | medical:sycophancy |
Pharmacy
| Name | Description | Plugin ID |
|---|---|---|
| Controlled Substance Compliance | Tests adherence to DEA regulations for controlled substance prescribing, dispensing, and monitoring | pharmacy:controlled-substance-compliance |
| Dosage Calculation | Tests accuracy of pharmaceutical dosage calculations including weight-based dosing and renal adjustments | pharmacy:dosage-calculation |
| Drug Interaction Detection | Tests identification of dangerous drug-drug, drug-food, and drug-supplement interactions | pharmacy:drug-interaction |
Insurance
| Name | Description | Plugin ID |
|---|---|---|
| Coverage Discrimination | Tests for discriminatory coverage decisions based on protected characteristics violating ADA, Section 1557, and GINA | insurance:coverage-discrimination |
| Network Misinformation | Tests accuracy of provider network information to prevent surprise medical bills and balance billing | insurance:network-misinformation |
| PHI Disclosure | Tests whether AI systems properly protect Protected Health Information (PHI) and comply with HIPAA privacy requirements | insurance:phi-disclosure |
Custom
| Name | Description | Plugin ID |
|---|---|---|
| Custom Prompts | Probes the model with specific inputs | intent |
| Custom Topic | Tests whether the model violates a custom-configured policy | policy |
Vulnerabilities by Application
Not every vulnerability applies to every application. Some are ruled out by the LLM application's architecture. For example, a single-tenant chatbot without multiple user roles won't be exposed to broken access control.
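One way to apply this idea is to filter a candidate plugin list by the traits of the application under test. The sketch below is illustrative only: `AppProfile`, `REQUIREMENTS`, and `applicable_plugins` are hypothetical names, and the mapping from plugin IDs to application traits is an assumption made for this example, not an official Promptfoo rule set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class AppProfile:
    """Hypothetical description of the application under test."""
    has_multiple_roles: bool
    uses_rag: bool
    has_agent_memory: bool


# Assumed mapping: plugin ID -> trait the application must have for the
# test to be meaningful. Plugins not listed here are always kept.
REQUIREMENTS: Dict[str, Callable[[AppProfile], bool]] = {
    "rbac": lambda app: app.has_multiple_roles,
    "bfla": lambda app: app.has_multiple_roles,
    "bola": lambda app: app.has_multiple_roles,
    "rag-poisoning": lambda app: app.uses_rag,
    "rag-document-exfiltration": lambda app: app.uses_rag,
    "agentic:memory-poisoning": lambda app: app.has_agent_memory,
}


def applicable_plugins(candidates: Iterable[str], app: AppProfile) -> List[str]:
    """Keep only the plugin IDs whose requirement the app satisfies."""
    always = lambda _app: True
    return [pid for pid in candidates if REQUIREMENTS.get(pid, always)(app)]


if __name__ == "__main__":
    # A single-tenant chatbot with no roles, no RAG, and no agent memory.
    chatbot = AppProfile(has_multiple_roles=False, uses_rag=False,
                         has_agent_memory=False)
    wanted = ["rbac", "bola", "prompt-extraction", "rag-poisoning"]
    print(applicable_plugins(wanted, chatbot))
```

Defaulting unlisted plugins to "keep" is a deliberate choice in this sketch: a test is skipped only when there is an explicit reason it cannot apply.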
Plugin Reference
For a complete list of available plugins and their severity levels, see the Plugins Overview page.
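The plugin IDs in the tables above use colon-separated namespaces such as `harmful:` and `pii:`. As a rough sketch, the helpers below group selected IDs by namespace and assemble them into a configuration dictionary. The function names are hypothetical, and the `redteam.plugins` layout is an assumption about the configuration shape that should be checked against the Plugins Overview page before use.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_namespace(plugin_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group plugin IDs by the text before the first colon.

    IDs without a colon (for example "sql-injection") go under "(none)".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for pid in plugin_ids:
        namespace = pid.split(":", 1)[0] if ":" in pid else "(none)"
        groups[namespace].append(pid)
    return dict(groups)


def build_redteam_config(plugin_ids: Iterable[str]) -> dict:
    """Return a config dict with de-duplicated plugin IDs, order preserved.

    The {"redteam": {"plugins": [...]}} shape is assumed for illustration.
    """
    unique_ids = list(dict.fromkeys(plugin_ids))
    return {"redteam": {"plugins": unique_ids}}


if __name__ == "__main__":
    selected = ["pii:direct", "pii:social", "sql-injection", "harmful:hate"]
    print(group_by_namespace(selected))
    print(json.dumps(build_redteam_config(selected), indent=2))
```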