# Types of LLM vulnerabilities
This page documents categories of potential LLM vulnerabilities and failure modes.
Each vulnerability type is supported by Promptfoo's open-source plugins, a modular system for testing risks and vulnerabilities in LLMs and LLM applications. See the quickstart guide to run your first red team.
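For example, a minimal red team configuration enables plugins by the IDs listed in the tables below. A sketch, assuming a hypothetical target and purpose:

```yaml
# promptfooconfig.yaml -- minimal sketch; the target and purpose are placeholders
targets:
  - openai:gpt-4o-mini # any supported provider or HTTP endpoint
redteam:
  purpose: 'Customer support chatbot for an online retailer'
  plugins:
    - prompt-extraction # plugin IDs come from the tables below
    - pii:direct
    - harmful:hate
```

Run the scan with `npx promptfoo@latest redteam run` and view the findings with `npx promptfoo@latest redteam report`.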
See also our specific guides on:
- Red teaming AI agents
- Red teaming RAGs
- Red teaming multi-modal models
- Testing and validating guardrails
## Vulnerability Types
### Security Vulnerabilities
| Name | Description | Plugin ID |
|---|---|---|
| ASCII Smuggling | Attempts to obfuscate malicious content using ASCII smuggling | ascii-smuggling |
| BeaverTails | Tests handling of harmful prompts using the BeaverTails dataset | beavertails |
| CCA | Simulates Context Compliance Attacks to test whether an AI system can be tricked into generating restricted content using manipulated chat history. | cca |
| Cross-Session Leak | Checks for information sharing between unrelated sessions | cross-session-leak |
| CyberSecEval | Tests prompt injection attacks using the CyberSecEval dataset | cyberseceval |
| Debug Access | Attempts to access or use debugging commands | debug-access |
| Divergent Repetition | Tests whether an AI system can be manipulated into revealing its training data through repetitive pattern exploitation. | divergent-repetition |
| DoNotAnswer | Tests how well LLMs handle harmful queries using the DoNotAnswer dataset | donotanswer |
| HarmBench | Tests handling of harmful behaviors using the HarmBench dataset | harmbench |
| Hijacking | Unauthorized or off-topic resource use | hijacking |
| Indirect Prompt Injection | Tests if the prompt is vulnerable to instructions injected into variables in the prompt | indirect-prompt-injection |
| Malicious Code | Tests creation of malicious code | harmful:cybercrime:malicious-code |
| Malicious Resource Fetching | Server-Side Request Forgery (SSRF) tests | ssrf |
| Memory Poisoning | Tests whether an agent is vulnerable to memory poisoning attacks | agentic:memory-poisoning |
| Model Context Protocol | Tests for vulnerabilities to Model Context Protocol (MCP) attacks | mcp |
| Pliny | Tests LLM systems using a curated collection of prompts from https://github.com/elder-plinius/L1B3RT4S | pliny |
| Privilege Escalation | Broken Function Level Authorization (BFLA) tests | bfla |
| Prompt Extraction | Attempts to get the model to reveal its system prompt | prompt-extraction |
| RAG Document Exfiltration | Attempts to exfiltrate documents from a RAG knowledge base | rag-document-exfiltration |
| RAG Poisoning | Tests resistance against poisoning attacks on RAG retrieval systems | rag-poisoning |
| RBAC Enforcement | Tests whether the model properly implements Role-Based Access Control (RBAC) | rbac |
| Reasoning DoS | Tests for vulnerability to computational resource exhaustion through excessive reasoning patterns. Applicable to reasoning models. | reasoning-dos |
| Shell Injection | Attempts to execute shell commands through the model | shell-injection |
| Special Token Injection | Tests for special token injection vulnerabilities using conversation format delimiters | special-token-injection |
| SQL Injection | Attempts to perform SQL injection attacks to manipulate database queries | sql-injection |
| System Prompt Override | Tests if an AI system can be manipulated to ignore or override its original system prompt | system-prompt-override |
| Tool Discovery | Tests if an AI system reveals the list of tools, functions, or API calls it has access to | tool-discovery |
| Unauthorized Data Access | Broken Object Level Authorization (BOLA) tests | bola |
| XSTest | Tests how well LLMs handle ambiguous words (homonyms) that can have both harmful and benign interpretations | xstest |
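Any plugin ID in this table can be listed as a bare string or, to control how many test cases are generated, as an object with `numTests`. A sketch using IDs from the table above (the counts are arbitrary):

```yaml
redteam:
  plugins:
    - ascii-smuggling # bare string: default number of tests
    - id: sql-injection
      numTests: 10 # object form: generate 10 test cases for this plugin
    - id: ssrf
      numTests: 5
```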
### Privacy Vulnerabilities
| Name | Description | Plugin ID |
|---|---|---|
| COPPA | Tests compliance with children's privacy regulations (COPPA) | coppa |
| Direct PII Exposure | Direct exposure of PII | pii:direct |
| PII in API/Database | PII exposed through API or database | pii:api-db |
| PII in Session Data | PII exposed in session data | pii:session |
| PII via Social Engineering | PII exposed through social engineering | pii:social |
| Privacy Violation | Content violating privacy rights | harmful:privacy |
### Criminal Activity
| Name | Description | Plugin ID |
|---|---|---|
| Chemical & Biological Weapons | Content related to chemical or biological weapons | harmful:chemical-biological-weapons |
| Child Exploitation | Content exploiting or harming children | harmful:child-exploitation |
| Copyright Violations | Content violating copyright laws | harmful:copyright-violations |
| Cybercrime | Content related to cybercriminal activities | harmful:cybercrime |
| Illegal Activities | Content promoting illegal activities | harmful:illegal-activities |
| Illegal Drugs | Content related to illegal drug use or trade | harmful:illegal-drugs |
| Indiscriminate Weapons | Content related to weapons without context | harmful:indiscriminate-weapons |
| Intellectual Property Violation | Content violating intellectual property rights | harmful:intellectual-property |
| Non-Violent Crime | Content related to non-violent criminal activities | harmful:non-violent-crime |
| Sex Crimes | Content related to sexual crimes | harmful:sex-crime |
| Violent Crime | Content related to violent criminal activities | harmful:violent-crime |
### Harmful Activity
| Name | Description | Plugin ID |
|---|---|---|
| Aegis | Evaluates model safety responses using the NVIDIA Aegis dataset | aegis |
| Graphic Content | Displaying graphic or violent content | harmful:graphic-content |
| Harassment and Bullying | Content that harasses or bullies individuals | harmful:harassment-bullying |
| Hate Speech | Content that promotes hate or discrimination | harmful:hate |
| Insults | Content that insults or demeans individuals | harmful:insults |
| Profanity | Content containing profane or inappropriate language | harmful:profanity |
| Radicalization | Content that promotes radical or extremist views | harmful:radicalization |
| Self Harm | Content that encourages self-harm or suicide | harmful:self-harm |
| Sexual Content | Explicit or inappropriate sexual content | harmful:sexual-content |
| ToxicChat | Tests handling of toxic user prompts from the ToxicChat dataset | toxic-chat |
| UnsafeBench | Tests handling of unsafe image content through multi-modal model evaluation | unsafebench |
| VLGuard | Tests handling of potentially unsafe image content using the VLGuard dataset | vlguard |
| Wordplay | Tests whether AI systems can be tricked into generating profanity through innocent-seeming wordplay | wordplay |
### Misinformation and Misuse
| Name | Description | Plugin ID |
|---|---|---|
| Competitor Endorsement | Competitor mentions and endorsements | competitors |
| Excessive Agency | Model taking excessive initiative or misunderstanding its capabilities | excessive-agency |
| Hallucination | Model generating false or misleading information | hallucination |
| Imitation | Imitates people, brands, or organizations | imitation |
| Misinformation and Disinformation | Spreading false or misleading information | harmful:misinformation-disinformation |
| Off-Topic Manipulation | Tests whether AI systems can be manipulated to go off-topic from their intended purpose | off-topic |
| Overreliance | Model susceptible to relying on an incorrect user assumption or input | overreliance |
| Political Opinions | Makes political statements | politics |
| Religious Sensitivity | Tests handling of religious content and bias | religion |
| Specialized Advice | Providing advice in specialized fields without expertise | harmful:specialized-advice |
| Unsafe Practices | Content promoting unsafe or harmful practices | harmful:unsafe-practices |
| Unsupervised Contracts | Enters business or legal commitments without supervision | contracts |
| Unverifiable Claims | Tests whether AI systems make claims that cannot be verified even in principle | unverifiable-claims |
## Vulnerabilities by Application
Not every application is exposed to every type of exploit; some vulnerabilities simply don't apply given the LLM application's architecture. For example, a single-tenant chatbot without multiple user roles isn't susceptible to broken access control vulnerabilities.
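In practice, this means pruning the plugin list to match your architecture. A sketch, continuing the single-tenant chatbot example (the selection is illustrative, not a recommended baseline):

```yaml
# Single-tenant chatbot: no user roles, so access-control plugins add noise
redteam:
  plugins:
    - prompt-extraction
    - hijacking
    - harmful:hate
    # Omitted as not applicable to this architecture:
    # - rbac  (no roles to enforce)
    # - bfla  (no privileged functions to escalate into)
    # - bola  (no per-user objects to access)
    # - agentic:memory-poisoning  (no agent memory)
```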
## Plugin Reference
For a complete list of available plugins and their severity levels, see the Plugins Overview page.