Types of LLM vulnerabilities
This page documents categories of potential LLM vulnerabilities and failure modes.
Each vulnerability type is supported Promptfoo's open-source plugins. Plugins are a modular system for testing risks and vulnerabilities in LLM models and applications. See the quickstart guide to run your first red team.
Vulnerability Types
Security Vulnerabilities
Name | Description | Plugin ID |
---|---|---|
ASCII Smuggling | Attempts to obfuscate malicious content using ASCII smuggling | ascii-smuggling |
BeaverTails | Uses the BeaverTails prompt injection dataset | beavertails |
Cross-Session Leak | Checks for information sharing between unrelated sessions | cross-session-leak |
CyberSecEval | Tests prompt injection attacks using the CyberSecEval dataset | cyberseceval |
Debug Access | Attempts to access or use debugging commands | debug-access |
Divergent Repetition | Tests whether an AI system can be manipulated into revealing its training data through repetitive pattern exploitation. | divergent-repetition |
Hijacking | Unauthorized or off-topic resource use | hijacking |
Indirect Prompt Injection | Tests if the prompt is vulnerable to instructions injected into variables in the prompt | indirect-prompt-injection |
Malicious Code | Tests creation of malicious code | harmful:cybercrime:malicious-code |
Malicious Resource Fetching | Server-Side Request Forgery (SSRF) tests | ssrf |
Pliny | Tests LLM systems using a curated collection of prompts from https://github.com/elder-plinius/L1B3RT4S | pliny |
Privilege Escalation | Broken Function Level Authorization (BFLA) tests | bfla |
Prompt Extraction | Attempts to get the model to reveal its system prompt | prompt-extraction |
RBAC Enforcement | Tests whether the model properly implements Role-Based Access Control (RBAC) | rbac |
Shell Injection | Attempts to execute shell commands through the model | shell-injection |
SQL Injection | Attempts to perform SQL injection attacks to manipulate database queries | sql-injection |
System Prompt Override | Tests if an AI system can be manipulated to ignore or override its original system prompt | system-prompt-override |
Unauthorized Data Access | Broken Object Level Authorization (BOLA) tests | bola |
Privacy Vulnerabilities
Name | Description | Plugin ID |
---|---|---|
Direct PII Exposure | Direct exposure of PII | pii:direct |
PII in API/Database | PII exposed through API or database | pii:api-db |
PII in Session Data | PII exposed in session data | pii:session |
PII via Social Engineering | PII exposed through social engineering | pii:social |
Privacy Violation | Content violating privacy rights | harmful:privacy |
Criminal Activity
Name | Description | Plugin ID |
---|---|---|
Chemical & Biological Weapons | Content related to chemical or biological weapons | harmful:chemical-biological-weapons |
Child Exploitation | Content exploiting or harming children | harmful:child-exploitation |
Copyright Violations | Content violating copyright laws | harmful:copyright-violations |
Cybercrime | Content related to cybercriminal activities | harmful:cybercrime |
IEDs | Content related to creating Improvised Explosive Devices | harmful:weapons:ied |
Illegal Activities | Content promoting illegal activities | harmful:illegal-activities |
Illegal Drugs | Content related to illegal drug use or trade | harmful:illegal-drugs |
Indiscriminate Weapons | Content related to weapons without context | harmful:indiscriminate-weapons |
Intellectual Property Violation | Content violating intellectual property rights | harmful:intellectual-property |
Malicious Code | Content related to generating malicious code | harmful:cybercrime:malicious-code |
Methamphetamine | Content related to methamphetamine | harmful:illegal-drugs:meth |
Non-Violent Crime | Content related to non-violent criminal activities | harmful:non-violent-crime |
Sex Crimes | Content related to sexual crimes | harmful:sex-crime |
Violent Crime | Content related to violent criminal activities | harmful:violent-crime |
Harmful Activity
Name | Description | Plugin ID |
---|---|---|
Graphic Content | Displaying graphic or violent content | harmful:graphic-content |
Harassment and Bullying | Content that harasses or bullies individuals | harmful:harassment-bullying |
Hate Speech | Content that promotes hate or discrimination | harmful:hate |
Insults | Content that insults or demeans individuals | harmful:insults |
Profanity | Content containing profane or inappropriate language | harmful:profanity |
Radicalization | Content that promotes radical or extremist views | harmful:radicalization |
Self Harm | Content that encourages self-harm or suicide | harmful:self-harm |
Sexual Content | Explicit or inappropriate sexual content | harmful:sexual-content |
Misinformation and Misuse
Name | Description | Plugin ID |
---|---|---|
Competitor Endorsement | Competitor mentions and endorsements | competitors |
Excessive Agency | Model taking excessive initiative or misunderstanding its capabilities | excessive-agency |
Hallucination | Model generating false or misleading information | hallucination |
Imitation | Imitates people, brands, or organizations | imitation |
Misinformation and Disinformation | Spreading false or misleading information | harmful:misinformation-disinformation |
Overreliance | Model susceptible to relying on an incorrect user assumption or input | overreliance |
Political Opinions | Makes political statements | politics |
Religious Sensitivity | Tests handling of religious content and bias | religion |
Specialized Advice | Providing advice in specialized fields without expertise | harmful:specialized-advice |
Unsafe Practices | Content promoting unsafe or harmful practices | harmful:unsafe-practices |
Unsupervised Contracts | Enters business or legal commitments without supervision | contracts |
Vulnerabilities by Application
Not all applications are vulnerable to certain types of exploits. Some vulnerabilities won't apply because of the LLM application's architecture. For example, a single-tenant chatbot without multiple user roles won't be vulnerable to broken access control vulnerabilities. Select a category below to see where vulnerabilities may not apply.
Plugin Reference
For a complete list of available plugins and their severity levels, see the Plugins Overview page.