Types of LLM vulnerabilities
This page documents categories of potential LLM vulnerabilities and failure modes.
Potential failures are bucketed as follows:
Each vulnerability type is supported by Promptfoo's open-source LLM red teaming tool, with the Plugin
column corresponding to the plugin ID in the tool. Learn more.
Privacy and Security
Category | Description | Plugin |
---|---|---|
PII Leaks | Exposure of personally identifiable information, potentially leading to identity theft or privacy breaches. | pii, pii:api-db, pii:direct, pii:session, pii:social |
Other Privacy Violations | Content that compromises individual privacy or data protection, potentially violating privacy laws and eroding user trust. | harmful:privacy |
Cybercrime and Hacking | Content related to unauthorized access, malware, or cyber attacks that could compromise digital security. | harmful:cybercrime |
RBAC (Role-Based Access Control) | Failures in implementing proper access control, potentially leading to unauthorized access to sensitive information. | rbac |
BFLA (Broken Function Level Authorization) | Failures in enforcing proper authorization for specific functions or actions, potentially allowing users to perform operations beyond their intended privileges (see OWASP API Security Top 10). | bfla |
BOLA (Broken Object Level Authorization) | Vulnerabilities that allow users to access or modify resources belonging to other users or outside their authorized scope, compromising data integrity and privacy (see OWASP API Security Top 10). | bola |
SSRF (Server-Side Request Forgery) | Vulnerability allowing an attacker to induce the server-side application to make HTTP requests to an arbitrary domain of the attacker's choosing, potentially leading to unauthorized access to internal resources or data exfiltration (see OWASP API Security Top 10). | ssrf |
Cross-Session Leak | The system inappropriately reveals information from one session to another, potentially exposing sensitive data across user boundaries. | cross-session-leak |
Debug Access | Vulnerability allowing unauthorized access to debugging commands or internal system information. | debug-access |
Technical Vulnerabilities
Category | Description | Plugin |
---|---|---|
Prompt Extraction | Vulnerability to malicious inputs designed to expose the system prompt and/or instructions. | prompt-extraction |
Hijacking | Unauthorized or off-topic resource use. | hijacking |
Prompt Injection | Vulnerability to malicious inputs designed to manipulate the model's behavior, potentially compromising system security. | prompt-injection |
Indirect Prompt Injection | Vulnerability where malicious content is injected into trusted data sources or variables used in the prompt, indirectly influencing the model's behavior and potentially compromising system security. | indirect-prompt-injection |
SQL Injection | Vulnerability to attacks attempting to execute unauthorized database queries, potentially compromising data integrity and security. | sql-injection |
Shell Injection | Susceptibility to attacks attempting to execute unauthorized shell commands, potentially compromising system security. | shell-injection |
ASCII Smuggling | A technique that uses special Unicode characters to embed invisible instructions in text | ascii-smuggling |
Imitation | Model impersonates individuals, brands, or organizations without authorization | imitation |
Intent Manipulation | Attempts to manipulate the model to exhibit specific behaviors or bypass safety measures | intent |
Criminal Activities
Category | Description | Plugin |
---|---|---|
Violent Crimes | Content related to physical harm, assault, or murder that could encourage or assist in real-world violence. | harmful:violent-crime |
Non-Violent Crimes | Information about fraud, theft, and other criminal activities that could facilitate illegal behavior. | harmful:non-violent-crime |
Sex Crimes | Content involving sexual assault, harassment, or non-consensual acts that could harm individuals or promote criminal behavior. | harmful:sex-crime |
Child Exploitation | Material that exploits or harms minors, posing severe ethical and legal risks. | harmful:child-exploitation |
Illegal Drugs | Content promoting or facilitating illegal drug use or trade, potentially contributing to substance abuse and criminal activities. | harmful:illegal-drugs |
Harmful Content
Category | Description | Plugin |
---|---|---|
Indiscriminate Weapons | Information about weapons of mass destruction or illegal armaments that could threaten public safety. | harmful:indiscriminate-weapons |
Chemical & Biological Weapons | Specific details about creating or using chemical or biological agents as weapons, posing significant public health risks. | harmful:chemical-biological-weapons |
Hate and Discrimination | Material promoting hatred, discrimination, or violence against specific groups, fostering a hostile environment and potentially violating anti-discrimination laws. | harmful:hate |
Self-Harm | Information encouraging or assisting in self-harm or suicide, potentially endangering vulnerable users. | harmful:self-harm |
Sexual Content | Explicit or inappropriate sexual material that may be unsuitable for certain audiences or violate content policies. | harmful:sexual-content |
Harassment and Bullying | Content targeting individuals with harmful or abusive behavior, creating an unsafe environment for users. | harmful:harassment-bullying |
Graphic Content | Disturbing or violent imagery or descriptions that may cause psychological distress to users. | harmful:graphic-content |
Unsafe Practices | Promotion of dangerous activities or health misinformation that could lead to physical harm or public health risks. | harmful:unsafe-practices |
Misinformation and Misuse
Category | Description | Plugin |
---|---|---|
Misinformation and Disinformation | Spread of false or misleading information that can manipulate public opinion or lead to harmful decision-making. | harmful:misinformation-disinformation |
Specialized Advice | Providing potentially harmful advice in specialized fields like finance or medicine without proper qualifications. | harmful:specialized-advice |
Copyright Violations | Material infringing on intellectual property rights, potentially leading to legal issues and undermining creative industries. | harmful:copyright-violations |
Competitors | Unintended mentions or endorsements of competing products or services that could undermine business interests. | competitors |
Excessive Agency | Model behavior exceeding its intended scope or capabilities, potentially leading to errors or inconsistent user experiences. | excessive-agency |
Hallucination | Generation of false or misleading information that undermines the reliability and trustworthiness of the system. | hallucination |
Overreliance | Model susceptibility to incorrect user input, potentially propagating errors or misinformation. | overreliance |
Policy Violations | Actions that contradict established organizational policies or guidelines | policy |
Political Statements | Generation of potentially controversial political content or biases | politics |
Religious Sensitivity | Generation of potentially controversial religious content or biases | religion |
Common LLM vulnerabilities by application type
The table below shows common vulnerabilities across different LLM application types. A 🚨 indicates that the vulnerability is typically applicable to that application type, while a ✅ means it's generally not a concern for that type of application.
Vulnerability | RAG | Agent | Chatbot |
---|---|---|---|
ASCII Smuggling | 🚨 | 🚨 | 🚨 |
Broken Function Level Authorization (BFLA) | 🚨 | 🚨 | ✅ |
Broken Object Level Authorization (BOLA) | 🚨 | 🚨 | ✅ |
Chemical & Biological Weapons | 🚨 | 🚨 | 🚨 |
Child Exploitation | 🚨 | 🚨 | 🚨 |
Competitors | 🚨 | 🚨 | 🚨 |
Context Window Overflow | 🚨 | 🚨 | 🚨 |
Copyright Violations | 🚨 | 🚨 | 🚨 |
Cybercrime and Hacking | 🚨 | 🚨 | 🚨 |
Data Exfiltration | 🚨 | 🚨 | ✅ |
Data Poisoning | 🚨 | ✅ | ✅ |
Debug Access | 🚨 | 🚨 | ✅ |
Excessive Agency | 🚨 | 🚨 | 🚨 |
Graphic Content | 🚨 | 🚨 | 🚨 |
Hallucination | 🚨 | 🚨 | 🚨 |
Harassment and Bullying | 🚨 | 🚨 | 🚨 |
Hate and Discrimination | 🚨 | 🚨 | 🚨 |
Hijacking | 🚨 | 🚨 | 🚨 |
Illegal Drugs | 🚨 | 🚨 | 🚨 |
Imitation | 🚨 | 🚨 | 🚨 |
Indiscriminate Weapons | 🚨 | 🚨 | 🚨 |
Intent Manipulation | 🚨 | 🚨 | 🚨 |
Misinformation and Disinformation | 🚨 | 🚨 | 🚨 |
Non-Violent Crimes | 🚨 | 🚨 | 🚨 |
Overreliance | 🚨 | 🚨 | 🚨 |
PII Exposure | 🚨 | 🚨 | ✅ |
Policy Violations | 🚨 | 🚨 | 🚨 |
Political Statements | 🚨 | 🚨 | 🚨 |
Prompt Injection | 🚨 | 🚨 | 🚨 |
Prompt Leak | 🚨 | 🚨 | 🚨 |
Religious Sensitivity | 🚨 | 🚨 | 🚨 |
Role-Based Access Control (RBAC) | 🚨 | 🚨 | ✅ |
SQL Injection | 🚨 | 🚨 | ✅ |
Self-Harm | 🚨 | 🚨 | 🚨 |
Server-Side Request Forgery (SSRF) | 🚨 | 🚨 |