Types of LLM vulnerabilities

This page documents categories of potential LLM vulnerabilities and failure modes.

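Potential failures are bucketed into five groups: Privacy and Security, Technical Vulnerabilities, Criminal Activities, Harmful Content, and Misinformation and Misuse.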

Each vulnerability type is supported by Promptfoo's open-source LLM red teaming tool. The Plugin column in each table gives the corresponding plugin ID in the tool.

Privacy and Security

| Category | Description | Plugin |
| --- | --- | --- |
| PII Leaks | Exposure of personally identifiable information, potentially leading to identity theft or privacy breaches. | pii, pii:api-db, pii:direct, pii:session, pii:social |
| Other Privacy Violations | Content that compromises individual privacy or data protection, potentially violating privacy laws and eroding user trust. | harmful:privacy |
| Cybercrime and Hacking | Content related to unauthorized access, malware, or cyber attacks that could compromise digital security. | harmful:cybercrime |
| RBAC (Role-Based Access Control) | Failures in implementing proper access control, potentially leading to unauthorized access to sensitive information. | rbac |
| BFLA (Broken Function Level Authorization) | Failures in enforcing proper authorization for specific functions or actions, potentially allowing users to perform operations beyond their intended privileges (see OWASP API Security Top 10). | bfla |
| BOLA (Broken Object Level Authorization) | Vulnerabilities that allow users to access or modify resources belonging to other users or outside their authorized scope, compromising data integrity and privacy (see OWASP API Security Top 10). | bola |
| SSRF (Server-Side Request Forgery) | Vulnerability allowing an attacker to induce the server-side application to make HTTP requests to an arbitrary domain of the attacker's choosing, potentially leading to unauthorized access to internal resources or data exfiltration (see OWASP API Security Top 10). | ssrf |
| Cross-Session Leak | The system inappropriately reveals information from one session to another, potentially exposing sensitive data across user boundaries. | cross-session-leak |
| Debug Access | Vulnerability allowing unauthorized access to debugging commands or internal system information. | debug-access |
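
The BFLA and BOLA rows describe two different checks that are easy to conflate. The sketch below is a minimal illustration, not part of Promptfoo: the `User` and `Invoice` types, the `ROLE_ACTIONS` table, and all function names are hypothetical. It assumes an LLM agent that calls tools on behalf of a user and shows one way to gate each call on both a function-level and an object-level check.

```python
from dataclasses import dataclass

# Hypothetical role-to-action table; role and action names are illustrative.
ROLE_ACTIONS = {
    "viewer": {"read_invoice"},
    "admin": {"read_invoice", "delete_invoice"},
}


@dataclass(frozen=True)
class User:
    user_id: str
    role: str


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    owner_id: str


def function_allowed(user: User, action: str) -> bool:
    """Function-level check (BFLA): may this role call the action at all?"""
    return action in ROLE_ACTIONS.get(user.role, set())


def object_allowed(user: User, invoice: Invoice) -> bool:
    """Object-level check (BOLA): does this specific record belong to the user?"""
    return invoice.owner_id == user.user_id


def authorize(user: User, action: str, invoice: Invoice) -> bool:
    """Require both checks to pass before a tool call is executed."""
    return function_allowed(user, action) and object_allowed(user, invoice)
```

In this sketch a viewer asking to delete their own invoice fails the function-level check, while a viewer asking to read someone else's invoice fails the object-level check. A real system would likely need richer ownership rules, for example letting administrators act on records they do not own.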

Technical Vulnerabilities

| Category | Description | Plugin |
| --- | --- | --- |
| Prompt Extraction | Vulnerability to malicious inputs designed to expose the system prompt and/or instructions. | prompt-extraction |
| Hijacking | Unauthorized or off-topic resource use. | hijacking |
| Prompt Injection | Vulnerability to malicious inputs designed to manipulate the model's behavior, potentially compromising system security. | prompt-injection |
| Indirect Prompt Injection | Vulnerability where malicious content is injected into trusted data sources or variables used in the prompt, indirectly influencing the model's behavior and potentially compromising system security. | indirect-prompt-injection |
| SQL Injection | Vulnerability to attacks attempting to execute unauthorized database queries, potentially compromising data integrity and security. | sql-injection |
| Shell Injection | Susceptibility to attacks attempting to execute unauthorized shell commands, potentially compromising system security. | shell-injection |
| ASCII Smuggling | A technique that uses special Unicode characters to embed invisible instructions in text. | ascii-smuggling |
| Imitation | Model impersonates individuals, brands, or organizations without authorization. | imitation |
| Intent Manipulation | Attempts to manipulate the model to exhibit specific behaviors or bypass safety measures. | intent |
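
To make the ASCII Smuggling row concrete, the sketch below shows one widely described form of the technique, which relies on the Unicode Tags block (U+E0000 to U+E007F). It is an illustration under that assumption rather than a description of how the plugin works internally, and the function names `hide_in_tags`, `reveal_tags`, and `strip_tags` are hypothetical.

```python
# Unicode Tags block: U+E0000..U+E007F. Code points U+E0020..U+E007E mirror
# printable ASCII but are typically rendered as invisible by user interfaces.
TAG_OFFSET = 0xE0000


def hide_in_tags(secret: str) -> str:
    """Map printable ASCII characters onto their invisible tag counterparts."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in secret):
        raise ValueError("only printable ASCII can be mapped to tag characters")
    return "".join(chr(TAG_OFFSET + ord(ch)) for ch in secret)


def reveal_tags(text: str) -> str:
    """Recover any text hidden in tag characters."""
    return "".join(
        chr(ord(ch) - TAG_OFFSET) for ch in text if 0xE0020 <= ord(ch) <= 0xE007E
    )


def strip_tags(text: str) -> str:
    """Drop every code point in the Tags block before text reaches the model."""
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)
```

A string such as `"Summarize this page." + hide_in_tags("ignore prior rules")` looks like the visible sentence alone in most interfaces, yet a model that reads the raw code points may still see the hidden instruction. Running `strip_tags` on untrusted input is one possible mitigation.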

Criminal Activities

| Category | Description | Plugin |
| --- | --- | --- |
| Violent Crimes | Content related to physical harm, assault, or murder that could encourage or assist in real-world violence. | harmful:violent-crime |
| Non-Violent Crimes | Information about fraud, theft, and other criminal activities that could facilitate illegal behavior. | harmful:non-violent-crime |
| Sex Crimes | Content involving sexual assault, harassment, or non-consensual acts that could harm individuals or promote criminal behavior. | harmful:sex-crime |
| Child Exploitation | Material that exploits or harms minors, posing severe ethical and legal risks. | harmful:child-exploitation |
| Illegal Drugs | Content promoting or facilitating illegal drug use or trade, potentially contributing to substance abuse and criminal activities. | harmful:illegal-drugs |

Harmful Content

| Category | Description | Plugin |
| --- | --- | --- |
| Indiscriminate Weapons | Information about weapons of mass destruction or illegal armaments that could threaten public safety. | harmful:indiscriminate-weapons |
| Chemical & Biological Weapons | Specific details about creating or using chemical or biological agents as weapons, posing significant public health risks. | harmful:chemical-biological-weapons |
| Hate and Discrimination | Material promoting hatred, discrimination, or violence against specific groups, fostering a hostile environment and potentially violating anti-discrimination laws. | harmful:hate |
| Self-Harm | Information encouraging or assisting in self-harm or suicide, potentially endangering vulnerable users. | harmful:self-harm |
| Sexual Content | Explicit or inappropriate sexual material that may be unsuitable for certain audiences or violate content policies. | harmful:sexual-content |
| Harassment and Bullying | Content targeting individuals with harmful or abusive behavior, creating an unsafe environment for users. | harmful:harassment-bullying |
| Graphic Content | Disturbing or violent imagery or descriptions that may cause psychological distress to users. | harmful:graphic-content |
| Unsafe Practices | Promotion of dangerous activities or health misinformation that could lead to physical harm or public health risks. | harmful:unsafe-practices |

Misinformation and Misuse

| Category | Description | Plugin |
| --- | --- | --- |
| Misinformation and Disinformation | Spread of false or misleading information that can manipulate public opinion or lead to harmful decision-making. | harmful:misinformation-disinformation |
| Specialized Advice | Providing potentially harmful advice in specialized fields like finance or medicine without proper qualifications. | harmful:specialized-advice |
| Copyright Violations | Material infringing on intellectual property rights, potentially leading to legal issues and undermining creative industries. | harmful:copyright-violations |
| Competitors | Unintended mentions or endorsements of competing products or services that could undermine business interests. | competitors |
| Excessive Agency | Model behavior exceeding its intended scope or capabilities, potentially leading to errors or inconsistent user experiences. | excessive-agency |
| Hallucination | Generation of false or misleading information that undermines the reliability and trustworthiness of the system. | hallucination |
| Overreliance | Model susceptibility to incorrect user input, potentially propagating errors or misinformation. | overreliance |
| Policy Violations | Actions that contradict established organizational policies or guidelines. | policy |
| Political Statements | Generation of potentially controversial political content or biases. | politics |
| Religious Sensitivity | Generation of potentially controversial religious content or biases. | religion |
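
Because each row pairs a category with one or more plugin IDs, the tables can be treated as a lookup from the failure modes you care about to the plugins to enable. The sketch below is a hypothetical helper, not part of Promptfoo. It copies a handful of rows from the tables above and renders a YAML fragment, and the `redteam` and `plugins` key names in that fragment are an assumption about the tool's configuration layout that should be checked against its documentation.

```python
# Category -> plugin IDs, copied from a subset of rows in the tables above.
PLUGINS_BY_CATEGORY = {
    "PII Leaks": ["pii", "pii:api-db", "pii:direct", "pii:session", "pii:social"],
    "SQL Injection": ["sql-injection"],
    "Hate and Discrimination": ["harmful:hate"],
    "Hallucination": ["hallucination"],
}


def plugin_ids(categories):
    """Return the de-duplicated plugin IDs for the given categories, in order."""
    seen = []
    for category in categories:
        for plugin in PLUGINS_BY_CATEGORY[category]:
            if plugin not in seen:
                seen.append(plugin)
    return seen


def to_yaml_fragment(plugins):
    """Render a YAML list under redteam.plugins (key names are an assumption)."""
    lines = ["redteam:", "  plugins:"]
    lines += [f"    - '{plugin}'" for plugin in plugins]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_yaml_fragment(plugin_ids(["SQL Injection", "Hallucination"])))
```

Quoting each ID in the rendered fragment keeps values that contain a colon, such as harmful:hate, unambiguous as plain YAML strings.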

Common LLM vulnerabilities by application type

The table below shows common vulnerabilities across different LLM application types. A 🚨 indicates that the vulnerability is typically applicable to that application type, while a ✅ means it's generally not a concern for that type of application.

VulnerabilityRAGAgentChatbot
ASCII Smuggling🚨🚨🚨
Broken Function Level Authorization (BFLA)🚨🚨
Broken Object Level Authorization (BOLA)🚨🚨
Chemical & Biological Weapons🚨🚨🚨
Child Exploitation🚨🚨🚨
Competitors🚨🚨🚨
Context Window Overflow🚨🚨🚨
Copyright Violations🚨🚨🚨
Cybercrime and Hacking🚨🚨🚨
Data Exfiltration🚨🚨
Data Poisoning🚨
Debug Access🚨🚨
Excessive Agency🚨🚨🚨
Graphic Content🚨🚨🚨
Hallucination🚨🚨🚨
Harassment and Bullying🚨🚨🚨
Hate and Discrimination🚨🚨🚨
Hijacking🚨🚨🚨
Illegal Drugs🚨🚨🚨
Imitation🚨🚨🚨
Indiscriminate Weapons🚨🚨🚨
Intent Manipulation🚨🚨🚨
Misinformation and Disinformation🚨🚨🚨
Non-Violent Crimes🚨🚨🚨
Overreliance🚨🚨🚨
PII Exposure🚨🚨
Policy Violations🚨🚨🚨
Political Statements🚨🚨🚨
Prompt Injection🚨🚨🚨
Prompt Leak🚨🚨🚨
Religious Sensitivity🚨🚨🚨
Role-Based Access Control (RBAC)🚨🚨
SQL Injection🚨🚨
Self-Harm🚨🚨🚨
Server-Side Request Forgery (SSRF)🚨🚨
Sex Crimes🚨🚨🚨
Sexual Content🚨🚨🚨
Shell Injection🚨🚨
Specialized Advice🚨🚨🚨
Tool/API Manipulation🚨
Unauthorized Access🚨🚨
Unsafe Practices🚨🚨🚨
Violent Crimes🚨🚨🚨

Note that your mileage will vary based on your specific use case. Most applications are a combination of the above!