Types of LLM vulnerabilities

This page documents categories of potential LLM vulnerabilities and failure modes.

Each vulnerability type is supported by Promptfoo's open-source plugins. Plugins are a modular system for testing risks and vulnerabilities in LLMs and LLM applications. See the quickstart guide to run your first red team.
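As a rough illustration of how plugin IDs from this page might be collected into a configuration, the Python sketch below builds a small dictionary and prints it as JSON. The `redteam` and `plugins` keys are an assumption about the config layout and should be checked against the quickstart guide; `build_redteam_section` and `SELECTED_PLUGINS` are hypothetical names used only for this example.

```python
import json

# Plugin IDs copied from the tables on this page (an arbitrary sample).
SELECTED_PLUGINS = [
    "prompt-extraction",
    "sql-injection",
    "pii:direct",
    "harmful:hate",
    "hallucination",
]


def build_redteam_section(plugin_ids):
    """Return a dict with an assumed `redteam.plugins` layout.

    Duplicate IDs are dropped while the original order is preserved.
    """
    unique_ids = list(dict.fromkeys(plugin_ids))
    return {"redteam": {"plugins": unique_ids}}


if __name__ == "__main__":
    # Print the section so it can be reviewed or converted to YAML by hand.
    print(json.dumps(build_redteam_section(SELECTED_PLUGINS), indent=2))
```

Keeping the selection in one list makes it easy to review which vulnerability types a given run is meant to cover.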

Vulnerability Types

Security Vulnerabilities

| Name | Description | Plugin ID |
| --- | --- | --- |
| ASCII Smuggling | Attempts to obfuscate malicious content using ASCII smuggling | ascii-smuggling |
| BeaverTails | Uses the BeaverTails prompt injection dataset | beavertails |
| Cross-Session Leak | Checks for information sharing between unrelated sessions | cross-session-leak |
| CyberSecEval | Tests prompt injection attacks using the CyberSecEval dataset | cyberseceval |
| Debug Access | Attempts to access or use debugging commands | debug-access |
| Divergent Repetition | Tests whether an AI system can be manipulated into revealing its training data through repetitive pattern exploitation | divergent-repetition |
| Hijacking | Unauthorized or off-topic resource use | hijacking |
| Indirect Prompt Injection | Tests if the prompt is vulnerable to instructions injected into variables in the prompt | indirect-prompt-injection |
| Malicious Code | Tests creation of malicious code | harmful:cybercrime:malicious-code |
| Malicious Resource Fetching | Server-Side Request Forgery (SSRF) tests | ssrf |
| Pliny | Tests LLM systems using a curated collection of prompts from https://github.com/elder-plinius/L1B3RT4S | pliny |
| Privilege Escalation | Broken Function Level Authorization (BFLA) tests | bfla |
| Prompt Extraction | Attempts to get the model to reveal its system prompt | prompt-extraction |
| RBAC Enforcement | Tests whether the model properly implements Role-Based Access Control (RBAC) | rbac |
| Shell Injection | Attempts to execute shell commands through the model | shell-injection |
| SQL Injection | Attempts to perform SQL injection attacks to manipulate database queries | sql-injection |
| System Prompt Override | Tests if an AI system can be manipulated to ignore or override its original system prompt | system-prompt-override |
| Unauthorized Data Access | Broken Object Level Authorization (BOLA) tests | bola |

Privacy Vulnerabilities

| Name | Description | Plugin ID |
| --- | --- | --- |
| Direct PII Exposure | Direct exposure of PII | pii:direct |
| PII in API/Database | PII exposed through API or database | pii:api-db |
| PII in Session Data | PII exposed in session data | pii:session |
| PII via Social Engineering | PII exposed through social engineering | pii:social |
| Privacy Violation | Content violating privacy rights | harmful:privacy |

Criminal Activity

| Name | Description | Plugin ID |
| --- | --- | --- |
| Chemical & Biological Weapons | Content related to chemical or biological weapons | harmful:chemical-biological-weapons |
| Child Exploitation | Content exploiting or harming children | harmful:child-exploitation |
| Copyright Violations | Content violating copyright laws | harmful:copyright-violations |
| Cybercrime | Content related to cybercriminal activities | harmful:cybercrime |
| IEDs | Content related to creating Improvised Explosive Devices | harmful:weapons:ied |
| Illegal Activities | Content promoting illegal activities | harmful:illegal-activities |
| Illegal Drugs | Content related to illegal drug use or trade | harmful:illegal-drugs |
| Indiscriminate Weapons | Content related to weapons without context | harmful:indiscriminate-weapons |
| Intellectual Property Violation | Content violating intellectual property rights | harmful:intellectual-property |
| Malicious Code | Content related to generating malicious code | harmful:cybercrime:malicious-code |
| Methamphetamine | Content related to methamphetamine | harmful:illegal-drugs:meth |
| Non-Violent Crime | Content related to non-violent criminal activities | harmful:non-violent-crime |
| Sex Crimes | Content related to sexual crimes | harmful:sex-crime |
| Violent Crime | Content related to violent criminal activities | harmful:violent-crime |

Harmful Activity

| Name | Description | Plugin ID |
| --- | --- | --- |
| Graphic Content | Displaying graphic or violent content | harmful:graphic-content |
| Harassment and Bullying | Content that harasses or bullies individuals | harmful:harassment-bullying |
| Hate Speech | Content that promotes hate or discrimination | harmful:hate |
| Insults | Content that insults or demeans individuals | harmful:insults |
| Profanity | Content containing profane or inappropriate language | harmful:profanity |
| Radicalization | Content that promotes radical or extremist views | harmful:radicalization |
| Self Harm | Content that encourages self-harm or suicide | harmful:self-harm |
| Sexual Content | Explicit or inappropriate sexual content | harmful:sexual-content |

Misinformation and Misuse

| Name | Description | Plugin ID |
| --- | --- | --- |
| Competitor Endorsement | Competitor mentions and endorsements | competitors |
| Excessive Agency | Model taking excessive initiative or misunderstanding its capabilities | excessive-agency |
| Hallucination | Model generating false or misleading information | hallucination |
| Imitation | Imitates people, brands, or organizations | imitation |
| Misinformation and Disinformation | Spreading false or misleading information | harmful:misinformation-disinformation |
| Overreliance | Model susceptible to relying on an incorrect user assumption or input | overreliance |
| Political Opinions | Makes political statements | politics |
| Religious Sensitivity | Tests handling of religious content and bias | religion |
| Specialized Advice | Providing advice in specialized fields without expertise | harmful:specialized-advice |
| Unsafe Practices | Content promoting unsafe or harmful practices | harmful:unsafe-practices |
| Unsupervised Contracts | Enters business or legal commitments without supervision | contracts |
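Many plugin IDs in the tables above are written as colon-separated segments, for example `harmful:illegal-drugs:meth` or `pii:direct`. The sketch below groups a list of IDs by their first segment. Reading the colon as a namespace separator is an inference from the IDs on this page, not documented behavior, and `group_by_namespace` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_namespace(plugin_ids):
    """Group plugin IDs by the text before the first colon.

    IDs without a colon (for example "rbac") go under the key "(none)".
    """
    groups = defaultdict(list)
    for plugin_id in plugin_ids:
        head, sep, _rest = plugin_id.partition(":")
        key = head if sep else "(none)"
        groups[key].append(plugin_id)
    return dict(groups)


if __name__ == "__main__":
    sample = ["harmful:hate", "pii:direct", "rbac", "harmful:illegal-drugs:meth"]
    for namespace, ids in group_by_namespace(sample).items():
        print(namespace, ids)
```

A grouping like this can help when auditing which categories a selection of plugins actually touches.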

Vulnerabilities by Application

Not every application is vulnerable to every type of exploit. Some vulnerabilities do not apply because of the LLM application's architecture. For example, a single-tenant chatbot without multiple user roles is not exposed to broken access control vulnerabilities. The tables below show, category by category, which application types (RAG, agent, chatbot) are marked for each vulnerability.
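The single-tenant chatbot example can be sketched as a small filter that drops access-control plugins when an application has no notion of multiple user roles. Treating `bfla`, `bola`, and `rbac` as the access-control set is an assumption based on their descriptions on this page, and `applicable_plugins` is a hypothetical helper.

```python
# Assumed access-control plugin IDs, taken from the Security Vulnerabilities table.
ACCESS_CONTROL_PLUGINS = {"bfla", "bola", "rbac"}


def applicable_plugins(candidate_ids, has_multiple_user_roles):
    """Return the candidate plugin IDs that make sense for the application.

    If the application has no user roles, access-control plugins are skipped;
    otherwise the candidate list is returned unchanged (as a new list).
    """
    if has_multiple_user_roles:
        return list(candidate_ids)
    return [pid for pid in candidate_ids if pid not in ACCESS_CONTROL_PLUGINS]


if __name__ == "__main__":
    candidates = ["prompt-extraction", "bfla", "bola", "rbac", "hijacking"]
    # A single-tenant chatbot without roles keeps only the non-access-control plugins.
    print(applicable_plugins(candidates, has_multiple_user_roles=False))
```

The same pattern extends to other architectural properties, such as whether the application issues database queries or shell commands at all.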

Security Vulnerabilities

Technical security risks in application context.

| Plugin Name | Description | Plugin ID | RAG / Agent / Chatbot |
| --- | --- | --- | --- |
| ASCII Smuggling | Attempts to obfuscate malicious content using ASCII smuggling | ascii-smuggling | 🚨🚨🚨 |
| BeaverTails | Uses the BeaverTails prompt injection dataset | beavertails | 🚨🚨🚨 |
| Malicious Code | Tests creation of malicious code | harmful:cybercrime:malicious-code | 🚨🚨🚨 |
| Privilege Escalation | Broken Function Level Authorization (BFLA) tests | bfla | 🚨🚨 |
| Unauthorized Data Access | Broken Object Level Authorization (BOLA) tests | bola | 🚨🚨 |
| Cross-Session Leak | Checks for information sharing between unrelated sessions | cross-session-leak | 🚨🚨🚨 |
| CyberSecEval | Tests prompt injection attacks using the CyberSecEval dataset | cyberseceval | 🚨🚨🚨 |
| Debug Access | Attempts to access or use debugging commands | debug-access | 🚨🚨 |
| Divergent Repetition | Tests whether an AI system can be manipulated into revealing its training data through repetitive pattern exploitation | divergent-repetition | 🚨🚨🚨 |
| Hijacking | Unauthorized or off-topic resource use | hijacking | 🚨🚨🚨 |
| Indirect Prompt Injection | Tests if the prompt is vulnerable to instructions injected into variables in the prompt | indirect-prompt-injection | 🚨🚨 |
| Pliny | Tests LLM systems using a curated collection of prompts from https://github.com/elder-plinius/L1B3RT4S | pliny | 🚨🚨🚨 |
| Prompt Extraction | Attempts to get the model to reveal its system prompt | prompt-extraction | 🚨🚨🚨 |
| RBAC Enforcement | Tests whether the model properly implements Role-Based Access Control (RBAC) | rbac | 🚨🚨 |
| Shell Injection | Attempts to execute shell commands through the model | shell-injection | 🚨🚨 |
| SQL Injection | Attempts to perform SQL injection attacks to manipulate database queries | sql-injection | 🚨🚨 |
| Malicious Resource Fetching | Server-Side Request Forgery (SSRF) tests | ssrf | 🚨🚨 |
| System Prompt Override | Tests if an AI system can be manipulated to ignore or override its original system prompt | system-prompt-override | 🚨🚨 |

Privacy Violations

Privacy risks in application context.

| Plugin Name | Description | Plugin ID | RAG / Agent / Chatbot |
| --- | --- | --- | --- |
| Privacy Violation | Content violating privacy rights | harmful:privacy | 🚨🚨🚨 |
| PII in API/Database | PII exposed through API or database | pii:api-db | 🚨🚨 |
| Direct PII Exposure | Direct exposure of PII | pii:direct | 🚨🚨 |
| PII in Session Data | PII exposed in session data | pii:session | 🚨🚨 |
| PII via Social Engineering | PII exposed through social engineering | pii:social | 🚨🚨 |

Criminal Activities

Criminal risks in application context.

| Plugin Name | Description | Plugin ID | RAG / Agent / Chatbot |
| --- | --- | --- | --- |
| Chemical & Biological Weapons | Content related to chemical or biological weapons | harmful:chemical-biological-weapons | 🚨🚨🚨 |
| Child Exploitation | Content exploiting or harming children | harmful:child-exploitation | 🚨🚨🚨 |
| Copyright Violations | Content violating copyright laws | harmful:copyright-violations | 🚨🚨🚨 |
| Cybercrime | Content related to cybercriminal activities | harmful:cybercrime | 🚨🚨🚨 |
| Illegal Activities | Content promoting illegal activities | harmful:illegal-activities | 🚨🚨🚨 |
| Illegal Drugs | Content related to illegal drug use or trade | harmful:illegal-drugs | 🚨🚨🚨 |
| Methamphetamine | Content related to methamphetamine | harmful:illegal-drugs:meth | 🚨🚨🚨 |
| Indiscriminate Weapons | Content related to weapons without context | harmful:indiscriminate-weapons | 🚨🚨🚨 |
| Intellectual Property Violation | Content violating intellectual property rights | harmful:intellectual-property | 🚨🚨🚨 |
| Malicious Code | Content related to generating malicious code | harmful:cybercrime:malicious-code | 🚨🚨🚨 |
| Non-Violent Crime | Content related to non-violent criminal activities | harmful:non-violent-crime | 🚨🚨🚨 |
| Sex Crimes | Content related to sexual crimes | harmful:sex-crime | 🚨🚨🚨 |
| Violent Crime | Content related to violent criminal activities | harmful:violent-crime | 🚨🚨🚨 |
| IEDs | Content related to creating Improvised Explosive Devices | harmful:weapons:ied | 🚨🚨🚨 |

Harmful Content

Harmful content risks in application context.

| Plugin Name | Description | Plugin ID | RAG / Agent / Chatbot |
| --- | --- | --- | --- |
| Graphic Content | Displaying graphic or violent content | harmful:graphic-content | 🚨🚨🚨 |
| Harassment and Bullying | Content that harasses or bullies individuals | harmful:harassment-bullying | 🚨🚨🚨 |
| Hate Speech | Content that promotes hate or discrimination | harmful:hate | 🚨🚨🚨 |
| Insults | Content that insults or demeans individuals | harmful:insults | 🚨🚨🚨 |
| Profanity | Content containing profane or inappropriate language | harmful:profanity | 🚨🚨🚨 |
| Radicalization | Content that promotes radical or extremist views | harmful:radicalization | 🚨🚨🚨 |
| Self Harm | Content that encourages self-harm or suicide | harmful:self-harm | 🚨🚨🚨 |
| Sexual Content | Explicit or inappropriate sexual content | harmful:sexual-content | 🚨🚨🚨 |

Misinformation and Misuse

Misinformation risks in application context.

| Plugin Name | Description | Plugin ID | RAG / Agent / Chatbot |
| --- | --- | --- | --- |
| Competitor Endorsement | Competitor mentions and endorsements | competitors | 🚨🚨🚨 |
| Unsupervised Contracts | Enters business or legal commitments without supervision | contracts | 🚨🚨🚨 |
| Excessive Agency | Model taking excessive initiative or misunderstanding its capabilities | excessive-agency | 🚨🚨🚨 |
| Hallucination | Model generating false or misleading information | hallucination | 🚨🚨🚨 |
| Misinformation and Disinformation | Spreading false or misleading information | harmful:misinformation-disinformation | 🚨🚨🚨 |
| Specialized Advice | Providing advice in specialized fields without expertise | harmful:specialized-advice | 🚨🚨🚨 |
| Unsafe Practices | Content promoting unsafe or harmful practices | harmful:unsafe-practices | 🚨🚨🚨 |
| Imitation | Imitates people, brands, or organizations | imitation | 🚨🚨🚨 |
| Overreliance | Model susceptible to relying on an incorrect user assumption or input | overreliance | 🚨🚨🚨 |
| Political Opinions | Makes political statements | politics | 🚨🚨🚨 |
| Religious Sensitivity | Tests handling of religious content and bias | religion | 🚨🚨🚨 |

Plugin Reference

For a complete list of available plugins and their severity levels, see the Plugins Overview page.