Skip to main content

Types of LLM vulnerabilities

This page documents categories of potential LLM vulnerabilities and failure modes.

Each vulnerability type is supported Promptfoo's open-source plugins. Plugins are a modular system for testing risks and vulnerabilities in LLM models and applications. See the quickstart guide to run your first red team.

LLM vulnerability types

See also our specific guides on:

Vulnerability Types

Security Vulnerabilities

NameDescriptionPlugin ID
ASCII SmugglingAttempts to obfuscate malicious content using ASCII smugglingascii-smuggling
BeaverTailsUses the BeaverTails prompt injection datasetbeavertails
CCASimulates Context Compliance Attacks to test whether an AI system can be tricked into generating restricted content using manipulated chat history.cca
Cross-Session LeakChecks for information sharing between unrelated sessionscross-session-leak
CyberSecEvalTests prompt injection attacks using the CyberSecEval datasetcyberseceval
Debug AccessAttempts to access or use debugging commandsdebug-access
Divergent RepetitionTests whether an AI system can be manipulated into revealing its training data through repetitive pattern exploitation.divergent-repetition
DoNotAnswerTests how well LLMs handle harmful queries using the DoNotAnswer datasetdonotanswer
HarmbenchTests prompt injection attacks using the Harmbench datasetharmbench
HijackingUnauthorized or off-topic resource usehijacking
Indirect Prompt InjectionTests if the prompt is vulnerable to instructions injected into variables in the promptindirect-prompt-injection
Malicious CodeTests creation of malicious codeharmful:cybercrime:malicious-code
Malicious Resource FetchingServer-Side Request Forgery (SSRF) testsssrf
Memory PoisoningTests whether an agent is vulnerable to memory poisoning attacksagentic:memory-poisoning
Model Context ProtocolTests for vulnerabilities to Model Context Protocol (MCP) attacksmcp
PlinyTests LLM systems using a curated collection of prompts from https://github.com/elder-plinius/L1B3RT4Spliny
Privilege EscalationBroken Function Level Authorization (BFLA) testsbfla
Prompt ExtractionAttempts to get the model to reveal its system promptprompt-extraction
RAG Document ExfiltrationContent related to RAG Document Exfiltrationrag-document-exfiltration
RAG PoisoningTests resistance against poisoning attacks on RAG retrieval systemsrag-poisoning
RBAC EnforcementTests whether the model properly implements Role-Based Access Control (RBAC)rbac
Reasoning DoSTests for vulnerability to computational resource exhaustion through excessive reasoning patterns. Applicable to reasoning models.reasoning-dos
Shell InjectionAttempts to execute shell commands through the modelshell-injection
Special Token InjectionTests for special token injection vulnerabilities using conversation format delimitersspecial-token-injection
SQL InjectionAttempts to perform SQL injection attacks to manipulate database queriessql-injection
System Prompt OverrideTests if an AI system can be manipulated to ignore or override its original system promptsystem-prompt-override
Tool DiscoveryTests if an AI system reveals the list of tools, functions, or API calls it has access totool-discovery
Unauthorized Data AccessBroken Object Level Authorization (BOLA) testsbola
XSTestTests how well LLMs handle ambiguous words (homonyms) that can have both harmful and benign interpretationsxstest

Privacy Vulnerabilities

NameDescriptionPlugin ID
COPPATests compliance with children's privacy regulations (COPPA)coppa
Direct PII ExposureDirect exposure of PIIpii:direct
FERPATests compliance with student educational privacy regulations (FERPA)ferpa
PII in API/DatabasePII exposed through API or databasepii:api-db
PII in Session DataPII exposed in session datapii:session
PII via Social EngineeringPII exposed through social engineeringpii:social
Privacy ViolationContent violating privacy rightsharmful:privacy

Criminal Activity

NameDescriptionPlugin ID
Chemical & Biological WeaponsContent related to chemical or biological weaponsharmful:chemical-biological-weapons
Child ExploitationContent exploiting or harming childrenharmful:child-exploitation
Copyright ViolationsContent violating copyright lawsharmful:copyright-violations
CybercrimeContent related to cybercriminal activitiesharmful:cybercrime
Illegal ActivitiesContent promoting illegal activitiesharmful:illegal-activities
Illegal DrugsContent related to illegal drug use or tradeharmful:illegal-drugs
Indiscriminate WeaponsContent related to weapons without contextharmful:indiscriminate-weapons
Intellectual Property ViolationContent violating intellectual property rightsharmful:intellectual-property
Non-Violent CrimeContent related to non-violent criminal activitiesharmful:non-violent-crime
Sex CrimesContent related to sexual crimesharmful:sex-crime
Violent CrimeContent related to violent criminal activitiesharmful:violent-crime

Harmful Activity

NameDescriptionPlugin ID
AegisEvaluates model safety responses using the NVIDIA Aegis datasetaegis
Graphic ContentDisplaying graphic or violent contentharmful:graphic-content
Harassment and BullyingContent that harasses or bullies individualsharmful:harassment-bullying
Hate SpeechContent that promotes hate or discriminationharmful:hate
IEDTests handling of Improvised Explosive Devicesharmful:weapons:ied
InsultsContent that insults or demeans individualsharmful:insults
MethamphetamineTests handling of methamphetamine-related contentharmful:illegal-drugs:meth
ProfanityContent containing profane or inappropriate languageharmful:profanity
RadicalizationContent that promotes radical or extremist viewsharmful:radicalization
Self HarmContent that encourages self-harm or suicideharmful:self-harm
Sexual ContentExplicit or inappropriate sexual contentharmful:sexual-content
ToxicChatTests handling of toxic user prompts from the ToxicChat datasettoxic-chat
UnsafeBenchTests handling of unsafe image content through multi-modal model evaluationunsafebench
VLGuardTests handling of potentially unsafe image content using the VLGuard datasetvlguard
WordplayTests whether AI systems can be tricked into generating profanity through innocent-seeming wordplaywordplay

Misinformation and Misuse

NameDescriptionPlugin ID
Competitor EndorsementCompetitor mentions and endorsementscompetitors
Excessive AgencyModel taking excessive initiative or misunderstanding its capabilitiesexcessive-agency
Goal MisalignmentTests whether AI systems recognize when optimizing measurable proxy metrics might not align with true underlying objectives (Goodhart's Law)goal-misalignment
HallucinationModel generating false or misleading informationhallucination
ImitationImitates people, brands, or organizationsimitation
Misinformation and DisinformationSpreading false or misleading informationharmful:misinformation-disinformation
Off-Topic ManipulationTests whether AI systems can be manipulated to go off-topic from their intended purposeoff-topic
OverrelianceModel susceptible to relying on an incorrect user assumption or inputoverreliance
Political OpinionsMakes political statementspolitics
Religious SensitivityTests handling of religious content and biasreligion
Specialized AdviceProviding advice in specialized fields without expertiseharmful:specialized-advice
Unsafe PracticesContent promoting unsafe or harmful practicesharmful:unsafe-practices
Unsupervised ContractsEnters business or legal commitments without supervisioncontracts
Unverifiable ClaimsTests whether AI systems make claims that cannot be verified even in principleunverifiable-claims

Bias

NameDescriptionPlugin ID
Age BiasTests handling of age bias, stereotypes, and discriminationbias:age
Disability BiasTests handling of disability bias, stereotypes, and discriminationbias:disability
Gender BiasTests handling of gender bias, stereotypes, and discriminationbias:gender
Race BiasTests handling of racial bias, stereotypes, and discriminationbias:race

Ecommerce

NameDescriptionPlugin ID
E-commerce Compliance BypassTests for bypassing age, geographic, and regulatory restrictions on e-commerce productsecommerce:compliance-bypass
E-commerce Order FraudTests for order fraud, transaction manipulation, and return/refund policy abuseecommerce:order-fraud
E-commerce PCI DSSTests for PCI DSS violations and payment card data exposureecommerce:pci-dss
E-commerce Price ManipulationTests for unauthorized pricing changes, discount abuse, and promotional policy violationsecommerce:price-manipulation

Financial

NameDescriptionPlugin ID
Financial Calculation ErrorTests for errors in financial calculations, risk assessments, or quantitative analysisfinancial:calculation-error
Financial Compliance ViolationTests for suggestions that violate securities laws, enable insider trading, or circumvent regulationsfinancial:compliance-violation
Financial Confidential DisclosureTests for inappropriate disclosure of MNPI, proprietary strategies, or confidential financial datafinancial:confidential-disclosure
Financial Counterfactual NarrativeTests for false financial narratives or misleading market informationfinancial:counterfactual
Financial Data LeakageTests for exposure of proprietary trading strategies or confidential financial datafinancial:data-leakage
Financial DefamationTests for false statements damaging financial entity reputationsfinancial:defamation
Financial HallucinationTests for fabricated market data, non-existent financial instruments, or fictional company informationfinancial:hallucination
Financial Services ImpartialityTests for unauthorized financial advice or transaction facilitationfinancial:impartiality
Financial Services MisconductTests for facilitation of financial crimes or market manipulationfinancial:misconduct
Financial SycophancyTests for agreeing with risky investment strategies or validating get-rich-quick schemesfinancial:sycophancy

Medical

NameDescriptionPlugin ID
Medical Anchoring BiasTests for medical anchoring bias where AI fixates on irrelevant informationmedical:anchoring-bias
Medical HallucinationTests for fabricated medical facts, studies, or drug interactionsmedical:hallucination
Medical Incorrect KnowledgeTests for factually wrong medical information that could cause harmmedical:incorrect-knowledge
Medical Off-Label UseTests for inappropriate off-label medication recommendations without proper disclaimersmedical:off-label-use
Medical Prioritization ErrorTests for poor medical prioritization and triage decisionsmedical:prioritization-error
Medical SycophancyTests for agreeing with incorrect medical assumptions to be helpfulmedical:sycophancy

Pharmacy

NameDescriptionPlugin ID
Controlled Substance ComplianceTests adherence to DEA regulations for controlled substance prescribing, dispensing, and monitoringpharmacy:controlled-substance-compliance
Dosage CalculationTests accuracy of pharmaceutical dosage calculations including weight-based dosing and renal adjustmentspharmacy:dosage-calculation
Drug Interaction DetectionTests identification of dangerous drug-drug, drug-food, and drug-supplement interactionspharmacy:drug-interaction

Insurance

NameDescriptionPlugin ID
Coverage DiscriminationTests for discriminatory coverage decisions based on protected characteristics violating ADA, Section 1557, and GINAinsurance:coverage-discrimination
Network MisinformationTests accuracy of provider network information to prevent surprise medical bills and balance billinginsurance:network-misinformation
PHI DisclosureTests whether AI systems properly protect Protected Health Information (PHI) and comply with HIPAA privacy requirementsinsurance:phi-disclosure

Custom

NameDescriptionPlugin ID
Custom PromptsProbes the model with specific inputsintent
Custom TopicViolates a custom configured policypolicy

Vulnerabilities by Application

Not all applications are vulnerable to certain types of exploits. Some vulnerabilities won't apply because of the LLM application's architecture. For example, a single-tenant chatbot without multiple user roles won't be vulnerable to broken access control vulnerabilities. Select a category below to see where vulnerabilities may not apply.

Security Vulnerabilities

Technical security risks in application context.

Plugin NameDescriptionPlugin IDRAGAgentChatbot
Memory PoisoningTests whether an agent is vulnerable to memory poisoning attacksagentic:memory-poisoning🚨
ASCII SmugglingAttempts to obfuscate malicious content using ASCII smugglingascii-smuggling🚨🚨🚨
BeaverTailsUses the BeaverTails prompt injection datasetbeavertails🚨🚨🚨
Privilege EscalationBroken Function Level Authorization (BFLA) testsbfla🚨🚨
Unauthorized Data AccessBroken Object Level Authorization (BOLA) testsbola🚨🚨
CCASimulates Context Compliance Attacks to test whether an AI system can be tricked into generating restricted content using manipulated chat history.cca🚨🚨
Cross-Session LeakChecks for information sharing between unrelated sessionscross-session-leak🚨🚨🚨
CyberSecEvalTests prompt injection attacks using the CyberSecEval datasetcyberseceval🚨🚨🚨
HarmbenchTests prompt injection attacks using the Harmbench datasetharmbench🚨🚨🚨
Debug AccessAttempts to access or use debugging commandsdebug-access🚨🚨
Divergent RepetitionTests whether an AI system can be manipulated into revealing its training data through repetitive pattern exploitation.divergent-repetition🚨🚨🚨
DoNotAnswerTests how well LLMs handle harmful queries using the DoNotAnswer datasetdonotanswer🚨🚨🚨
Malicious CodeTests creation of malicious codeharmful:cybercrime:malicious-code🚨🚨🚨
HijackingUnauthorized or off-topic resource usehijacking🚨🚨🚨
Indirect Prompt InjectionTests if the prompt is vulnerable to instructions injected into variables in the promptindirect-prompt-injection🚨🚨
Model Context ProtocolTests for vulnerabilities to Model Context Protocol (MCP) attacksmcp🚨🚨🚨
PlinyTests LLM systems using a curated collection of prompts from https://github.com/elder-plinius/L1B3RT4Spliny🚨🚨🚨
Prompt ExtractionAttempts to get the model to reveal its system promptprompt-extraction🚨🚨🚨
RAG Document ExfiltrationContent related to RAG Document Exfiltrationrag-document-exfiltration🚨🚨🚨
RAG PoisoningTests resistance against poisoning attacks on RAG retrieval systemsrag-poisoning🚨🚨🚨
RBAC EnforcementTests whether the model properly implements Role-Based Access Control (RBAC)rbac🚨🚨
Reasoning DoSTests for vulnerability to computational resource exhaustion through excessive reasoning patterns. Applicable to reasoning models.reasoning-dos🚨🚨🚨
Shell InjectionAttempts to execute shell commands through the modelshell-injection🚨🚨
Special Token InjectionTests for special token injection vulnerabilities using conversation format delimitersspecial-token-injection🚨🚨🚨
SQL InjectionAttempts to perform SQL injection attacks to manipulate database queriessql-injection🚨🚨
Malicious Resource FetchingServer-Side Request Forgery (SSRF) testsssrf🚨🚨
System Prompt OverrideTests if an AI system can be manipulated to ignore or override its original system promptsystem-prompt-override🚨🚨
Tool DiscoveryTests if an AI system reveals the list of tools, functions, or API calls it has access totool-discovery🚨🚨🚨
XSTestTests how well LLMs handle ambiguous words (homonyms) that can have both harmful and benign interpretationsxstest🚨🚨🚨

Privacy Violations

Privacy risks in application context.

Plugin NameDescriptionPlugin IDRAGAgentChatbot
COPPATests compliance with children's privacy regulations (COPPA)coppa🚨🚨🚨
FERPATests compliance with student educational privacy regulations (FERPA)ferpa🚨🚨🚨
Privacy ViolationContent violating privacy rightsharmful:privacy🚨🚨🚨
PII in API/DatabasePII exposed through API or databasepii:api-db🚨🚨
Direct PII ExposureDirect exposure of PIIpii:direct🚨🚨
PII in Session DataPII exposed in session datapii:session🚨🚨
PII via Social EngineeringPII exposed through social engineeringpii:social🚨🚨

Criminal Activities

Criminal risks in application context.

Plugin NameDescriptionPlugin IDRAGAgentChatbot
Chemical & Biological WeaponsContent related to chemical or biological weaponsharmful:chemical-biological-weapons🚨🚨🚨
Child ExploitationContent exploiting or harming childrenharmful:child-exploitation🚨🚨🚨
Copyright ViolationsContent violating copyright lawsharmful:copyright-violations🚨🚨🚨
CybercrimeContent related to cybercriminal activitiesharmful:cybercrime🚨🚨🚨
Illegal ActivitiesContent promoting illegal activitiesharmful:illegal-activities🚨🚨🚨
Illegal DrugsContent related to illegal drug use or tradeharmful:illegal-drugs🚨🚨🚨
Indiscriminate WeaponsContent related to weapons without contextharmful:indiscriminate-weapons🚨🚨🚨
Intellectual Property ViolationContent violating intellectual property rightsharmful:intellectual-property🚨🚨🚨
Non-Violent CrimeContent related to non-violent criminal activitiesharmful:non-violent-crime🚨🚨🚨
Sex CrimesContent related to sexual crimesharmful:sex-crime🚨🚨🚨
Violent CrimeContent related to violent criminal activitiesharmful:violent-crime🚨🚨🚨

Harmful Content

Harmful content risks in application context.

Plugin NameDescriptionPlugin IDRAGAgentChatbot
AegisEvaluates model safety responses using the NVIDIA Aegis datasetaegis🚨🚨🚨
Graphic ContentDisplaying graphic or violent contentharmful:graphic-content🚨🚨🚨
Harassment and BullyingContent that harasses or bullies individualsharmful:harassment-bullying🚨🚨🚨
Hate SpeechContent that promotes hate or discriminationharmful:hate🚨🚨🚨
MethamphetamineTests handling of methamphetamine-related contentharmful:illegal-drugs:meth🚨🚨🚨
InsultsContent that insults or demeans individualsharmful:insults🚨🚨🚨
ProfanityContent containing profane or inappropriate languageharmful:profanity🚨🚨🚨
RadicalizationContent that promotes radical or extremist viewsharmful:radicalization🚨🚨🚨
Self HarmContent that encourages self-harm or suicideharmful:self-harm🚨🚨🚨
Sexual ContentExplicit or inappropriate sexual contentharmful:sexual-content🚨🚨🚨
IEDTests handling of Improvised Explosive Devicesharmful:weapons:ied🚨🚨🚨
ToxicChatTests handling of toxic user prompts from the ToxicChat datasettoxic-chat🚨🚨🚨
UnsafeBenchTests handling of unsafe image content through multi-modal model evaluationunsafebench🚨🚨🚨
VLGuardTests handling of potentially unsafe image content using the VLGuard datasetvlguard🚨🚨🚨
WordplayTests whether AI systems can be tricked into generating profanity through innocent-seeming wordplaywordplay🚨🚨🚨

Misinformation and Misuse

Misinformation risks in application context.

Plugin NameDescriptionPlugin IDRAGAgentChatbot
Competitor EndorsementCompetitor mentions and endorsementscompetitors🚨🚨🚨
Unsupervised ContractsEnters business or legal commitments without supervisioncontracts🚨🚨🚨
Excessive AgencyModel taking excessive initiative or misunderstanding its capabilitiesexcessive-agency🚨🚨🚨
Goal MisalignmentTests whether AI systems recognize when optimizing measurable proxy metrics might not align with true underlying objectives (Goodhart's Law)goal-misalignment🚨🚨🚨
HallucinationModel generating false or misleading informationhallucination🚨🚨🚨
Misinformation and DisinformationSpreading false or misleading informationharmful:misinformation-disinformation🚨🚨🚨
Specialized AdviceProviding advice in specialized fields without expertiseharmful:specialized-advice🚨🚨🚨
Unsafe PracticesContent promoting unsafe or harmful practicesharmful:unsafe-practices🚨🚨🚨
ImitationImitates people, brands, or organizationsimitation🚨🚨🚨
Off-Topic ManipulationTests whether AI systems can be manipulated to go off-topic from their intended purposeoff-topic🚨🚨🚨
OverrelianceModel susceptible to relying on an incorrect user assumption or inputoverreliance🚨🚨🚨
Political OpinionsMakes political statementspolitics🚨🚨🚨
Religious SensitivityTests handling of religious content and biasreligion🚨🚨🚨
Unverifiable ClaimsTests whether AI systems make claims that cannot be verified even in principleunverifiable-claims🚨🚨🚨

Plugin Reference

For a complete list of available plugins and their severity levels, see the Plugins Overview page.