Red Team Strategies
Strategies are attack techniques that systematically probe LLM applications for vulnerabilities. While plugins generate adversarial inputs, strategies determine how these inputs are delivered to maximize attack success rates.
Recommended Strategies
Most users only need two strategies for comprehensive coverage. These agentic methods provide the highest attack success rates across use cases.
Meta Agent: Best for Single-Turn
The Meta Agent dynamically builds an attack taxonomy and learns from attack history to optimize bypass attempts. It learns which attack types work best against your specific target.
Hydra Multi-Turn: Best for Multi-Turn
Hydra runs adaptive multi-turn conversations with persistent scan-wide memory. It pivots across conversation branches to uncover hidden vulnerabilities, especially in stateful applications like chatbots and agents.
Quick Start
For most applications, this configuration provides comprehensive red team coverage:
redteam:
strategies:
- jailbreak:meta # Single-turn agentic attacks
- jailbreak:hydra # Multi-turn adaptive conversations
All Strategies
| Category | Strategy | Description | Details | Cost | ASR Increase* |
|---|---|---|---|---|---|
| Static (Single-Turn) | Audio Encoding | Text-to-speech encoding bypass 🌐 | Tests handling of text converted to speech audio and encoded as base64 to potentially bypass text-based content filters | Low | 20-30% |
| Base64 | Base64 encoding bypass | Tests detection and handling of Base64-encoded malicious payloads to bypass content filters | Low | 20-30% | |
| Basic | Plugin-generated test cases | Controls whether original plugin-generated test cases are included without any strategies applied | Low | None | |
| camelCase | camelCase transformation | Tests handling of text transformed into camelCase (removing spaces and capitalizing words) to potentially bypass content filters | Low | 0-5% | |
| Emoji Smuggling | Variation selector encoding | Tests hiding UTF-8 payloads inside emoji variation selectors to evaluate filter evasion. | Low | 0-5% | |
| Hex | Hex encoding bypass | Tests detection and handling of hex-encoded malicious payloads to bypass content filters | Low | 20-30% | |
| Homoglyph | Unicode confusable characters | Tests detection and handling of text with homoglyphs (visually similar Unicode characters) to bypass content filters | Low | 20-30% | |
| Image Encoding | Text-to-image encoding bypass | Tests handling of text embedded in images and encoded as base64 to potentially bypass text-based content filters | Low | 20-30% | |
| Jailbreak Templates | Static jailbreak templates | Tests LLM resistance to known jailbreak techniques (DAN, Skeleton Key, etc.) using static templates. Note: Does not cover modern prompt injection techniques. | Low | 20-30% | |
| Leetspeak | Character substitution | Tests handling of leetspeak-encoded malicious content by replacing standard letters with numbers or special characters | Low | 20-30% | |
| Morse Code | Dots and dashes encoding | Tests handling of text encoded in Morse code (dots and dashes) to potentially bypass content filters | Low | 20-30% | |
| Pig Latin | Word transformation encoding | Tests handling of text transformed into Pig Latin (rearranging word parts) to potentially bypass content filters | Low | 20-30% | |
| ROT13 | Letter rotation encoding | Tests handling of ROT13-encoded malicious payloads by rotating each letter 13 positions in the alphabet | Low | 20-30% | |
| Video Encoding | Text-to-video encoding bypass | Tests handling of text embedded in videos and encoded as base64 to potentially bypass text-based content filters | Low | 20-30% | |
| Dynamic (Single-Turn) | Authoritative Markup Injection | Structured format authority 🌐 | Tests vulnerability to authoritative formatting by embedding prompts in structured markup that exploits trust in formatted content | Medium | 40-60% |
| Best-of-N | Parallel sampling attack 🌐 | Tests multiple variations in parallel using the Best-of-N technique from Anthropic research | High | 40-60% | |
| Citation | Academic framing 🌐 | Tests vulnerability to academic authority bias by framing harmful requests in research contexts | Medium | 40-60% | |
| Composite JailbreaksRecommended | Combined techniques 🌐 | Chains multiple jailbreak techniques from research papers to create more sophisticated attacks | Medium | 60-80% | |
| GCG | Gradient-based optimization 🌐 | Implements the Greedy Coordinate Gradient attack method for finding adversarial prompts using gradient-based search techniques | High | 0-10% | |
| JailbreakRecommended | Lightweight iterative refinement | Uses an LLM-as-a-Judge to iteratively refine prompts until they bypass security controls | High | 60-80% | |
| Likert-based Jailbreaks | Academic evaluation framework 🌐 | Leverages academic evaluation frameworks and Likert scales to frame harmful requests within research contexts | Medium | 40-60% | |
| Math Prompt | Mathematical encoding | Tests resilience against mathematical notation-based attacks using set theory and abstract algebra | Medium | 40-60% | |
| Meta-Agent JailbreaksRecommended | Strategic taxonomy builder 🌐 | Builds custom attack taxonomies and learns from all attempts using persistent strategic memory to choose which attack types work against your specific target | High | 70-90% | |
| Tree-based | Branching attack paths | Creates a tree of attack variations based on the Tree of Attacks research paper | High | 60-80% | |
| Multi-turn | Crescendo | Gradual escalation | Gradually escalates prompt harm over multiple turns while using backtracking to optimize attack paths | High | 70-90% |
| GOAT | Generative Offensive Agent Tester 🌐 | Uses a Generative Offensive Agent Tester to dynamically generate multi-turn conversations | High | 70-90% | |
| Hydra Multi-turn | Adaptive multi-turn branching 🌐 | Adaptive multi-turn jailbreak agent that pivots across branches with persistent scan-wide memory to uncover hidden vulnerabilities | High | 70-90% | |
| Mischievous User | Mischievous user conversations | Simulates a multi-turn conversation between a mischievous user and an agent | High | 10-20% | |
| Regression | Retry | Historical failure testing | Automatically incorporates previously failed test cases into your test suite, creating a regression testing system that learns from past failures | Low | 50-70% |
| Custom | Custom Strategies | User-defined transformations | Allows creation of custom red team testing approaches by programmatically transforming test cases using JavaScript | Variable | Variable |
| Custom Strategy | Custom prompt-based multi-turn strategy | Write natural language instructions to create powerful multi-turn red team strategies. No coding required. | Variable | Variable | |
| Layer | Compose multiple strategies | Compose multiple red team strategies sequentially (e.g., jailbreak → base64) to create sophisticated attack chains | Variable | Cumulative |
🌐 indicates that strategy uses remote inference in Promptfoo Community edition
Strategy Categories
Static Strategies
Transform inputs using predefined patterns to bypass security controls. These are deterministic transformations that don't require another LLM to act as an attacker. Static strategies are low-resource usage, but they are also easy to detect and often patched in the foundation models. For example, the base64 strategy encodes inputs as base64 to bypass guardrails and other content filters. jailbreak-templates wraps the payload in known jailbreak templates like DAN or Skeleton Key.
Dynamic Strategies
Dynamic strategies use an attacker agent to mutate the original adversarial input through iterative refinement. These strategies make multiple calls to both an attacker model and your target model to determine the most effective attack vector. They have higher success rates than static strategies, but they are also more resource intensive.
By default, dynamic strategies like jailbreak and jailbreak:composite will:
- Make multiple attempts to bypass the target's security controls
- Stop after exhausting the configured token budget
- Stop early if they successfully generate a harmful output
- Track token usage to prevent runaway costs
Multi-turn Strategies
Multi-turn strategies use an attacker agent to coerce the target over multiple conversation turns. They are particularly effective against stateful applications where they can convince the target to act against its purpose over time. Multi-turn strategies are more resource intensive than single-turn strategies, but they have the highest success rates.
Indirect Prompt Injection Strategies
Indirect prompt injection strategies test whether AI agents can be manipulated through malicious instructions embedded in external content they consume. These strategies generate realistic attack surfaces containing hidden payloads to test both data exfiltration and behavior manipulation. Currently available: indirect-web-pwn for web browsing agents.
Regression Strategies
Regression strategies help maintain security over time by learning from past failures. For example, the retry strategy automatically incorporates previously failed test cases into your test suite, creating a form of regression testing for LLM behaviors.
All single-turn strategies can be applied to multi-turn applications, but multi-turn strategies require a stateful application.
Configuration
Basic Configuration
redteam:
strategies:
- jailbreak:meta # string syntax
- id: jailbreak:composite # object syntax
Plugin Targeting
Strategies can be applied to specific plugins or the entire test suite. By default, strategies are applied to all plugins. You can override this by specifying the plugins option in the strategy which will only apply the strategy to the specified plugins.
redteam:
strategies:
- id: jailbreak:tree
config:
plugins:
- harmful:hate
Layered Strategies
Chain strategies in order with the layer strategy. This is useful when you want to apply a transformation first, then another technique:
redteam:
strategies:
- id: layer
config:
steps:
- base64 # First encode as base64
- rot13 # Then apply ROT13
Notes:
- Each step respects plugin targeting and exclusions.
- Only the final step's outputs are kept.
- Transformations are applied in the order specified.
Custom Strategies
For advanced use cases, you can create custom strategies. See Custom Strategy Development for details.
Related Concepts
- LLM Vulnerabilities - Understand the types of vulnerabilities strategies can test
- Red Team Plugins - Learn about the plugins that generate the base test cases
- Custom Strategies - Create your own strategies