Red Team Strategies
Overview
Strategies are attack techniques that systematically probe LLM applications for vulnerabilities. While plugins generate adversarial inputs, strategies determine how these inputs are delivered to maximize attack success rates (ASR).
For example, a plugin might generate a harmful input, and a strategy like `jailbreak` would then attempt multiple variations of that input to bypass guardrails and content filters.
Strategies are applied during red team generation (before evaluation) and can significantly increase the ASR of adversarial inputs. By default, promptfoo runs both the original, unmodified input and the adversarial input generated by the strategy.
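For example, a minimal configuration pairs a plugin with a strategy (the `harmful:hate` plugin ID is borrowed from the example later on this page):

```yaml
# Minimal sketch: the plugin generates adversarial test cases, and the
# strategy controls how they are delivered to the target.
redteam:
  plugins:
    - harmful:hate # generates the adversarial inputs
  strategies:
    - jailbreak # iteratively rewrites each input to bypass guardrails
```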
Available Strategies
| Category | Strategy | Description | Details | Cost | ASR Increase* |
|---|---|---|---|---|---|
| Static (Single-Turn) | ASCII Smuggling | Unicode tag-based instruction smuggling | Tests system resilience against Unicode tag-based instruction smuggling attacks that can bypass content filters and security controls | Low | 20-30% |
| Static (Single-Turn) | Base64 | Base64 encoding bypass | Tests detection and handling of Base64-encoded malicious payloads to bypass content filters | Low | 20-30% |
| Static (Single-Turn) | Leetspeak | Character substitution | Tests handling of leetspeak-encoded malicious content by replacing standard letters with numbers or special characters | Low | 20-30% |
| Static (Single-Turn) | ROT13 | Letter rotation encoding | Tests handling of ROT13-encoded malicious payloads by rotating each letter 13 positions in the alphabet | Low | 20-30% |
| Static (Single-Turn) | Prompt Injection | Direct system prompts | Tests common direct prompt injection vulnerabilities using a curated list of injection techniques | Low | 20-30% |
| Static (Single-Turn) | Multilingual | Cross-language testing | Tests handling of inputs across multiple languages, focusing on low-resource languages that may bypass content filters | Low | 30-40% |
| Dynamic (Single-Turn) | Math Prompt | Mathematical encoding | Tests resilience against mathematical notation-based attacks using set theory and abstract algebra | Medium | 40-60% |
| Dynamic (Single-Turn) | Citation | Academic framing | Tests vulnerability to academic authority bias by framing harmful requests in research contexts | Medium | 40-60% |
| Dynamic (Single-Turn) | Composite Jailbreaks (Recommended) | Combined techniques | Chains multiple jailbreak techniques from research papers to create more sophisticated attacks | Medium | 60-80% |
| Dynamic (Single-Turn) | Jailbreak (Recommended) | Lightweight iterative refinement | Uses an LLM-as-a-Judge to iteratively refine prompts until they bypass security controls | High | 60-80% |
| Dynamic (Single-Turn) | Tree-based | Branching attack paths | Creates a tree of attack variations based on the Tree of Attacks research paper | High | 60-80% |
| Dynamic (Single-Turn) | Best-of-N | Parallel sampling attack | Tests multiple variations in parallel using the Best-of-N technique from Anthropic research | High | 40-60% |
| Multi-turn | GOAT | Gradual escalation | Uses a Generative Offensive Agent Tester to dynamically generate multi-turn conversations | High | 70-90% |
| Multi-turn | Crescendo | Gradual escalation | Gradually escalates prompt harm over multiple turns while using backtracking to optimize attack paths | High | 70-90% |
| Basic | Basic | Plugin-generated test cases | Controls whether original plugin-generated test cases are included without any strategies applied | Low | None |
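The Basic row above controls whether the original plugin-generated test cases are included at all. A hedged sketch of turning them off so that only strategy-modified inputs run (the `enabled` flag is an assumption; verify it against the Basic strategy's own documentation):

```yaml
# Sketch: exclude the unmodified plugin-generated test cases and keep only
# the strategy-modified ones. The `enabled` flag is an assumption.
redteam:
  strategies:
    - id: basic
      config:
        enabled: false
    - jailbreak
```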
Strategy Categories
Static Strategies
Static strategies transform inputs using predefined patterns to bypass security controls. These are deterministic transformations that don't require another LLM to act as an attacker. They use few resources, but they are also easy to detect and are often patched in foundation models. For example, the `base64` strategy encodes inputs as Base64 to bypass guardrails and other content filters, while `prompt-injection` wraps the payload in an injection template such as `ignore previous instructions and {{original_adversarial_input}}`.
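A minimal sketch of enabling the two static strategies mentioned above; both strategy IDs appear elsewhere on this page and need no extra configuration:

```yaml
# Static strategies apply deterministic transforms to each plugin-generated input.
redteam:
  strategies:
    - base64           # re-encodes the adversarial input as Base64
    - prompt-injection # wraps it in a known injection template
```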
Dynamic Strategies
Dynamic strategies use an attacker agent to mutate the original adversarial input through iterative refinement. These strategies make multiple calls to both an attacker model and your target model to determine the most effective attack vector. They have higher success rates than static strategies, but they are also more resource intensive. By default, promptfoo recommends two dynamic strategies for your red teams: `jailbreak` and `jailbreak:composite`.
By default, dynamic strategies like `jailbreak` and `jailbreak:composite` will:
- Make multiple attempts to bypass the target's security controls
- Stop after exhausting the configured token budget
- Stop early if they successfully generate a harmful output
- Track token usage to prevent runaway costs
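Because dynamic strategies make many model calls, one way to bound cost is to scope them to specific plugins via the `plugins` option covered in the Implementation Guide below. A sketch:

```yaml
# Sketch: run the expensive iterative jailbreak against a single plugin,
# while the composite jailbreak covers all plugins by default.
redteam:
  strategies:
    - id: jailbreak
      config:
        plugins:
          - harmful:hate # limit the iterative attack to this plugin's tests
    - jailbreak:composite
```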
Multi-turn Strategies
Multi-turn strategies also use an attacker agent to coerce the target model into generating harmful outputs. These strategies are particularly effective against stateful applications where they can convince the target model to act against its purpose over time. You should run these strategies if you are testing a multi-turn application (such as a chatbot). Multi-turn strategies are more resource intensive than single-turn strategies, but they have the highest success rates.
All single-turn strategies can be applied to multi-turn applications, but multi-turn strategies require a stateful application.
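Since single-turn strategies also apply to multi-turn targets, a chatbot red team can combine both kinds in one run. A sketch using only strategy IDs from this page:

```yaml
# Sketch: multi-turn strategies alongside a single-turn strategy for a
# stateful chat application.
redteam:
  strategies:
    - goat                # multi-turn, agent-driven escalation
    - crescendo           # multi-turn, gradual escalation with backtracking
    - jailbreak:composite # single-turn strategies also apply here
```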
Strategy Selection
Choose strategies based on your application architecture and security requirements:
Single-turn Applications
Single-turn applications process each request independently, creating distinct security boundaries:
Security Properties:
- ✅ Clean context for each request
- ✅ No state manipulation vectors
- ✅ Predictable attack surface
- ❌ Limited threat pattern detection
- ❌ No persistent security context
Recommended Strategies:
```yaml
redteam:
  strategies:
    - jailbreak
    - jailbreak:composite
```
Multi-turn Applications
Multi-turn applications maintain conversation state, introducing additional attack surfaces:
Security Properties:
- ✅ Context-aware security checks
- ✅ Pattern detection capability
- ✅ Sophisticated auth flows
- ❌ State manipulation risks
- ❌ Context pollution vectors
- ❌ Increased attack surface
Recommended Strategies:
```yaml
redteam:
  strategies:
    - goat
    - crescendo
```
Implementation Guide
Basic Configuration
```yaml
redteam:
  strategies:
    - jailbreak # string syntax
    - id: jailbreak:composite # object syntax
```
Advanced Configuration
Some strategies allow you to specify options in a configuration object. For example, the `multilingual` strategy lets you specify the languages to use.
```yaml
redteam:
  strategies:
    - id: multilingual
      config:
        languages:
          - french
          - zh-CN # Chinese (IETF)
          - de # German (ISO 639-1)
```
Strategies can be applied to specific plugins or to the entire test suite. By default, strategies apply to all plugins. To override this, specify the `plugins` option in the strategy's config, which applies the strategy only to the listed plugins.
```yaml
redteam:
  strategies:
    - id: jailbreak:tree
      config:
        plugins:
          - harmful:hate
```
Custom Strategies
For advanced use cases, you can create custom strategies. See Custom Strategy Development for details.
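On the configuration side, a custom strategy is typically referenced by file path. A hedged sketch (the path is a placeholder; the expected module interface is described in Custom Strategy Development):

```yaml
# Hedged sketch: pointing the red team at a custom strategy implementation.
# The file path below is hypothetical.
redteam:
  strategies:
    - id: file://strategies/my-custom-strategy.js
```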
Next Steps
- Review LLM Vulnerabilities
- Set up your first test suite