Red Team Strategies

Overview

Strategies are attack techniques that systematically probe LLM applications for vulnerabilities. While plugins generate adversarial inputs, strategies determine how these inputs are delivered to maximize attack success rates (ASR).

For example, a plugin might generate a harmful input, and a strategy like jailbreak would then attempt multiple variations of that input to bypass guardrails and content filters.

Strategies are applied during redteam generation (before evaluation) and can significantly increase the ASR of adversarial inputs. By default, promptfoo runs the original, unmodified input as well as the adversarial input generated by the strategy.
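
For example, a minimal configuration might pair a single plugin with a single strategy. The harmful:hate plugin and jailbreak strategy IDs below are taken from elsewhere in this guide and are used purely for illustration:

redteam:
  plugins:
    - harmful:hate # plugin generates the adversarial test cases
  strategies:
    - jailbreak # strategy generates additional variations of each test case

With this configuration, each plugin-generated input is evaluated both in its original form and in its jailbreak-modified form.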

Available Strategies

| Category | Strategy | Description | Details | Cost | ASR Increase* |
|---|---|---|---|---|---|
| Static (Single-Turn) | ASCII Smuggling | Unicode tag-based instruction smuggling | Tests system resilience against Unicode tag-based instruction smuggling attacks that can bypass content filters and security controls | Low | 20-30% |
| | Base64 | Base64 encoding bypass | Tests detection and handling of Base64-encoded malicious payloads to bypass content filters | Low | 20-30% |
| | Leetspeak | Character substitution | Tests handling of leetspeak-encoded malicious content by replacing standard letters with numbers or special characters | Low | 20-30% |
| | ROT13 | Letter rotation encoding | Tests handling of ROT13-encoded malicious payloads by rotating each letter 13 positions in the alphabet | Low | 20-30% |
| | Prompt Injection | Direct system prompts | Tests common direct prompt injection vulnerabilities using a curated list of injection techniques | Low | 20-30% |
| | Multilingual | Cross-language testing | Tests handling of inputs across multiple languages, focusing on low-resource languages that may bypass content filters | Low | 30-40% |
| Dynamic (Single-Turn) | Math Prompt | Mathematical encoding | Tests resilience against mathematical notation-based attacks using set theory and abstract algebra | Medium | 40-60% |
| | Citation | Academic framing | Tests vulnerability to academic authority bias by framing harmful requests in research contexts | Medium | 40-60% |
| | Composite Jailbreaks (Recommended) | Combined techniques | Chains multiple jailbreak techniques from research papers to create more sophisticated attacks | Medium | 60-80% |
| | Jailbreak (Recommended) | Lightweight iterative refinement | Uses an LLM-as-a-Judge to iteratively refine prompts until they bypass security controls | High | 60-80% |
| | Tree-based | Branching attack paths | Creates a tree of attack variations based on the Tree of Attacks research paper | High | 60-80% |
| | Best-of-N | Parallel sampling attack | Tests multiple variations in parallel using the Best-of-N technique from Anthropic research | High | 40-60% |
| Multi-turn | GOAT | Gradual escalation | Uses a Generative Offensive Agent Tester to dynamically generate multi-turn conversations | High | 70-90% |
| | Crescendo | Gradual escalation | Gradually escalates prompt harm over multiple turns while using backtracking to optimize attack paths | High | 70-90% |
| Basic | Basic | Plugin-generated test cases | Controls whether original plugin-generated test cases are included without any strategies applied | Low | None |
* ASR Increase: Relative increase in Attack Success Rate compared to running the same test without any strategy

Strategy Categories

Static Strategies

Static strategies transform inputs using predefined patterns to bypass security controls. These are deterministic transformations that don't require another LLM to act as an attacker. They use few resources, but they are also easy to detect and are often patched in foundation models. For example, the base64 strategy encodes inputs as Base64 to bypass guardrails and other content filters, and prompt-injection wraps the payload in a prompt injection such as "ignore previous instructions and {{original_adversarial_input}}".
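
As a sketch, enabling static strategies is simply a matter of listing their IDs; the example below uses base64 and prompt-injection, the two strategies mentioned above:

redteam:
  strategies:
    - base64 # encode payloads as Base64
    - prompt-injection # wrap payloads in known injection phrasings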

Dynamic Strategies

Dynamic strategies use an attacker agent to mutate the original adversarial input through iterative refinement. These strategies make multiple calls to both an attacker model and your target model to find the most effective attack vector. They achieve higher success rates than static strategies, but they are also more resource intensive. By default, promptfoo recommends two dynamic strategies, jailbreak and jailbreak:composite, for your red team runs.

By default, dynamic strategies like jailbreak and jailbreak:composite will:

  • Make multiple attempts to bypass the target's security controls
  • Stop after exhausting the configured token budget
  • Stop early if they successfully generate a harmful output
  • Track token usage to prevent runaway costs
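
One way to keep that token usage in check is to scope the more expensive strategy to specific plugins using the plugins option described later in this guide; harmful:hate is used here purely for illustration:

redteam:
  strategies:
    - jailbreak
    - id: jailbreak:composite
      config:
        plugins:
          - harmful:hate # limit the composite attack to a single plugin to control cost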

Multi-turn Strategies

Multi-turn strategies also use an attacker agent to coerce the target model into generating harmful outputs. These strategies are particularly effective against stateful applications where they can convince the target model to act against its purpose over time. You should run these strategies if you are testing a multi-turn application (such as a chatbot). Multi-turn strategies are more resource intensive than single-turn strategies, but they have the highest success rates.

Note: All single-turn strategies can be applied to multi-turn applications, but multi-turn strategies require a stateful application.

Strategy Selection

Choose strategies based on your application architecture and security requirements:

Single-turn Applications

Single-turn applications process each request independently, creating distinct security boundaries:

Security Properties:

  • ✅ Clean context for each request
  • ✅ No state manipulation vectors
  • ✅ Predictable attack surface
  • ❌ Limited threat pattern detection
  • ❌ No persistent security context

Recommended Strategies:

redteam:
  strategies:
    - jailbreak
    - jailbreak:composite

Multi-turn Applications

Multi-turn applications maintain conversation state, introducing additional attack surfaces:

Security Properties:

  • ✅ Context-aware security checks
  • ✅ Pattern detection capability
  • ✅ Sophisticated auth flows
  • ❌ State manipulation risks
  • ❌ Context pollution vectors
  • ❌ Increased attack surface

Recommended Strategies:

redteam:
  strategies:
    - goat
    - crescendo

Implementation Guide

Basic Configuration

redteam:
  strategies:
    - jailbreak # string syntax
    - id: jailbreak:composite # object syntax

Advanced Configuration

Some strategies allow you to specify options in the configuration object. For example, the multilingual strategy allows you to specify the languages to use.

redteam:
  strategies:
    - id: multilingual
      config:
        languages:
          - french
          - zh-CN # Chinese (IETF)
          - de # German (ISO 639-1)

Strategies can be applied to specific plugins or to the entire test suite. By default, strategies apply to all plugins. You can override this by specifying the plugins option on a strategy, which applies that strategy only to the listed plugins.

redteam:
  strategies:
    - id: jailbreak:tree
      config:
        plugins:
          - harmful:hate

Custom Strategies

For advanced use cases, you can create custom strategies. See Custom Strategy Development for details.

Next Steps

  1. Review LLM Vulnerabilities
  2. Set up your first test suite