Skip to main content

Red Team Strategies

Strategies are attack techniques that systematically probe LLM applications for vulnerabilities. While plugins generate adversarial inputs, strategies determine how these inputs are delivered to maximize attack success rates.

Strategy Flow

Most users only need two strategies for comprehensive coverage. These agentic methods provide the highest attack success rates across use cases.

Meta Agent: Best for Single-Turn

The Meta Agent dynamically builds an attack taxonomy and learns from attack history to optimize bypass attempts. It learns which attack types work best against your specific target.

Hydra Multi-Turn: Best for Multi-Turn

Hydra runs adaptive multi-turn conversations with persistent scan-wide memory. It pivots across conversation branches to uncover hidden vulnerabilities, especially in stateful applications like chatbots and agents.

Quick Start

For most applications, this configuration provides comprehensive red team coverage:

promptfooconfig.yaml
redteam:
strategies:
- jailbreak:meta # Single-turn agentic attacks
- jailbreak:hydra # Multi-turn adaptive conversations

All Strategies

CategoryStrategyDescriptionDetailsCostASR Increase*
Static (Single-Turn)Audio EncodingText-to-speech encoding bypass 🌐Tests handling of text converted to speech audio and encoded as base64 to potentially bypass text-based content filtersLow20-30%
Base64Base64 encoding bypassTests detection and handling of Base64-encoded malicious payloads to bypass content filtersLow20-30%
BasicPlugin-generated test casesControls whether original plugin-generated test cases are included without any strategies appliedLowNone
camelCasecamelCase transformationTests handling of text transformed into camelCase (removing spaces and capitalizing words) to potentially bypass content filtersLow0-5%
Emoji SmugglingVariation selector encodingTests hiding UTF-8 payloads inside emoji variation selectors to evaluate filter evasion.Low0-5%
HexHex encoding bypassTests detection and handling of hex-encoded malicious payloads to bypass content filtersLow20-30%
HomoglyphUnicode confusable charactersTests detection and handling of text with homoglyphs (visually similar Unicode characters) to bypass content filtersLow20-30%
Image EncodingText-to-image encoding bypassTests handling of text embedded in images and encoded as base64 to potentially bypass text-based content filtersLow20-30%
Jailbreak TemplatesStatic jailbreak templatesTests LLM resistance to known jailbreak techniques (DAN, Skeleton Key, etc.) using static templates. Note: Does not cover modern prompt injection techniques.Low20-30%
LeetspeakCharacter substitutionTests handling of leetspeak-encoded malicious content by replacing standard letters with numbers or special charactersLow20-30%
Morse CodeDots and dashes encodingTests handling of text encoded in Morse code (dots and dashes) to potentially bypass content filtersLow20-30%
Pig LatinWord transformation encodingTests handling of text transformed into Pig Latin (rearranging word parts) to potentially bypass content filtersLow20-30%
ROT13Letter rotation encodingTests handling of ROT13-encoded malicious payloads by rotating each letter 13 positions in the alphabetLow20-30%
Video EncodingText-to-video encoding bypassTests handling of text embedded in videos and encoded as base64 to potentially bypass text-based content filtersLow20-30%
Dynamic (Single-Turn)Authoritative Markup InjectionStructured format authority 🌐Tests vulnerability to authoritative formatting by embedding prompts in structured markup that exploits trust in formatted contentMedium40-60%
Best-of-NParallel sampling attack 🌐Tests multiple variations in parallel using the Best-of-N technique from Anthropic researchHigh40-60%
CitationAcademic framing 🌐Tests vulnerability to academic authority bias by framing harmful requests in research contextsMedium40-60%
Composite JailbreaksRecommendedCombined techniques 🌐Chains multiple jailbreak techniques from research papers to create more sophisticated attacksMedium60-80%
GCGGradient-based optimization 🌐Implements the Greedy Coordinate Gradient attack method for finding adversarial prompts using gradient-based search techniquesHigh0-10%
JailbreakRecommendedLightweight iterative refinementUses an LLM-as-a-Judge to iteratively refine prompts until they bypass security controlsHigh60-80%
Likert-based JailbreaksAcademic evaluation framework 🌐Leverages academic evaluation frameworks and Likert scales to frame harmful requests within research contextsMedium40-60%
Math PromptMathematical encodingTests resilience against mathematical notation-based attacks using set theory and abstract algebraMedium40-60%
Meta-Agent JailbreaksRecommendedStrategic taxonomy builder 🌐Builds custom attack taxonomies and learns from all attempts using persistent strategic memory to choose which attack types work against your specific targetHigh70-90%
Tree-basedBranching attack pathsCreates a tree of attack variations based on the Tree of Attacks research paperHigh60-80%
Multi-turnCrescendoGradual escalationGradually escalates prompt harm over multiple turns while using backtracking to optimize attack pathsHigh70-90%
GOATGenerative Offensive Agent Tester 🌐Uses a Generative Offensive Agent Tester to dynamically generate multi-turn conversationsHigh70-90%
Hydra Multi-turnAdaptive multi-turn branching 🌐Adaptive multi-turn jailbreak agent that pivots across branches with persistent scan-wide memory to uncover hidden vulnerabilitiesHigh70-90%
Mischievous UserMischievous user conversationsSimulates a multi-turn conversation between a mischievous user and an agentHigh10-20%
RegressionRetryHistorical failure testingAutomatically incorporates previously failed test cases into your test suite, creating a regression testing system that learns from past failuresLow50-70%
CustomCustom StrategiesUser-defined transformationsAllows creation of custom red team testing approaches by programmatically transforming test cases using JavaScriptVariableVariable
Custom StrategyCustom prompt-based multi-turn strategyWrite natural language instructions to create powerful multi-turn red team strategies. No coding required.VariableVariable
LayerCompose multiple strategiesCompose multiple red team strategies sequentially (e.g., jailbreak → base64) to create sophisticated attack chainsVariableCumulative
* ASR Increase: Relative increase in Attack Success Rate compared to running the same test without any strategy

🌐 indicates that strategy uses remote inference in Promptfoo Community edition

Strategy Categories

Static Strategies

Transform inputs using predefined patterns to bypass security controls. These are deterministic transformations that don't require another LLM to act as an attacker. Static strategies are low-resource usage, but they are also easy to detect and often patched in the foundation models. For example, the base64 strategy encodes inputs as base64 to bypass guardrails and other content filters. jailbreak-templates wraps the payload in known jailbreak templates like DAN or Skeleton Key.

Dynamic Strategies

Dynamic strategies use an attacker agent to mutate the original adversarial input through iterative refinement. These strategies make multiple calls to both an attacker model and your target model to determine the most effective attack vector. They have higher success rates than static strategies, but they are also more resource intensive.

By default, dynamic strategies like jailbreak and jailbreak:composite will:

  • Make multiple attempts to bypass the target's security controls
  • Stop after exhausting the configured token budget
  • Stop early if they successfully generate a harmful output
  • Track token usage to prevent runaway costs

Multi-turn Strategies

Multi-turn strategies use an attacker agent to coerce the target over multiple conversation turns. They are particularly effective against stateful applications where they can convince the target to act against its purpose over time. Multi-turn strategies are more resource intensive than single-turn strategies, but they have the highest success rates.

Indirect Prompt Injection Strategies

Indirect prompt injection strategies test whether AI agents can be manipulated through malicious instructions embedded in external content they consume. These strategies generate realistic attack surfaces containing hidden payloads to test both data exfiltration and behavior manipulation. Currently available: indirect-web-pwn for web browsing agents.

Regression Strategies

Regression strategies help maintain security over time by learning from past failures. For example, the retry strategy automatically incorporates previously failed test cases into your test suite, creating a form of regression testing for LLM behaviors.

note

All single-turn strategies can be applied to multi-turn applications, but multi-turn strategies require a stateful application.

Configuration

Basic Configuration

promptfooconfig.yaml
redteam:
strategies:
- jailbreak:meta # string syntax
- id: jailbreak:composite # object syntax

Plugin Targeting

Strategies can be applied to specific plugins or the entire test suite. By default, strategies are applied to all plugins. You can override this by specifying the plugins option in the strategy which will only apply the strategy to the specified plugins.

promptfooconfig.yaml
redteam:
strategies:
- id: jailbreak:tree
config:
plugins:
- harmful:hate

Layered Strategies

Chain strategies in order with the layer strategy. This is useful when you want to apply a transformation first, then another technique:

promptfooconfig.yaml
redteam:
strategies:
- id: layer
config:
steps:
- base64 # First encode as base64
- rot13 # Then apply ROT13

Notes:

  • Each step respects plugin targeting and exclusions.
  • Only the final step's outputs are kept.
  • Transformations are applied in the order specified.

Custom Strategies

For advanced use cases, you can create custom strategies. See Custom Strategy Development for details.