Red Team Strategies

Overview

Strategies are attack techniques that systematically probe LLM applications for vulnerabilities.

While plugins generate adversarial inputs, strategies determine how these inputs are delivered to maximize attack success rates.

For example, a plugin might generate a harmful input, and a strategy like jailbreak would then attempt multiple variations of that input to bypass guardrails and content filters.

Strategy Flow

Strategies are applied during redteam generation and can significantly increase the Attack Success Rate (ASR) of adversarial inputs.

Available Strategies

Category	Strategy	Description	Details	Cost	ASR Increase^*
Static (Single-Turn)	Audio Encoding	Text-to-speech encoding bypass	Tests handling of text converted to speech audio and encoded as base64 to potentially bypass text-based content filters	Low	20-30%
	Base64	Base64 encoding bypass	Tests detection and handling of Base64-encoded malicious payloads to bypass content filters	Low	20-30%
	Basic	Plugin-generated test cases	Controls whether original plugin-generated test cases are included without any strategies applied	Low	None
	camelCase	camelCase transformation	Tests handling of text transformed into camelCase (removing spaces and capitalizing words) to potentially bypass content filters	Low	0-5%
	Emoji Smuggling	Variation selector encoding	Tests hiding UTF-8 payloads inside emoji variation selectors to evaluate filter evasion.	Low	0-5%
	Hex	Hex encoding bypass	Tests detection and handling of hex-encoded malicious payloads to bypass content filters	Low	20-30%
	Homoglyph	Unicode confusable characters	Tests detection and handling of text with homoglyphs (visually similar Unicode characters) to bypass content filters	Low	20-30%
	Image Encoding	Text-to-image encoding bypass	Tests handling of text embedded in images and encoded as base64 to potentially bypass text-based content filters	Low	20-30%
	Leetspeak	Character substitution	Tests handling of leetspeak-encoded malicious content by replacing standard letters with numbers or special characters	Low	20-30%
	Morse Code	Dots and dashes encoding	Tests handling of text encoded in Morse code (dots and dashes) to potentially bypass content filters	Low	20-30%
	Multilingual	Cross-language testing	Tests handling of inputs across multiple languages, focusing on low-resource languages that may bypass content filters	Low	30-40%
	Pig Latin	Word transformation encoding	Tests handling of text transformed into Pig Latin (rearranging word parts) to potentially bypass content filters	Low	20-30%
	Prompt Injection	Direct system prompts	Tests common direct prompt injection vulnerabilities using a curated list of injection techniques	Low	20-30%
	ROT13	Letter rotation encoding	Tests handling of ROT13-encoded malicious payloads by rotating each letter 13 positions in the alphabet	Low	20-30%
	Video Encoding	Text-to-video encoding bypass	Tests handling of text embedded in videos and encoded as base64 to potentially bypass text-based content filters	Low	20-30%
Dynamic (Single-Turn)	Best-of-N	Parallel sampling attack	Tests multiple variations in parallel using the Best-of-N technique from Anthropic research	High	40-60%
	Citation	Academic framing	Tests vulnerability to academic authority bias by framing harmful requests in research contexts	Medium	40-60%
	Composite JailbreaksRecommended	Combined techniques	Chains multiple jailbreak techniques from research papers to create more sophisticated attacks	Medium	60-80%
	GCG	Gradient-based optimization	Implements the Greedy Coordinate Gradient attack method for finding adversarial prompts using gradient-based search techniques	High	0-10%
	JailbreakRecommended	Lightweight iterative refinement	Uses an LLM-as-a-Judge to iteratively refine prompts until they bypass security controls	High	60-80%
	Likert-based Jailbreaks	Academic evaluation framework	Leverages academic evaluation frameworks and Likert scales to frame harmful requests within research contexts	Medium	40-60%
	Math Prompt	Mathematical encoding	Tests resilience against mathematical notation-based attacks using set theory and abstract algebra	Medium	40-60%
	Tree-based	Branching attack paths	Creates a tree of attack variations based on the Tree of Attacks research paper	High	60-80%
Multi-turn	Crescendo	Gradual escalation	Gradually escalates prompt harm over multiple turns while using backtracking to optimize attack paths	High	70-90%
	GOAT	Generative Offensive Agent Tester	Uses a Generative Offensive Agent Tester to dynamically generate multi-turn conversations	High	70-90%
	Mischievous User	Mischievous user conversations	Simulates a multi-turn conversation between a mischievous user and an agent	High	10-20%
	Pandamonium	Dynamic attack generation	Advanced automated red teaming technique that dynamically generates single or multi-turn conversations aimed at bypassing safety measures	High	70-90%
Regression	Retry	Historical failure testing	Automatically incorporates previously failed test cases into your test suite, creating a regression testing system that learns from past failures	Low	50-70%
Custom	Custom Strategies	User-defined transformations	Allows creation of custom red team testing approaches by programmatically transforming test cases using JavaScript	Variable	Variable

* ASR Increase: Relative increase in Attack Success Rate compared to running the same test without any strategy

Strategy Categories

Static Strategies

Transform inputs using predefined patterns to bypass security controls. These are deterministic transformations that don't require another LLM to act as an attacker. Static strategies are low-resource usage, but they are also easy to detect and often patched in the foundation models. For example, the base64 strategy encodes inputs as base64 to bypass guardrails and other content filters. prompt-injection wraps the payload in a prompt injection such as ignore previous instructions and {{original_adversarial_input}}.

Dynamic Strategies

Dynamic strategies use an attacker agent to mutate the original adversarial input through iterative refinement. These strategies make multiple calls to both an attacker model and your target model to determine the most effective attack vector. They have higher success rates than static strategies, but they are also more resource intensive. By default, promptfoo recommends two dynamic strategies: jailbreak and jailbreak:composite to run on your red-teams.

By default, dynamic strategies like jailbreak and jailbreak:composite will:

Make multiple attempts to bypass the target's security controls
Stop after exhausting the configured token budget
Stop early if they successfully generate a harmful output
Track token usage to prevent runaway costs

Multi-turn Strategies

Multi-turn strategies also use an attacker agent to coerce the target model into generating harmful outputs. These strategies are particularly effective against stateful applications where they can convince the target model to act against its purpose over time. You should run these strategies if you are testing a multi-turn application (such as a chatbot). Multi-turn strategies are more resource intensive than single-turn strategies, but they have the highest success rates.

Regression Strategies

Regression strategies help maintain security over time by learning from past failures. For example, the retry strategy automatically incorporates previously failed test cases into your test suite, creating a form of regression testing for LLM behaviors.

note

All single-turn strategies can be applied to multi-turn applications, but multi-turn strategies require a stateful application.

Strategy Selection

Choose strategies based on your application architecture and security requirements:

Single-turn Applications

Single-turn applications process each request independently, creating distinct security boundaries:

Security Properties:

✅ Clean context for each request
✅ No state manipulation vectors
✅ Predictable attack surface
❌ Limited threat pattern detection
❌ No persistent security context

Recommended Strategies:

promptfooconfig.yaml
redteam:
  strategies:
    - jailbreak
    - jailbreak:composite

Multi-turn Applications

Multi-turn applications maintain conversation state, introducing additional attack surfaces:

Security Properties:

✅ Context-aware security checks
✅ Pattern detection capability
✅ Sophisticated auth flows
❌ State manipulation risks
❌ Context pollution vectors
❌ Increased attack surface

Recommended Strategies:

promptfooconfig.yaml
redteam:
  strategies:
    - goat
    - crescendo
    - mischievous-user

Implementation Guide

Basic Configuration

promptfooconfig.yaml
redteam:
  strategies:
    - jailbreak # string syntax
    - id: jailbreak:composite # object syntax

Advanced Configuration

Some strategies allow you to specify options in the configuration object. For example, the multilingual strategy allows you to specify the languages to use.

promptfooconfig.yaml
redteam:
  strategies:
    - id: multilingual
      config:
        languages:
          - french
          - zh-CN # Chinese (IETF)
          - de # German (ISO 639-1)

Strategies can be applied to specific plugins or the entire test suite. By default, strategies are applied to all plugins. You can override this by specifying the plugins option in the strategy which will only apply the strategy to the specified plugins.

promptfooconfig.yaml
redteam:
  strategies:
    - id: jailbreak:tree
      config:
        plugins:
          - harmful:hate

Custom Strategies

For advanced use cases, you can create custom strategies. See Custom Strategy Development for details.

LLM Vulnerabilities - Understand the types of vulnerabilities strategies can test
Red Team Plugins - Learn about the plugins that generate the base test cases
Custom Strategies - Create your own strategies

Next Steps

Review LLM Vulnerabilities
Set up your first test suite

Overview​

Available Strategies​

Strategy Categories​

Static Strategies​

Dynamic Strategies​

Multi-turn Strategies​

Regression Strategies​

Strategy Selection​

Single-turn Applications​

Multi-turn Applications​

Implementation Guide​

Basic Configuration​

Advanced Configuration​

Custom Strategies​

Related Concepts​

Next Steps​