Strategies

📄️ Base64 Encoding

The Base64 Encoding strategy is a simple strategy that tests an AI system's ability to handle and process encoded inputs, potentially bypassing certain content filters or detection mechanisms.

📄️ Basic Strategy

The basic strategy controls whether the original plugin-generated test cases (without any strategies applied) are included in the final output.

📄️ Citation

The Citation strategy is a red teaming technique that uses academic citations and references to potentially bypass an AI system's safety measures.

📄️ Single Turn Composite

The Single Turn Composite strategy combines multiple jailbreak techniques from top research papers to create more sophisticated attacks.

Custom strategies give you full control over how your prompts are modified for adversarial testing. This allows you to create your own red team testing approaches by transforming pre-existing test cases programmatically. Strategies can range from simple jailbreaks to calling external APIs or models.

📄️ GOAT

The GOAT (Generative Offensive Agent Tester) strategy is an advanced automated red teaming technique that uses an "attacker" LLM to dynamically generate multi-turn conversations aimed at bypassing a target model's safety measures.

📄️ Iterative Jailbreaks

The Iterative Jailbreaks strategy is a technique designed to systematically probe and potentially bypass an AI system's constraints by repeatedly refining a single-shot prompt.

📄️ Leetspeak

The Leetspeak strategy is a text obfuscation technique that replaces standard letters with numbers or special characters.

📄️ Math Prompt

The Math Prompt strategy tests an AI system's ability to handle harmful inputs using mathematical concepts like set theory, group theory, and abstract algebra. This technique can bypass content filters designed for natural language threats. Research by Bethany et al. ("Jailbreaking Large Language Models with Symbolic Mathematics") revealed that encoding harmful prompts into mathematical problems can bypass safety mechanisms in large language models (LLMs) with a 73.6% success rate across 13 state-of-the-art LLMs.

📄️ Base64 Encoding

📄️ Basic Strategy

📄️ Citation

📄️ Single Turn Composite

📄️ Custom Strategies

📄️ GOAT

📄️ Iterative Jailbreaks

📄️ Leetspeak

📄️ Math Prompt

📄️ Multi-turn Jailbreaks

📄️ Multilingual

📄️ Prompt Injection

📄️ ROT13 Encoding

📄️ Tree-based Jailbreaks