📄️ Base64 Encoding
The Base64 Encoding strategy is a simple strategy that tests an AI system's ability to handle and process encoded inputs, potentially bypassing certain content filters or detection mechanisms.
📄️ Basic Strategy
The basic strategy controls whether the original plugin-generated test cases (without any strategies applied) are included in the final output.
📄️ Citation
The Citation strategy is a red teaming technique that uses academic citations and references to potentially bypass an AI system's safety measures.
📄️ Single Turn Composite
The Single Turn Composite strategy combines multiple jailbreak techniques from top research papers to create more sophisticated attacks.
📄️ Custom Strategies
Custom strategies give you full control over how your prompts are modified for adversarial testing. This allows you to create your own red team testing approaches by transforming pre-existing test cases programmatically. Strategies can range from simple jailbreaks to calling external APIs or models.
📄️ GOAT
The GOAT (Generative Offensive Agent Tester) strategy is an advanced automated red teaming technique that uses an "attacker" LLM to dynamically generate multi-turn conversations aimed at bypassing a target model's safety measures.
📄️ Iterative Jailbreaks
The Iterative Jailbreaks strategy is a technique designed to systematically probe and potentially bypass an AI system's constraints by repeatedly refining a single-shot prompt.
📄️ Leetspeak
The Leetspeak strategy is a text obfuscation technique that replaces standard letters with numbers or special characters.
📄️ Math Prompt
The Math Prompt strategy tests an AI system's ability to handle harmful inputs using mathematical concepts like set theory, group theory, and abstract algebra. This technique can bypass content filters designed for natural language threats. Research by Bethany et al. ("Jailbreaking Large Language Models with Symbolic Mathematics") revealed that encoding harmful prompts into mathematical problems can bypass safety mechanisms in large language models (LLMs) with a 73.6% success rate across 13 state-of-the-art LLMs.
📄️ Multi-turn Jailbreaks
The Crescendo strategy is a multi-turn jailbreak technique that gradually escalates the potential harm of prompts, exploiting the fuzzy boundary between acceptable and unacceptable responses.
📄️ Multilingual
The Multilingual strategy tests an AI system's ability to handle and process inputs in multiple languages, potentially uncovering inconsistencies in behavior across different languages or bypassing language-specific content filters.
📄️ Prompt Injection
The Prompt Injection strategy tests common direct prompt injection vulnerabilities in LLMs.
📄️ ROT13 Encoding
The ROT13 Encoding strategy is a simple letter substitution technique that rotates each letter in the text by 13 positions in the alphabet.
📄️ Tree-based Jailbreaks
The Tree-based Jailbreaks strategy is an advanced technique designed to systematically explore and potentially bypass an AI system's constraints by creating a branching structure of single-shot prompts.