Contextual Fusion Jailbreak
Research Paper
Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles
Description: Large Language Models (LLMs) are vulnerable to a multi-turn, context-based jailbreak attack termed the Context Fusion Attack (CFA). CFA exploits the LLM's ability to track context across a multi-turn dialogue in order to bypass safety mechanisms intended to prevent harmful outputs. The attacker strategically crafts a series of prompts that build context, subtly introduces malicious keywords, and finally issues a trigger prompt that causes the LLM to generate unsafe content. The malicious intent remains masked within the seemingly benign multi-turn conversation.
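To make the mechanics concrete, the following is a minimal red-team harness sketch that replays a scripted multi-turn dialogue against a chat-completion endpoint, carrying the accumulated context forward at each turn. The prompts are benign placeholders that only illustrate the context-building structure; the `openai` client, the model name, and the `play_conversation` helper are assumptions for illustration and are not taken from the paper.

```python
# Sketch: replay a scripted multi-turn dialogue, accumulating context turn by turn.
# Prompts are benign placeholders; client and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"  # placeholder model name

def play_conversation(turns: list[str]) -> list[str]:
    """Send each scripted turn, keeping the full history in the messages list."""
    messages: list[dict] = []
    replies: list[str] = []
    for turn in turns:
        messages.append({"role": "user", "content": turn})
        response = client.chat.completions.create(model=MODEL, messages=messages)
        reply = response.choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies

if __name__ == "__main__":
    # Placeholder turns showing only the *shape* of a CFA probe:
    # context-building turns followed by a final trigger turn.
    scripted_turns = [
        "Turn 1: establish an innocuous scenario ...",
        "Turn 2: introduce a keyword inside that scenario ...",
        "Turn 3: trigger question that references the built-up context ...",
    ]
    for i, reply in enumerate(play_conversation(scripted_turns), start=1):
        print(f"--- assistant reply to turn {i} ---\n{reply}\n")
```

The point of the sketch is that each request carries the entire prior dialogue, so a per-message safety filter sees only one apparently benign turn at a time while the model sees the fused context.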
Examples: See the paper "Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles," which provides concrete attack sequences (prompts and model responses) and reports success rates across a range of LLMs.
Impact: Successful exploitation of this vulnerability can lead to the generation of harmful content such as hate speech, instructions for illegal activities, disclosure of personally identifiable information (PII), malicious code, and other unsafe output.
Affected Systems: A wide range of LLMs is susceptible, including open-source models (e.g., Llama 3, Vicuna 1.5, ChatGLM 4, Qwen 2) and closed-source models (e.g., GPT-3.5-turbo, GPT-4). The vulnerability stems from how LLMs process multi-turn context and from limitations in safety alignment, rather than from any specific implementation.
Mitigation Steps:
- Augment safety-alignment training data with examples of multi-turn context attacks to improve LLMs' handling of adversarial dialogues.
- Develop more robust detection mechanisms for subtle malicious intent embedded within multi-turn dialogues, for example by analyzing the semantic evolution of the conversation and identifying strategically placed keywords (a minimal sketch of one such conversation-level check follows this list).
- Implement advanced input sanitization and filtering techniques that consider the contextual meaning of keywords rather than focusing on individual words or phrases.
- Improve LLMs' resistance to manipulation techniques such as role-playing and assumed scenarios.
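The sketch below illustrates one possible form of the conversation-level check referenced above: rather than moderating each message in isolation, the accumulated dialogue is scored as a whole so that intent spread across turns is more likely to surface. The moderation endpoint, the threshold logic, and the helper names (`conversation_is_flagged`, `guarded_reply`) are illustrative assumptions, not defenses evaluated in the paper.

```python
# Sketch: score the whole multi-turn transcript before answering, instead of
# moderating single messages. Endpoint choice and helpers are assumptions.
from openai import OpenAI

client = OpenAI()

def conversation_is_flagged(messages: list[dict]) -> bool:
    """Run the concatenated dialogue through a moderation classifier."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    result = client.moderations.create(input=transcript)
    return result.results[0].flagged

def guarded_reply(messages: list[dict], model: str = "gpt-4o-mini") -> str:
    """Refuse to respond if the dialogue as a whole looks unsafe."""
    if conversation_is_flagged(messages):
        return "This conversation appears to be building toward unsafe content."
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```

A production defense would likely combine per-turn and cumulative scoring (e.g., sliding windows over the transcript) so that late trigger turns are judged against the full context they rely on.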