LMVD-ID: 997e3c57
Published October 1, 2024

Multi-Round LLM Jailbreak

Research Paper

Multi-round jailbreak attack on large language models

Description: A multi-round attack against Large Language Models (LLMs) bypasses safety mechanisms by iteratively refining prompts across successive conversation turns until the model produces the undesired behavior. The attack exploits the fact that an LLM conditions each response on the preceding conversation history, so malicious intent can be built up gradually, circumventing defenses that filter prompts only one round at a time.
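
Since the paper's own examples are unavailable, the following is only a minimal, hypothetical sketch of the multi-round refinement loop described above. The stub functions `query_target_llm`, `refine_prompt`, and `attack_succeeded` are placeholders introduced here for illustration; they do not reproduce the withdrawn paper's method or any real model API.

```python
# Minimal, hypothetical sketch of a multi-round refinement loop.
# All functions are placeholder stubs for illustration only.

def query_target_llm(prompt: str, history: list[dict]) -> str:
    """Stub for a chat-completion call that conditions on prior turns."""
    return "stubbed response"

def refine_prompt(previous_prompt: str, response: str) -> str:
    """Stub for the attacker-side step that rewrites the prompt based on
    how the target responded in the previous round."""
    return previous_prompt + " (rephrased)"

def attack_succeeded(response: str) -> bool:
    """Stub success check; a real attacker would score the response."""
    return False

def multi_round_attack(initial_prompt: str, max_rounds: int = 5) -> list[dict]:
    """Carry the full conversation history forward each round so that every
    refined prompt builds on the model's own preceding answers."""
    history: list[dict] = []
    prompt = initial_prompt
    for _ in range(max_rounds):
        response = query_target_llm(prompt, history)
        history.append({"role": "user", "content": prompt})
        history.append({"role": "assistant", "content": response})
        if attack_succeeded(response):
            break
        prompt = refine_prompt(prompt, response)
    return history
```

The property being exploited is that the accumulated conversation history is passed back to the model on every round, so each refined prompt leverages the model's own prior answers rather than standing alone.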

Examples: Unavailable due to paper withdrawal.

Impact: LLMs can be manipulated to generate harmful content, such as hate speech, misinformation, or instructions for illegal activities, despite safety protocols. This undermines trust and safety features implemented in LLM applications.

Affected Systems: LLM applications that support multi-turn (iterative prompt-response) interaction and rely solely on single-round prompt filtering for safety.
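
As a hypothetical illustration of why per-round filtering alone is insufficient, the sketch below shows a filter that evaluates each prompt in isolation and therefore never scores the conversation as a whole; the keyword list and turn contents are placeholders, not drawn from the paper or any real deployment.

```python
# Hypothetical single-round filter: each turn is checked on its own, so
# intent spread across several innocuous-looking turns is never evaluated
# as a whole. Placeholder rule set and placeholder turns only.

BLOCKED_PHRASES = {"explicitly disallowed request"}

def single_round_filter(prompt: str) -> bool:
    """Return True if this one prompt, viewed on its own, is blocked."""
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)

conversation = [
    "Turn 1: establish a benign framing for the topic.",
    "Turn 2: ask for a harmless-looking partial detail.",
    "Turn 3: ask the model to extend its own previous answer.",
]

# Each turn passes the per-prompt check individually, even though the
# conversation as a whole steers toward the behavior the filter targets.
assert not any(single_round_filter(turn) for turn in conversation)
```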

Mitigation Steps: Unavailable due to paper withdrawal.

© 2025 Promptfoo. All rights reserved.