LMVD-ID: f748d63d
Published February 1, 2024

ASCII Art Jailbreak

Affected Models: GPT-3.5, GPT-4, Gemini, Claude, Llama2

Research Paper

ArtPrompt: ASCII Art-based Jailbreak Attacks Against Aligned LLMs


Description: Large Language Models (LLMs) are vulnerable to a novel jailbreak attack, "ArtPrompt," which exploits their poor ability to recognize words rendered as ASCII art. By replacing sensitive words in a prompt with their ASCII art equivalents, an attacker can bypass safety filters designed to prevent the generation of harmful content.

Examples: See the ArtPrompt repository https://github.com/uw-nsl/ArtPrompt for examples demonstrating the attack against GPT-3.5, GPT-4, Gemini, Claude, and Llama2 LLMs. Specific examples include replacing the word "bomb" with its ASCII art representation within a prompt requesting instructions on bomb construction.
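To make the masking step concrete, the minimal sketch below renders a placeholder word as ASCII art and splices it into an otherwise ordinary prompt. It uses the pyfiglet library as a stand-in for the font tooling in the ArtPrompt repository, and the prompt wording is illustrative rather than the paper's exact template.

```python
# Minimal sketch of the masking step, assuming pyfiglet is installed
# (pip install pyfiglet); it is a stand-in, not the ArtPrompt tooling.
from pyfiglet import figlet_format


def mask_word(template: str, word: str, font: str = "standard") -> str:
    """Replace the {MASK} placeholder with an ASCII art rendering of `word`."""
    art = figlet_format(word, font=font)
    return template.replace("{MASK}", "\n" + art)


if __name__ == "__main__":
    # Benign placeholder word; the attack substitutes a filtered keyword here.
    template = (
        "The block below spells a single word in ASCII art. "
        "Identify the word, then answer the question about it.\n{MASK}"
    )
    print(mask_word(template, "EXAMPLE"))
```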

Impact: Successful exploitation allows attackers to elicit harmful, unsafe, or otherwise undesired responses from LLMs, bypassing built-in safety mechanisms. This can lead to the generation of illegal instructions, biased content, or other forms of malicious output.

Affected Systems: Various Large Language Models (LLMs), including but not limited to GPT-3.5, GPT-4, Gemini, Claude, and Llama2. The vulnerability arises from LLMs' reliance on the semantic interpretation of input text, which overlooks the non-semantic, visual structure of ASCII art.

Mitigation Steps:

  • Improve LLM training data to include non-semantic visual cues, such as ASCII art, to enhance model robustness against this type of attack.
  • Develop and implement detection mechanisms that can identify ASCII art used to mask harmful prompts (a rough heuristic sketch follows this list).
  • Enhance safety filters to incorporate multiple forms of input interpretation, including visual analysis, alongside semantic analysis.
  • Consider incorporating defenses such as paraphrasing and retokenizing the input, though the paper shows these to be only partially effective.
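
As one coarse illustration of the detection idea above, the sketch below flags prompts that contain multi-line blocks dominated by drawing symbols rather than letters. The character set and thresholds are assumptions chosen for illustration, not a vetted filter, and simple heuristics like this can be evaded by a determined attacker.

```python
# Heuristic sketch: flag prompts containing ASCII-art-like blocks.
# The character set and thresholds below are illustrative assumptions,
# not a production-grade filter.
ART_CHARS = set(r"""#*_|/\-=+<>()[]{}.'"`^~""")


def looks_like_ascii_art(line: str, min_art_ratio: float = 0.5) -> bool:
    """A line 'looks like' ASCII art if most non-space characters are drawing symbols."""
    chars = [c for c in line if not c.isspace()]
    if len(chars) < 5:
        return False
    art = sum(1 for c in chars if c in ART_CHARS)
    return art / len(chars) >= min_art_ratio


def contains_ascii_art_block(prompt: str, min_lines: int = 3) -> bool:
    """Flag prompts with `min_lines` or more consecutive ASCII-art-like lines."""
    run = 0
    for line in prompt.splitlines():
        run = run + 1 if looks_like_ascii_art(line) else 0
        if run >= min_lines:
            return True
    return False


if __name__ == "__main__":
    benign = "Please summarize the attached report in three bullet points."
    art_block = (
        " ____  _____ \n"
        "|  _ \\| ____|\n"
        "| | | |  _|  \n"
        "| |_| | |___ \n"
        "|____/|_____|\n"
    )
    suspicious = "The block below encodes a word:\n" + art_block
    print(contains_ascii_art_block(benign))      # False
    print(contains_ascii_art_block(suspicious))  # True
```

In practice such a heuristic would serve only as a first-pass signal to be combined with the other mitigations listed above, such as retraining and multi-modal interpretation of input.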
