Multimodal Contextual Jailbreak
Research Paper
PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization
Description: Multimodal Large Language Models (MLLMs) are vulnerable to a jailbreaking attack, dubbed PiCo, that leverages token-level typographic attacks on images embedded within code-style instructions. By concealing harmful intent within visually benign image fragments and code-style prompts, the attack exploits weaknesses in how the visual modality integrates with programming contexts and bypasses multi-tiered defense mechanisms, including input filtering and runtime monitoring.
Examples: See the paper for concrete examples; they are image-based and cannot be reproduced textually here, since replication requires generating images containing typographically altered words. See arXiv:XXXX.XXXX for details (placeholder; replace with the actual paper link once available).
Impact: Successful exploitation allows attackers to bypass safety protocols and elicit harmful or unsafe responses from the MLLM, including but not limited to content related to violence, financial crime, privacy violations, animal abuse, and self-harm. The attack success rate varies across models but is reported to be significant in many cases.
Affected Systems: Multimodal Large Language Models (MLLMs), including but not limited to Gemini Pro Vision, GPT-4V, GPT-4o, GPT-4-Turbo, and LLaVA-1.5. The attack is effective against both open-source and closed-source models.
Mitigation Steps:
- Enhance input filtering to detect typographic attacks on images, using techniques beyond simple keyword matching (see the OCR filtering sketch after this list).
- Improve runtime monitoring to detect malicious intent embedded within seemingly benign code instructions (see the monitoring sketch after this list).
- Develop more robust defenses against cross-modal attacks, considering the interaction between visual and textual inputs.
- Conduct comprehensive red-teaming exercises to identify and address vulnerabilities in MLLM defenses.
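As a rough starting point for the input-filtering step above, the sketch below shows how an OCR pass over incoming image fragments could recover rendered text and check the reassembled string against a policy blocklist. This is a minimal illustration under stated assumptions, not the defense evaluated in the paper: the use of pytesseract/Pillow, the contents of `BLOCKLIST`, and the helper names are all introduced here for the example, and keyword matching alone will not catch heavily obfuscated renderings.

```python
# Minimal sketch of an OCR-based input filter, assuming pytesseract and
# Pillow are available. BLOCKLIST is a hypothetical placeholder that a
# deployment would populate from its own content policy.
from PIL import Image
import pytesseract

BLOCKLIST = {"example_disallowed_term"}  # hypothetical placeholder

def extract_typographic_text(image_path: str) -> str:
    """Run OCR over an incoming image to surface any rendered text."""
    image = Image.open(image_path)
    return pytesseract.image_to_string(image)

def flag_typographic_attack(image_paths: list[str]) -> bool:
    """Flag a request if text recovered across image fragments, once
    concatenated and normalized, matches a disallowed term."""
    recovered = "".join(extract_typographic_text(p) for p in image_paths)
    normalized = "".join(ch.lower() for ch in recovered if ch.isalnum())
    return any(term in normalized for term in BLOCKLIST)
```

A deployment would typically call `flag_typographic_attack` on all image attachments before they reach the model, and route flagged requests to human review rather than blocking outright, since OCR output is noisy.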
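Similarly, for the runtime-monitoring step, the sketch below illustrates one way to reassemble string fragments scattered through a code-style prompt before moderation, so that intent split across variables and comments is evaluated as a whole. The regex, `DISALLOWED_TERMS`, and the `violates_policy` stand-in are assumptions for this example and would be replaced by the deployment's actual moderation classifier.

```python
# Minimal sketch of a runtime monitor for code-style prompts, assuming the
# full prompt text is visible before the model call. The blocklist check is
# a stand-in for whatever moderation classifier the deployment already uses.
import re

STRING_LITERAL = re.compile(r"""(['"])(.*?)\1""")
DISALLOWED_TERMS = {"example_disallowed_term"}  # hypothetical placeholder

def violates_policy(text: str) -> bool:
    """Stand-in moderation check; replace with a real classifier."""
    normalized = text.lower()
    return any(term in normalized for term in DISALLOWED_TERMS)

def monitor_code_prompt(prompt: str) -> bool:
    """Reassemble string literals scattered through a code-style prompt and
    evaluate both the raw prompt and the joined fragments, so that intent
    split across variables is judged as a whole."""
    fragments = [m.group(2) for m in STRING_LITERAL.finditer(prompt)]
    reassembled = " ".join(fragments)
    return violates_policy(prompt) or violates_policy(reassembled)
```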