Multimodal Vulnerabilities

Vulnerabilities in multimodal AI systems

Related Vulnerabilities

51 entries

LVLM Vision-Triggered Resource Exhaustion

7/28/2025

A resource consumption vulnerability exists in multiple Large Vision-Language Models (LVLMs). An attacker can craft a subtle, imperceptible adversarial perturbation and apply it to an input image. When this image is processed by an LVLM, even with a benign text prompt, it forces the model into an unbounded generation loop. The attack, named RECALLED, uses a gradient-based optimization process to create a visual perturbation that steers the model's text generation towards a predefined, repetitive sequence (an "Output Recall" target). This causes the model to generate text that repeats a word or sentence until the maximum context limit is reached, leading to a denial-of-service condition through excessive computational resource usage and response latency.

RECALLED: An Unbounded Resource Consumption Attack on Large Vision-Language Models

Affects: llava, qwen-vl, instructblip, qwen3b, qwen7b, qwen32b, llava7b, llava13b, blip7b, blip13b, llava-1.5-hf, qwen/qwen2.5-vl-instruct, instructblip-vicuna, meta-llama

Prompt-Based Jailbreak Taxonomy

8/16/2025

Large Language Models (LLMs) and Text-to-Image (T2I) models are vulnerable to jailbreaking through prompt-based attacks that use narrative framing, semantic substitution, and context diffusion to bypass safety moderation pipelines. These attacks do not require specialized knowledge or technical expertise. Attackers can embed harmful requests within benign narratives, frame them as fictional or professional inquiries, or use euphemistic language to circumvent input filters and output classifiers. The core vulnerability is the models' inability to holistically assess cumulative intent across multi-turn dialogues or recognize malicious intent when it is semantically or stylistically disguised.

Anyone Can Jailbreak: Prompt-Based Attacks on LLMs and T2Is

Affects: gpt-4, claude 2, google’s gemini, meta’s llama 2, mistral, vicuna, qwen, palm 2, chatgpt, gpt-3.5, stable diffusion, chatgpt 4o, claude sonnet 4, qwen3-235b-a22b, grokv 3, deepseek v3, gemini 2.5

Visual Jailbreak via Context Injection

7/14/2025

Multimodal Large Language Models (MLLMs) are vulnerable to visual contextual attacks, where carefully crafted images and accompanying text prompts can bypass safety mechanisms and elicit harmful responses. The vulnerability stems from the MLLM's ability to integrate visual and textual context to generate outputs, allowing attackers to create realistic scenarios that subvert safety filters. Specifically, the attack leverages image-driven context injection to construct deceptive multi-turn conversations that gradually lead the MLLM to produce unsafe responses.

Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection

Affects: gpt-4o, gpt-4o-mini, gemini 2.0-flash, internvl2.5-78b, llava-ov-7b-chat, qwen2.5-vl-72b-instruct

Audio Adversarial Jailbreak

5/31/2025

A vulnerability in SpeechGPT allows bypassing safety filters through adversarial audio prompts crafted by a white-box token-level attack. The attacker leverages knowledge of SpeechGPT's internal speech tokenization process to generate adversarial token sequences, which are then synthesized into audio. These audio prompts elicit restricted or harmful outputs the model would normally suppress. The attack's effectiveness relies on the model's discrete audio token representation and does not require access to model parameters or gradients.

Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework

Code-Mixed Phonetic Attack

9/7/2025

A vulnerability exists in multiple large language and multimodal models that allows for the bypass of safety filters through the use of code-mixed prompts with phonetic perturbations. An attacker can craft a prompt in a code-mixed language (e.g., Hinglish) and apply phonetic misspellings to sensitive keywords (e.g., spelling "hate" as "haet"). This technique causes the model's tokenizer to parse the sensitive word into benign sub-tokens, preventing safety mechanisms from flagging the harmful instruction. The model, however, correctly interprets the semantic meaning of the perturbed prompt and generates the requested harmful content, including text and images.

" Haet Bhasha aur Diskrimineshun": Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs

Affects: chatgpt-4o-mini, llama-3-8b-instruct, gemma-1.1-7b-it, mistral-7b-instruct-v0.3, gpt-4o

Dynamic Prompt Jailbreak

5/31/2025

GhostPrompt demonstrates a vulnerability in multimodal safety filters used with text-to-image generative models. The vulnerability allows attackers to bypass these filters by using a dynamic prompt optimization framework that iteratively generates adversarial prompts designed to evade both text-based and image-based safety checks while preserving the original, harmful intent of the prompt. This bypass is achieved through a combination of semantically aligned prompt rewriting and the injection of benign visual cues to confuse image-level filters.

GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization

Affects: gpt-3.5, gpt-4.1, dall·e 3, shieldlm-7b, internvl2-2b, deepseek-v3, qwen2.5-7b-instruct, flux.1-schnell

Embodied Agent Jailbreak

5/31/2025

Multimodal Large Language Models (MLLMs) used in Vision-and-Language Navigation (VLN) systems are vulnerable to jailbreak attacks. Adversarially crafted natural language instructions, even when disguised within seemingly benign prompts, can bypass safety mechanisms and cause the VLN agent to perform unintended or harmful actions in both simulated and real-world environments. The attacks exploit the MLLM's ability to follow instructions without sufficient consideration of the consequences of those actions.

BadNAVer: Exploring Jailbreak Attacks On Vision-and-Language Navigation

Affects: internvl3-8b, qwen2.5-vl-7b-instruct, llava-v1.6-mistral-7b, gpt-4o, gemini-2.0-flash, gpt-4o-mini

Hidden Image Jailbreak

5/31/2025

Multimodal large language models (MLLMs) are vulnerable to implicit jailbreak attacks that leverage least significant bit (LSB) steganography to conceal malicious instructions within images. These instructions are coupled with seemingly benign image-related text prompts, causing the MLLM to execute the hidden malicious instructions. The attack bypasses existing safety mechanisms by exploiting cross-modal reasoning capabilities.

Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models

Affects: gpt-4o, gemini 1.5 pro, qwen2.5-vl-72b, gpt-4.5, gemini2.5-pro, intervl2-8b

LLM Multi-Agent IP Leakage

5/31/2025

Large Language Model (LLM)-based Multi-Agent Systems (MAS) are vulnerable to intellectual property (IP) leakage attacks. An attacker with black-box access (only interacting via the public API) can craft adversarial queries that propagate through the MAS, extracting sensitive information such as system prompts, task instructions, tool specifications, number of agents, and system topology.

IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems

Affects: gpt-4o, gpt-4o-mini, llama-3.1-70b, qwen-2.5-72b, llama-3.1-8b

MAD Amplified Jailbreaks

5/4/2025

Multi-Agent Debate (MAD) frameworks leveraging Large Language Models (LLMs) are vulnerable to amplified jailbreak attacks. A novel structured prompt-rewriting technique exploits the iterative dialogue and role-playing dynamics of MAD, circumventing inherent safety mechanisms and significantly increasing the likelihood of generating harmful content. The attack succeeds by using narrative encapsulation, role-driven escalation, iterative refinement, and rhetorical obfuscation to guide agents towards progressively elaborating harmful responses.

Amplified Vulnerabilities: Structured Jailbreak Attacks on LLM-based Multi-Agent Debate

Affects: gpt-4o, gpt-4, gpt-3.5-turbo, deepseek

Page 1 of 6