LMVD-ID: 3b3ab2fe
Published April 1, 2025

Multi-Accent Audio Jailbreak

Affected Models:qwen2-audio, diva-llama-3-v0-8b, meralion-audiollm-whisper-sea-lion, minicpm-o-2.6, ultravox-v0-4.1-llama-3.1-8b

Research Paper

Multilingual and Multi-Accent Jailbreaking of Audio LLMs

View Paper

Description: Multilingual and multi-accent audio inputs, combined with acoustic adversarial perturbations (reverberation, echo, whisper effects), can bypass safety mechanisms in Large Audio Language Models (LALMs), causing them to generate unsafe or harmful outputs. The vulnerability is amplified by the interaction between acoustic and linguistic variations, particularly in languages with less training data.

Examples: See the paper's figures and tables showing significantly increased Jailbreak Success Rates (JSRs) when applying reverberation, echo, or whisper effects to multilingual and multi-accent audio prompts targeting various LALMs (e.g., Qwen2, MERaLiON). Specific examples of adversarial prompts and generated outputs are provided in the paper's Appendix.

Impact: Adversaries can elicit unsafe or harmful responses from LALMs using easily generated adversarial audio inputs that appear innocuous to human listeners. This compromises the safety and reliability of LALM-based applications and systems. Multimodal LLMs are especially vulnerable as the audio modality can be exploited to bypass text-based safety filters.

Affected Systems: Large Audio Language Models (LALMs) and multimodal LLMs incorporating audio processing, including but not limited to those based on Whisper models. Specific models tested in the research include Qwen2-Audio, DiVA-llama-3-v0-8b, MERaLiON-AudioLLM-Whisper-SEA-LION, MiniCPM-o-2.6, and Ultravox-v0-4.1-Llama-3.1-8B.

Mitigation Steps:

  • Robust Audio Preprocessing: Implement more sophisticated audio preprocessing techniques to detect and mitigate adversarial acoustic perturbations.
  • Multilingual Safety Training: Expand the safety training data to include diverse languages and accents, along with adversarially perturbed audio samples.
  • Cross-Modal Security: Develop defense mechanisms that consider the interaction between audio and textual inputs, preventing exploitation of one modality to bypass safety measures in the other.
  • Inference-time Defense: Implement techniques such as in-context learning using defense prompts to improve robustness against adversarial inputs at inference time. (Further research is needed to tailor these defenses specifically for audio inputs).

© 2025 Promptfoo. All rights reserved.