Application Layer Vulnerabilities

Vulnerabilities in the application interface and API implementation

Related Vulnerabilities

90 entries

LLM Suicide Prompt Jailbreak

7/14/2025

Large Language Models (LLMs) employing safety filters designed to prevent generation of content related to self-harm and suicide can be bypassed through multi-step adversarial prompting. By reframing the request as an academic exercise or hypothetical scenario, users can elicit detailed instructions and information that could facilitate self-harm or suicide, despite initially expressing harmful intent. This vulnerability lies in the inadequacy of existing safety filters to consistently recognize and prevent harmful outputs despite shifts in conversational context.

For Argument's Sake, Show Me How to Harm Myself!': Jailbreaking LLMs in Suicide and Self-Harm Contexts

Affects: chat-gpt4o*, chat-gpt4o, perplexityai, gemini flash 2.0, claude 3.7 sonnet, pi ai

Visual Jailbreak via Context Injection

7/14/2025

Multimodal Large Language Models (MLLMs) are vulnerable to visual contextual attacks, where carefully crafted images and accompanying text prompts can bypass safety mechanisms and elicit harmful responses. The vulnerability stems from the MLLM's ability to integrate visual and textual context to generate outputs, allowing attackers to create realistic scenarios that subvert safety filters. Specifically, the attack leverages image-driven context injection to construct deceptive multi-turn conversations that gradually lead the MLLM to produce unsafe responses.

Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection

Affects: gpt-4o, gpt-4o-mini, gemini 2.0-flash, internvl2.5-78b, llava-ov-7b-chat, qwen2.5-vl-72b-instruct

Staged LLM Pipeline Attack

7/14/2025

Large language models (LLMs) protected by multi-stage safeguard pipelines (input and output classifiers) are vulnerable to staged adversarial attacks (STACK). STACK exploits weaknesses in individual components sequentially, combining jailbreaks for each classifier with a jailbreak for the underlying LLM to bypass the entire pipeline. Successful attacks achieve high attack success rates (ASR), even on datasets of particularly harmful queries.

STACK: Adversarial Attacks on LLM Safeguard Pipelines

Affects: claude 4 opus, qwen3-14b, gemma-2-9b, llama-3-8b-instruct, gpt-4-1106-preview, gpt-4o-2024-08-06

Adaptive LLM Jailbreaking Strategy

7/14/2025

Large Language Models (LLMs) are vulnerable to adaptive jailbreaking attacks that exploit their semantic comprehension capabilities. The MEF framework demonstrates that by tailoring attacks to the model's understanding level (Type I or Type II), evasion of input, inference, and output-level defenses is significantly improved. This is achieved through layered semantic mutations and dual-ended encryption techniques, allowing bypass of security measures even in advanced models like GPT-4o.

Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models

Affects: gpt-4o (29 may 2025 release), llama2-7b, llama2-13b

Agent Red-Teaming via Fuzzing

7/14/2025

Large Language Model (LLM) agents are vulnerable to indirect prompt injection attacks through manipulation of external data sources accessed during task execution. Attackers can embed malicious instructions within this external data, causing the LLM agent to perform unintended actions, such as navigating to arbitrary URLs or revealing sensitive information. The vulnerability stems from insufficient sanitization and validation of external data before it's processed by the LLM.

AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents

Affects: o3-mini, gpt-4o, gpt-4o-mini, claude-3.5-sonnet, gemini-2-flash-exp, llama3-8b

Asynchronous Audio Jailbreak

5/31/2025

End-to-end Large Audio-Language Models (LALMs) are vulnerable to AudioJailbreak, a novel attack that appends adversarial audio perturbations ("jailbreak audios") to user prompts. These perturbations, even when applied asynchronously and without alignment to the user's speech, can manipulate the LALM's response to generate adversary-desired outputs that bypass safety mechanisms. The attack achieves universality by employing a single perturbation effective across different prompts and robustness to over-the-air transmission by incorporating reverberation effects during perturbation generation. Even with stealth strategies employed to mask malicious intent, the attack remains highly effective.

AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models

Affects: gpt-4o, funaudiollm, mini-omni, qwen2-audio, speechgpt, mini-omni2, qwen-audio, llasm, llama-omni, salmonn, blsp, ichigo

LLM Judge Prompt Injection

5/31/2025

Large Language Models (LLMs) used for evaluating text quality (LLM-as-a-Judge architectures) are vulnerable to prompt-injection attacks. Maliciously crafted suffixes appended to input text can manipulate the LLM's judgment, causing it to incorrectly favor a predetermined response even if another response is objectively superior. Two attack vectors are identified: Comparative Undermining Attack (CUA), directly targeting the final decision, and Justification Manipulation Attack (JMA), altering the model's generated reasoning.

Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks

Affects: qwen2.5-3b-instruct, falcon3-3b-instruct

LLM Multi-Agent IP Leakage

5/31/2025

Large Language Model (LLM)-based Multi-Agent Systems (MAS) are vulnerable to intellectual property (IP) leakage attacks. An attacker with black-box access (only interacting via the public API) can craft adversarial queries that propagate through the MAS, extracting sensitive information such as system prompts, task instructions, tool specifications, number of agents, and system topology.

IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems

Affects: gpt-4o, gpt-4o-mini, llama-3.1-70b, qwen-2.5-72b, llama-3.1-8b

LLM Guardrail Evasion

4/21/2025

Large Language Model (LLM) guardrail systems, including those relying on AI-driven text classification models (e.g., fine-tuned BERT models), are vulnerable to evasion via character injection and adversarial machine learning (AML) techniques. Attackers can bypass detection by injecting Unicode characters (e.g., zero-width characters, homoglyphs) or using AML to subtly perturb prompts, maintaining semantic meaning while evading classification. This allows malicious prompts and jailbreaks to reach the underlying LLM.

Bypassing Prompt Injection and Jailbreak Detection in LLM Guardrails

Affects: deberta-v3-base, mdeberta-v3-base, gpt-4-mini

Prefill-Based LLM Jailbreak

5/4/2025

Large Language Models (LLMs) with user-controlled response prefilling features are vulnerable to a novel jailbreak attack. By manipulating the prefilled text, attackers can influence the model's subsequent token generation, bypassing safety mechanisms and eliciting harmful or unintended outputs. Two attack vectors are demonstrated: Static Prefilling (SP), using a fixed prefill string, and Optimized Prefilling (OP), iteratively optimizing the prefill string for maximum impact. The vulnerability lies in the LLM's reliance on the prefilled text as context for generating the response.

Prefill-Based Jailbreak: A Novel Approach of Bypassing LLM Safety Boundary

Affects: deepseek v3, gemini 2.0 flash (gemini-2.0-flash-001), gemini 2.0 pro experimental (gemini-2.0-pro-exp-02-05), gpt-3.5 turbo, claude 3.7 sonnet (claude-3-7-sonnet-20250219), claude 3.5 sonnet (claude-3-5-sonnet-20241022)

Page 1 of 9