Chain Vulnerabilities

Security issues in prompt chaining and workflows

Related Vulnerabilities

8 entries

Enterprise Multi-Turn Data Exfiltration

7/28/2025

Large Language Model (LLM) systems integrated with private enterprise data, such as those using Retrieval-Augmented Generation (RAG), are vulnerable to multi-stage prompt inference attacks. An attacker can use a sequence of individually benign-looking queries to incrementally extract confidential information from the LLM's context. Each query appears innocuous in isolation, bypassing safety filters designed to block single malicious prompts. By chaining these queries, the attacker can reconstruct sensitive data from internal documents, emails, or other private sources accessible to the LLM. The attack exploits the conversational context and the model's inability to recognize the cumulative intent of a prolonged, strategic dialogue.

Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems

Affects: gpt-4, gpt-3, gpt-2, roberta, gemini

Trojan Prompt Chains in Education

7/28/2025

A vulnerability exists in Large Language Models, including GPT-3.5 and GPT-4, where safety guardrails can be bypassed using Trojanized prompt chains within a simulated educational context. An attacker can establish a benign, pedagogical persona (e.g., a curious student) over a multi-turn dialogue. This initial context is then exploited to escalate the conversation toward requests for harmful or restricted information, which the model provides because the session's context is perceived as safe. The vulnerability stems from the moderation system's failure to detect semantic escalation and topic drift within an established conversational context. Two primary methods were identified: Simulated Child Confusion (SCC), which uses a naive persona to ask for dangerous information under a moral frame (e.g., "what not to do"), and Prompt Chain Escalation via Literary Devices (PCELD), which frames harmful concepts as an academic exercise in satire or metaphor.

Mitigating Trojanized Prompt Chains in Educational LLM Use Cases: Experimental Findings and Detection Tool Design

Affects: gpt-3.5, gpt-4, bert

Flowchart-based LVLM Jailbreak Attack

3/19/2025

FC-Attack leverages automatically generated flowcharts containing step-by-step descriptions derived or rephrased from harmful queries, combined with a benign textual prompt, to jailbreak Large Vision-Language Models (LVLMs). The vulnerability lies in the model's susceptibility to visual prompts containing harmful information within the flowcharts, thus bypassing safety alignment mechanisms.

FC-Attack: Jailbreaking Large Vision-Language Models via Auto-Generated Flowcharts

Affects: gemini-1.5 pro, llava-next, qwen2-vl, internvl-2.5, gpt-4o mini, gpt-4o, claude-3.5 sonnet, mistral 7b

Guardrail Bypass Harmful Fine-tuning

3/19/2025

CVE-2024-XXXX

Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation

Affects: llama3-8b, llama guard2

Agent Action Hijacking

3/19/2025

CVE-2024-XXXX

Towards Action Hijacking of Large Language Model-based Agent

Affects: llama, vicuna, qwen2, alpaca, gpt-3, gpt-4, minilm, m3e, bert

RAG Worm Jailbreak

12/29/2024

Jailbreaking vulnerabilities in Large Language Models (LLMs) used in Retrieval-Augmented Generation (RAG) systems allow escalation of attacks from entity extraction to full document extraction and enable the propagation of self-replicating malicious prompts ("worms") within interconnected RAG applications. Exploitation leverages prompt injection to force the LLM to return retrieved documents or execute malicious actions specified within the prompt.

Unleashing worms and extracting data: Escalating the outcome of attacks against rag-based inference in scale and severity using jailbreaking

Affects: gemini 1.5 flash

GCG Suffix Data Exfiltration

3/24/2025

A Cross-Prompt Injection Attack (XPIA) can be amplified by appending a Greedy Coordinate Gradient (GCG) suffix to the malicious injection. This increases the likelihood that a Large Language Model (LLM) will execute the injected instruction, even in the presence of a user's primary instruction, leading to data exfiltration. The success rate of the attack depends on the LLM's complexity; medium-complexity models show increased vulnerability.

WHITE PAPER: A Brief Exploration of Data Exfiltration using GCG Suffixes

Affects: phi-3-mini, gpt-3.5, gpt-4o, llama2

Model Combination Misuse

1/26/2025

LLMs, even when individually assessed as "safe," can be combined by an adversary to achieve malicious outcomes. This vulnerability exploits the complementary strengths of multiple models—a high-capability model that refuses malicious requests and a low-capability model that does not—through task decomposition. Adversaries can either manually decompose tasks into benign (solved by the high-capability model) and easily-malicious subtasks (solved by the low-capability model) or automate the decomposition process using the weaker model to generate benign subtasks for the stronger model and then utilizing the solutions in-context to achieve the malicious goal.

Adversaries can misuse combinations of safe models

Affects: claude 3 opus, claude 3 sonnet, claude 3 haiku, llama 2 7b-chat, llama 2 13b-chat, llama 2 70b-chat, mistral 7b, mixtral 8x7b, dall-e 3, stable diffusion v1.5, gpt-4