Data Privacy Vulnerabilities

Vulnerabilities compromising user or training data privacy

Related Vulnerabilities

21 entries

Activation Steering Leaks PII

7/14/2025

Large Language Models (LLMs) are vulnerable to activation steering attacks that bypass safety and privacy mechanisms. By manipulating internal attention head activations using lightweight linear probes trained on refusal/disclosure behavior, an attacker can induce the model to reveal Personally Identifiable Information (PII) memorized during training, including sensitive attributes like sexual orientation, relationships, and life events. The attack does not require adversarial prompts or auxiliary LLMs; it directly modifies internal model activations.

PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage

Affects: llama 7b, qwen 7b, gemma 9b, glm 9b, gpt-4, gpt-4 mini, llama 2-7b

Enterprise Multi-Turn Data Exfiltration

7/28/2025

Large Language Model (LLM) systems integrated with private enterprise data, such as those using Retrieval-Augmented Generation (RAG), are vulnerable to multi-stage prompt inference attacks. An attacker can use a sequence of individually benign-looking queries to incrementally extract confidential information from the LLM's context. Each query appears innocuous in isolation, bypassing safety filters designed to block single malicious prompts. By chaining these queries, the attacker can reconstruct sensitive data from internal documents, emails, or other private sources accessible to the LLM. The attack exploits the conversational context and the model's inability to recognize the cumulative intent of a prolonged, strategic dialogue.

Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems

Affects: gpt-4, gpt-3, gpt-2, roberta, gemini

Gradient-Based Privacy Jailbreak

5/31/2025

Large Language Models (LLMs) are vulnerable to a novel privacy jailbreak attack, dubbed PIG (Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization). PIG leverages in-context learning and gradient-based iterative optimization to extract Personally Identifiable Information (PII) from LLMs, bypassing built-in safety mechanisms. The attack iteratively refines a crafted prompt based on gradient information, focusing on tokens related to PII entities, thereby increasing the likelihood of successful PII extraction.

PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization

Affects: llama2-7b-chat-hf, mistral-7b-instruct-v0.3, llama3-8b-instruct, vicuna-7b-v1.5, gpt-4o, claude 3.5

LLM Multi-Agent IP Leakage

5/31/2025

Large Language Model (LLM)-based Multi-Agent Systems (MAS) are vulnerable to intellectual property (IP) leakage attacks. An attacker with black-box access (only interacting via the public API) can craft adversarial queries that propagate through the MAS, extracting sensitive information such as system prompts, task instructions, tool specifications, number of agents, and system topology.

IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems

Affects: gpt-4o, gpt-4o-mini, llama-3.1-70b, qwen-2.5-72b, llama-3.1-8b

LLM Censorship Vector Control

5/4/2025

Large Language Models (LLMs) employing safety mechanisms based on supervised fine-tuning and preference alignment exhibit a vulnerability to "steering" attacks. Maliciously crafted prompts or input manipulations can exploit representation vectors within the model to either bypass censorship ("refusal-compliance vector") or suppress the model's reasoning process ("thought suppression vector"), resulting in the generation of unintended or harmful outputs. This vulnerability is demonstrated across several instruction-tuned and reasoning LLMs from various providers.

Steering the CensorShip: Uncovering Representation Vectors for LLM" Thought" Control

Affects: llama-2-7b, qwen-1.8b, qwen-7b, yi-1.5-6b, gemma-2b, gemma-7b, llama-3.1-8b, qwen 2.5-7b, deepseek-r1-distill-qwen-1.5b, deepseek-r1-distill-qwen-7b, deepseek-r1-distill-qwen-32b

Agent Reasoning Hijacking

3/19/2025

A vulnerability exists in Large Language Model (LLM) agents that allows attackers to manipulate the agent's reasoning process through the insertion of strategically placed adversarial strings. This allows attackers to induce the agent to perform unintended malicious actions or invoke specific malicious tools, even when the initial prompt or instruction is benign. The attack exploits the agent's reliance on chain-of-thought reasoning and dynamically optimizes the adversarial string to maximize the likelihood of the agent incorporating malicious actions into its reasoning path.

UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning

Affects: llama 3.1 8b, mistral 7b, gpt-4o, claude 3, gpt-3.5, gpt-4

Adversarial Tool Injection Attacks

12/29/2024

Large Language Model (LLM) tool-calling systems are vulnerable to adversarial tool injection attacks. Attackers can inject malicious tools ("Manipulator Tools") into the tool platform, manipulating the LLM's tool selection and execution process. This allows for privacy theft (extracting user queries), denial-of-service (DoS) attacks against legitimate tools, and unscheduled tool-calling (forcing the use of attacker-specified tools regardless of relevance). The attack exploits vulnerabilities in the tool retrieval mechanism and the LLM's decision-making process. Successful attacks require the malicious tool to be (1) retrieved by the system, (2) selected for execution by the LLM, and (3) its output to manipulate subsequent LLM actions.

From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection

Affects: gpt-4, llama 3, qwen2

Agent Action Hijacking

3/19/2025

CVE-2024-XXXX

Towards Action Hijacking of Large Language Model-based Agent

Affects: llama, vicuna, qwen2, alpaca, gpt-3, gpt-4, minilm, m3e, bert

RL-Based LLM Privacy Leak

12/28/2024

Large Language Models (LLMs) are vulnerable to a novel agentic-based red-teaming attack, PrivAgent, which uses reinforcement learning to generate adversarial prompts. These prompts can extract sensitive information, including system prompts and portions of training data, from target LLMs even with existing guardrail defenses. The attack leverages a custom reward function based on a normalized sliding-window word edit similarity metric to guide the learning process, enabling it to overcome the limitations of previous fuzzing and genetic approaches.

PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage

Agent Tool Misuse Attacks

12/29/2024

Large Language Model (LLM) agents are vulnerable to obfuscated adversarial prompts that exploit tool misuse. These prompts, crafted through prompt optimization techniques, force the agent to execute tools (e.g., URL fetching, markdown rendering) in a way that leaks sensitive user data (e.g., PII) without the user's knowledge. The prompts are designed to be visually indistinguishable from benign prompts.

Imprompter: Tricking LLM Agents into Improper Tool Use

Page 1 of 3