Vulnerabilities compromising user or training data privacy
A vulnerability exists in Large Language Model (LLM) agents that allows attackers to manipulate the agent's reasoning process through the insertion of strategically placed adversarial strings. This allows attackers to induce the agent to perform unintended malicious actions or invoke specific malicious tools, even when the initial prompt or instruction is benign. The attack exploits the agent's reliance on chain-of-thought reasoning and dynamically optimizes the adversarial string to maximize the likelihood of the agent incorporating malicious actions into its reasoning path.
Large Language Model (LLM) tool-calling systems are vulnerable to adversarial tool injection attacks. Attackers can inject malicious tools ("Manipulator Tools") into the tool platform, manipulating the LLM's tool selection and execution process. This allows for privacy theft (extracting user queries), denial-of-service (DoS) attacks against legitimate tools, and unscheduled tool-calling (forcing the use of attacker-specified tools regardless of relevance). The attack exploits vulnerabilities in the tool retrieval mechanism and the LLM's decision-making process. Successful attacks require the malicious tool to be (1) retrieved by the system, (2) selected for execution by the LLM, and (3) its output to manipulate subsequent LLM actions.
CVE-2024-XXXX
Large Language Models (LLMs) are vulnerable to a novel agentic-based red-teaming attack, PrivAgent, which uses reinforcement learning to generate adversarial prompts. These prompts can extract sensitive information, including system prompts and portions of training data, from target LLMs even with existing guardrail defenses. The attack leverages a custom reward function based on a normalized sliding-window word edit similarity metric to guide the learning process, enabling it to overcome the limitations of previous fuzzing and genetic approaches.
Large Language Model (LLM) agents are vulnerable to obfuscated adversarial prompts that exploit tool misuse. These prompts, crafted through prompt optimization techniques, force the agent to execute tools (e.g., URL fetching, markdown rendering) in a way that leaks sensitive user data (e.g., PII) without the user's knowledge. The prompts are designed to be visually indistinguishable from benign prompts.
Jailbreaking vulnerabilities in Large Language Models (LLMs) used in Retrieval-Augmented Generation (RAG) systems allow escalation of attacks from entity extraction to full document extraction and enable the propagation of self-replicating malicious prompts ("worms") within interconnected RAG applications. Exploitation leverages prompt injection to force the LLM to return retrieved documents or execute malicious actions specified within the prompt.
Large Language Models (LLMs) employing gradient-ascent based unlearning methods are vulnerable to a dynamic unlearning attack (DUA). DUA leverages optimized adversarial suffixes appended to prompts, reintroducing unlearned knowledge even without access to the unlearned model's parameters. This allows an attacker to recover sensitive information previously designated for removal.
A Cross-Prompt Injection Attack (XPIA) can be amplified by appending a Greedy Coordinate Gradient (GCG) suffix to the malicious injection. This increases the likelihood that a Large Language Model (LLM) will execute the injected instruction, even in the presence of a user's primary instruction, leading to data exfiltration. The success rate of the attack depends on the LLM's complexity; medium-complexity models show increased vulnerability.
Large Language Model (LLM)-based Code Completion Tools (LCCTs), such as GitHub Copilot and Amazon Q, are vulnerable to jailbreaking and training data extraction attacks due to their unique workflows and reliance on proprietary code datasets. Jailbreaking attacks exploit the LLM's ability to generate harmful content by embedding malicious prompts within various code components (filenames, comments, variable names, function calls). Training data extraction attacks leverage the LLM's tendency to memorize training data, allowing extraction of sensitive information like email addresses and physical addresses from the proprietary dataset.
© 2025 Promptfoo. All rights reserved.