LMVD-ID: 428a631b
Published December 1, 2024

Adversarial Tool Injection Attacks

Affected Models:gpt-4, llama 3, qwen2

Research Paper

From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection

View Paper

Description: Large Language Model (LLM) tool-calling systems are vulnerable to adversarial tool injection attacks. Attackers can inject malicious tools ("Manipulator Tools") into the tool platform, manipulating the LLM's tool selection and execution process. This allows for privacy theft (extracting user queries), denial-of-service (DoS) attacks against legitimate tools, and unscheduled tool-calling (forcing the use of attacker-specified tools regardless of relevance). The attack exploits vulnerabilities in the tool retrieval mechanism and the LLM's decision-making process. Successful attacks require the malicious tool to be (1) retrieved by the system, (2) selected for execution by the LLM, and (3) its output to manipulate subsequent LLM actions.

Examples: See arXiv:2405.18540 for details on the ToolCommander framework, including examples of Manipulator Tools designed for privacy theft, denial-of-service, and unscheduled tool calling attacks. The paper provides examples of malicious tool JSON structures and corresponding adversarial suffixes used to optimize tool retrieval. Specific examples of malicious responses crafted to manipulate LLM tool scheduling are also included.

Impact: Successful attacks can lead to significant data breaches (privacy theft), service disruptions (DoS), and manipulation of LLM behavior for malicious purposes (unscheduled tool calling). This impacts the reliability and security of LLM tool-calling applications.

Affected Systems: LLM-based systems utilizing external tool-calling functionalities, particularly those employing flexible tool platforms and dynamically selecting tools based on user queries. Specific affected systems are not listed, as the vulnerability impacts the architecture itself rather than particular implementations. The paper evaluated this vulnerability with GPT-4o mini, Llama3-8b-instruct, and Qwen2-7B-Instruct, using ToolBench and Contriever.

Mitigation Steps:

  • Implement rigorous validation and verification processes for all tools added to the LLM tool platform before deployment.
  • Develop and integrate robust detection mechanisms that identify and flag suspicious tool descriptions or behaviors.
  • Consider using more secure retrieval mechanisms that are less susceptible to adversarial manipulation.
  • Design LLMs with enhanced resilience to adversarial tool outputs by incorporating mechanisms to detect and mitigate manipulation attempts. Improve the LLM's ability to differentiate between legitimate and malicious tool invocations.
  • Adopt a multi-layered system of monitoring and alerting that detects anomalous tool usage patterns. This includes monitoring tool retrieval frequency, execution times, and output characteristics.

© 2025 Promptfoo. All rights reserved.