Attacks that disrupt model availability
Large Language Model (LLM) tool-calling systems are vulnerable to adversarial tool injection attacks. Attackers can inject malicious tools ("Manipulator Tools") into the tool platform, manipulating the LLM's tool selection and execution process. This allows for privacy theft (extracting user queries), denial-of-service (DoS) attacks against legitimate tools, and unscheduled tool-calling (forcing the use of attacker-specified tools regardless of relevance). The attack exploits vulnerabilities in the tool retrieval mechanism and the LLM's decision-making process. Successful attacks require the malicious tool to be (1) retrieved by the system, (2) selected for execution by the LLM, and (3) its output to manipulate subsequent LLM actions.
Large Language Models (LLMs) are vulnerable to a novel jailbreak attack that exploits resource limitations. By overloading the model with a computationally intensive preliminary task (e.g., a complex character map lookup and decoding), the attacker prevents the activation of the LLM's safety mechanisms, enabling the generation of unsafe outputs from subsequent prompts. The attack's strength is scalable and adjustable by modifying the complexity of the preliminary task.
A denial-of-service (DoS) vulnerability exists in certain Large Language Model (LLM) safeguard implementations due to susceptibility to adversarial prompts. Attackers can inject short, seemingly innocuous adversarial prompts into user prompt templates, causing the safeguard to incorrectly classify legitimate user requests as unsafe and reject them. This allows for a DoS attack against specific users without requiring modification of the LLM itself.
© 2025 Promptfoo. All rights reserved.