Attacks that disrupt model availability
Large Language Models (LLMs) equipped with native code interpreters are vulnerable to Denial of Service (DoS) via resource exhaustion. An attacker can craft a single prompt that causes the interpreter to execute code that depletes CPU, memory, or disk resources. The vulnerability is particularly pronounced when a resource-intensive task is framed within a plausibly benign or socially-engineered context ("indirect prompts"), which significantly lowers the model's likelihood of refusal compared to explicitly malicious requests.
A resource consumption vulnerability exists in multiple Large Vision-Language Models (LVLMs). An attacker can craft a subtle, imperceptible adversarial perturbation and apply it to an input image. When this image is processed by an LVLM, even with a benign text prompt, it forces the model into an unbounded generation loop. The attack, named RECALLED, uses a gradient-based optimization process to create a visual perturbation that steers the model's text generation towards a predefined, repetitive sequence (an "Output Recall" target). This causes the model to generate text that repeats a word or sentence until the maximum context limit is reached, leading to a denial-of-service condition through excessive computational resource usage and response latency.
Large Language Model (LLM) tool-calling systems are vulnerable to adversarial tool injection attacks. Attackers can inject malicious tools ("Manipulator Tools") into the tool platform, manipulating the LLM's tool selection and execution process. This allows for privacy theft (extracting user queries), denial-of-service (DoS) attacks against legitimate tools, and unscheduled tool-calling (forcing the use of attacker-specified tools regardless of relevance). The attack exploits vulnerabilities in the tool retrieval mechanism and the LLM's decision-making process. Successful attacks require the malicious tool to be (1) retrieved by the system, (2) selected for execution by the LLM, and (3) its output to manipulate subsequent LLM actions.
Large Language Models (LLMs) are vulnerable to a novel jailbreak attack that exploits resource limitations. By overloading the model with a computationally intensive preliminary task (e.g., a complex character map lookup and decoding), the attacker prevents the activation of the LLM's safety mechanisms, enabling the generation of unsafe outputs from subsequent prompts. The attack's strength is scalable and adjustable by modifying the complexity of the preliminary task.
A denial-of-service (DoS) vulnerability exists in certain Large Language Model (LLM) safeguard implementations due to susceptibility to adversarial prompts. Attackers can inject short, seemingly innocuous adversarial prompts into user prompt templates, causing the safeguard to incorrectly classify legitimate user requests as unsafe and reject them. This allows for a DoS attack against specific users without requiring modification of the LLM itself.
© 2025 Promptfoo. All rights reserved.