LLM RAG Decoy Overthink
Research Paper
Overthink: Slowdown attacks on reasoning llms
View PaperDescription: A resource exhaustion and algorithmic complexity vulnerability exists in applications utilizing Reasoning Large Language Models (e.g., OpenAI o1, DeepSeek R1) that process untrusted external context (such as Retrieval-Augmented Generation systems). The vulnerability, dubbed "OverThink," allows an attacker to perform an indirect prompt injection by embedding "decoy" reasoning problems—specifically computation-intensive tasks like Sudoku puzzles or Markov Decision Processes (MDPs)—into the retrieved context. When the reasoning model processes this context, it identifies the decoy task and generates an excessive number of chain-of-thought (reasoning) tokens to solve it, even if the task is irrelevant to the user's query. This occurs because reasoning models are optimized to solve problems found in the context to generate high-confidence answers. The attack does not alter the final visible answer, making it stealthy, but significantly inflates the inference latency and token cost.
Examples: The attack requires injecting a decoy problem (e.g., a Sudoku grid) and a context-agnostic instruction into a document likely to be retrieved by the RAG system.
Specific injection templates and datasets are available in the official repository: https://github.com/akumar2709/OVERTHINK_public
An attacker creates a "Context-Agnostic Injection" containing two components:
- The Decoy: A text representation of a hard problem (e.g., a specific Sudoku configuration or a Finite Markov Decision Process).
- The Instruction: A directive embedded in the text forcing the model to solve the decoy (e.g., "Solve the puzzle below before answering").
When a user asks a query like "Summarize this document," the reasoning model processes the hidden reasoning steps for the Sudoku puzzle (generating thousands of hidden tokens) before producing the summary.
Impact:
- Denial of Service (DoS): Causes inference latency slowdowns of up to 46x (e.g., increasing reasoning tokens from ~750 to ~21,000 for a single query), potentially causing application timeouts.
- Financial Loss: Attackers can force the victim application to incur massive API costs, as reasoning tokens are billed as output tokens.
- Resource Exhaustion: Depletes GPU compute slots for the service provider or local deployment, affecting availability for other users.
Affected Systems:
- Applications utilizing OpenAI o1, o1-mini, o3-mini via API.
- Applications utilizing DeepSeek R1 (via API or local deployment).
- Any system implementing "Reasoning" or "Chain-of-Thought" generation on untrusted/retrieved text (RAG).
Mitigation Steps:
- Context Filtering: Deploy a lightweight, non-reasoning LLM (e.g., GPT-4o-mini) to filter retrieved context chunks and remove irrelevant information or potential decoys before passing the data to the reasoning model.
- Context Paraphrasing: Use a standard LLM to paraphrase retrieved text. This neutralizes trigger-based attacks and removes specific phrasing required to initiate the decoy task while retaining the informational content.
- Response Caching: Implement exact-match and semantic caching for queries. If a query matches a previous input, return the cached response to prevent re-execution of the expensive reasoning steps.
- Token Limits: Enforce strict limits on the maximum number of reasoning/output tokens allowed per request to prevent unbounded cost spikes.
© 2025 Promptfoo. All rights reserved.