Agent Tool Metadata Lure
Research Paper
Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools
View PaperDescription: A vulnerability exists in the tool selection mechanisms of Large Language Model (LLM) agents, identified as the "Attractive Metadata Attack" (AMA). This flaw allows an adversary to manipulate the metadata (names, descriptions, and parameter schemas) of malicious external tools to statistically maximize the likelihood of their selection by the agent, without requiring prompt injection or access to model internals. The vulnerability exploits the agent’s semantic scoring function used to map user queries to tools. By utilizing a black-box, state-action-value optimization framework based on in-context learning, an attacker can iteratively refine tool metadata to become "deceptively attractive" to the LLM. This results in the agent preferentially invoking malicious tools over benign alternatives during standard task execution, bypassing prompt-level sanitization, instruction filtering, and structured protocols like the Model Context Protocol (MCP).
Examples: The attack does not rely on specific malformed strings but rather on semantically optimized descriptions generated via an iterative process.
- Optimization Methodology: The attacker employs an LLM to generate batches of tool metadata. These are evaluated against a target query set $Q$ and normal tool set $NT$ to calculate an invocation probability $P(t, Q, NT)$. High-performing metadata is iteratively refined using a weighted value function $V(t) = p + \lambda(p - p_{parent})$.
- Semantic Triggers: Experiments indicate that optimized metadata containing high-weight phrases such as "comprehensive" or "insight" significantly increases selection probability across diverse domains (e.g., IT operations, portfolio management).
- Reproduction: Full code for the optimization pipeline and metadata generation is available in the author's repository.
- See repository: https://github.com/SEAIC-M/AMA
Impact:
- Privacy Leakage: Successful exploitation leads to the extraction of Personally Identifiable Information (PII) including names, addresses, and credit card numbers (Verified 92% Privacy Leakage rate on open-source models).
- Context Exfiltration: Malicious tools can access and exfiltrate agent-level context, including user queries and system prompt instructions (e.g., role descriptions).
- Task Manipulation: The agent is induced to execute unauthorized actions defined by the malicious tool while maintaining the appearance of a normal workflow.
- Defense Bypass: The attack renders standard defenses ineffective; Dynamic Prompt Rewriting and Prompt Refuge mechanisms fail to prevent the invocation of AMA-optimized tools.
Affected Systems:
- LLM Agents utilizing the ReAct (Reason+Act) paradigm.
- Systems interacting with open or third-party tool marketplaces (e.g., RapidAPI Hub integrations).
- Tested Vulnerable Models:
- Gemma-3 27B
- LLaMA-3.3-Instruct 70B
- Qwen-2.5-Instruct 32B
- GPT-4o-mini
- Qwen3-32B
Mitigation Steps:
- Execution-Level Defenses: Implement security mechanisms at the execution layer rather than relying solely on prompt-level sanitization or auditor-based detection, which have proven ineffective against AMA.
- Tool Verification: Enforce strict verification and reputation scoring for third-party tools in open marketplaces; mere metadata validation is insufficient.
- Restricted Toolsets: Limit agent access to a closed, curated set of trusted tools rather than open retrieval from public repositories where metadata can be manipulated.
© 2026 Promptfoo. All rights reserved.