LMVD-ID: 260435d8
Published February 1, 2025

Agent-in-the-Middle Attack

Affected Models:gpt-4o, gpt-3.5-turbo

Research Paper

Red-Teaming LLM Multi-Agent Systems via Communication Attacks

View Paper

Description: A vulnerability exists in the communication mechanisms of Large Language Model (LLM)-based Multi-Agent Systems (LLM-MAS) enabling an Agent-in-the-Middle (AiTM) attack. An attacker can intercept and manipulate messages between agents, causing the victim agent to produce malicious outputs. The attack does not require compromising individual agents directly; instead, it leverages contextual manipulation of inter-agent communications.

Examples: See arXiv:2405.18540 for detailed examples demonstrating AiTM attacks across various Multi-Agent system frameworks and communication structures. The paper shows how an adversarial agent can manipulate messages to achieve different goals, including Denial-of-Service (DoS) attacks and the induction of specific malicious behaviors. For example, in a debate scenario, the attacker can subtly steer the conversation by modifying messages received by a specific agent, leading to a predetermined outcome.

Impact: Successful AiTM attacks can compromise the integrity and availability of LLM-MAS. Malicious outputs can propagate through the system, impacting the overall results and potentially leading to serious consequences depending on the application (e.g., injection of malicious code in a software development LLM-MAS). DoS attacks can render the entire system unusable.

Affected Systems: LLM-based Multi-Agent Systems using message-passing communication mechanisms. This includes systems employing various communication structures such as chain, tree, complete, or random graphs. Specific examples include, but are not limited to, frameworks like AutoGen and Camel. In general, any system where agents exchange information via potentially insecure channels is vulnerable.

Mitigation Steps:

  • Secure Communication Channels: Implement secure communication channels (e.g., encryption) to prevent interception and manipulation of inter-agent messages.
  • Message Integrity Verification: Incorporate mechanisms (e.g., digital signatures, hash verification) to detect message tampering.
  • Input Sanitization and Validation: Implement robust input validation at the agent level to prevent the propagation of manipulated information.
  • Redundancy and Fail-safes: Design systems with redundancy and fail-safe mechanisms to mitigate the impact of compromised agents or manipulated messages.
  • Agent Behavior Monitoring: Implement monitoring systems to detect anomalous agent behavior or deviations from expected communication patterns.
  • Principle of Least Privilege: Agents should only have access to the minimum information and resources they require to perform their tasks.

© 2025 Promptfoo. All rights reserved.