LMVD-ID: 6e8f115b
Published May 1, 2025

LLM Multi-Agent IP Leakage

Affected Models:gpt-4o, gpt-4o-mini, llama-3.1-70b, qwen-2.5-72b, llama-3.1-8b

Research Paper

IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems

View Paper

Description: Large Language Model (LLM)-based Multi-Agent Systems (MAS) are vulnerable to intellectual property (IP) leakage attacks. An attacker with black-box access (only interacting via the public API) can craft adversarial queries that propagate through the MAS, extracting sensitive information such as system prompts, task instructions, tool specifications, number of agents, and system topology.

Examples: See the paper for detailed examples of adversarial queries and the resulting leaked information. The paper includes examples using both synthetic and real-world MAS applications (Coze and CrewAI).

Impact: Successful attacks allow complete replication of the MAS application, leading to significant financial losses for developers and potential misuse of sensitive information processed by the MAS.

Affected Systems: LLM-based Multi-Agent Systems (MAS) using any LLM (including but not limited to GPT-4, LLaMA, Qwen) and implemented utilizing popular frameworks such as LangChain, LlamaIndex, AutoAgents, or custom implementations with similar communication protocols.

Mitigation Steps:

  • Input Sanitization: Implement robust input sanitization and validation mechanisms to prevent adversarial queries from propagating successfully. This should go beyond simple delimiter checks and should consider the semantic content and potential impact of the input within the multi-agent system's context.
  • Agent Output Filtering: Carefully filter the output from each agent, removing any sensitive information before it is passed to subsequent agents. This requires careful design to avoid disrupting legitimate functionality.
  • Differential Privacy: Explore the application of differential privacy techniques to mask sensitive information contained in the system prompts, task instructions, or tool outputs.
  • Adversarial Training: Train the LLMs within the MAS agents using adversarial examples to enhance their resilience against malicious inputs.
  • Monitoring and Anomaly Detection: Implement system-wide monitoring to detect unusual query patterns or unexpected data leakage. Develop anomaly detection methods focused on multi-agent interactions.
  • Secure Architecture Design: Carefully design the MAS architecture to limit information flow between agents, minimizing the damage propagation should an agent be compromised. Consider using more robust topologies that are less vulnerable to this type of information propagation attack.

© 2025 Promptfoo. All rights reserved.