LMVD-ID: 72bdab70
Published August 1, 2024

GCG Suffix Data Exfiltration

Affected Models:phi-3-mini, gpt-3.5, gpt-4o, llama2

Research Paper

WHITE PAPER: A Brief Exploration of Data Exfiltration using GCG Suffixes

View Paper

Description: A Cross-Prompt Injection Attack (XPIA) can be amplified by appending a Greedy Coordinate Gradient (GCG) suffix to the malicious injection. This increases the likelihood that a Large Language Model (LLM) will execute the injected instruction, even in the presence of a user's primary instruction, leading to data exfiltration. The success rate of the attack depends on the LLM's complexity; medium-complexity models show increased vulnerability.

Examples: See the white paper for experimental setup and results. The attack involves crafting a malicious injection targeting a specific function (e.g., a network request) and embedding it within third-party data presented to the LLM alongside a user prompt. Appending a GCG suffix to the injection significantly increases the probability of the LLM executing the injection. Specific examples from the dataset are not publicly available.

Impact: Successful exploitation leads to data exfiltration from the user's context, potentially exposing sensitive information such as credentials or Personal Identifiable Information (PII). The financial impact of a successful attack can be significant.

Affected Systems: LLMs vulnerable to XPIA and susceptible to manipulation by GCG suffixes. Specifically, the paper tested Phi-3-mini, GPT-3.5, and GPT-4, showing varying degrees of vulnerability. Other LLMs with similar architecture or training may also be affected.

Mitigation Steps:

  • Improved Prompt Filtering: Implement more robust prompt filtering techniques to detect malicious injections, particularly those incorporating GCG suffixes.
  • Model Complexity: Consider using more complex LLMs, as they exhibit higher resistance to this attack vector.
  • GCG Suffix Detection: Develop methods to specifically identify and neutralize GCG suffixes.
  • Function Call Sanitization: Sanitize and validate function calls generated by the LLM before execution, to prevent the misuse of tools that provide external access (e.g., network requests).
  • Defense Variation: Implement different defense strategies for LLMs of varying complexities, recognizing that the effectiveness of certain defenses (such as prompt filtering) may vary drastically based on model complexity.

© 2025 Promptfoo. All rights reserved.