GCG Suffix Data Exfiltration

Description: A Cross-Prompt Injection Attack (XPIA) can be amplified by appending a Greedy Coordinate Gradient (GCG) suffix to the malicious injection. This increases the likelihood that a Large Language Model (LLM) will execute the injected instruction, even in the presence of a user's primary instruction, leading to data exfiltration. The success rate of the attack depends on the LLM's complexity; medium-complexity models show increased vulnerability.

Examples: See the white paper for experimental setup and results. The attack involves crafting a malicious injection targeting a specific function (e.g., a network request) and embedding it within third-party data presented to the LLM alongside a user prompt. Appending a GCG suffix to the injection significantly increases the probability of the LLM executing the injection. Specific examples from the dataset are not publicly available.

Impact: Successful exploitation leads to data exfiltration from the user's context, potentially exposing sensitive information such as credentials or Personal Identifiable Information (PII). The financial impact of a successful attack can be significant.

Affected Systems: LLMs vulnerable to XPIA and susceptible to manipulation by GCG suffixes. Specifically, the paper tested Phi-3-mini, GPT-3.5, and GPT-4, showing varying degrees of vulnerability. Other LLMs with similar architecture or training may also be affected.

Mitigation Steps:

Improved Prompt Filtering: Implement more robust prompt filtering techniques to detect malicious injections, particularly those incorporating GCG suffixes.
Model Complexity: Consider using more complex LLMs, as they exhibit higher resistance to this attack vector.
GCG Suffix Detection: Develop methods to specifically identify and neutralize GCG suffixes.
Function Call Sanitization: Sanitize and validate function calls generated by the LLM before execution, to prevent the misuse of tools that provide external access (e.g., network requests).
Defense Variation: Implement different defense strategies for LLMs of varying complexities, recognizing that the effectiveness of certain defenses (such as prompt filtering) may vary drastically based on model complexity.

GCG Suffix Data Exfiltration

Research Paper