LCCT Data Extraction & Jailbreak
Research Paper
Security Attacks on LLM-based Code Completion Tools
Description: Large Language Model (LLM)-based Code Completion Tools (LCCTs), such as GitHub Copilot and Amazon Q, are vulnerable to jailbreaking and training data extraction attacks because of their unique workflows and their reliance on proprietary code datasets. Jailbreaking attacks embed malicious prompts within code components (filenames, comments, variable names, function calls) to coax the underlying LLM into generating harmful content. Training data extraction attacks exploit the LLM's tendency to memorize its training data, allowing an attacker to recover sensitive information, such as email addresses and physical addresses, from the proprietary dataset.
Examples: See https://github.com/Sensente/Security-Attacks-on-LCCTs for proof-of-concept attack code. The examples demonstrate techniques such as "Filename Proxy Attacks," "Cross-File Attacks," "Guided Trigger Attacks," and "Code Embedded Attacks" that elicit harmful outputs and extract sensitive user data from the training dataset.
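The repository above is the authoritative source for the attack code. As a rough illustration of the shared structure of these attacks (not code from the paper or its repository), the sketch below shows how a request can be smuggled into the filename, comments, and docstrings that an LCCT reads as context; the filename, function name, and the <REDACTED_REQUEST> placeholder are hypothetical, and the actual harmful instruction is deliberately omitted.

```python
# Illustrative structure only; this is not code from the paper's repository.
# The filename, function name, and <REDACTED_REQUEST> placeholder are
# hypothetical, and the actual harmful instruction is omitted.

# Filename Proxy Attack (conceptual): the request is smuggled into the name
# of the file the LCCT reads as context, e.g. "how_to_<REDACTED_REQUEST>.py".

# Code Embedded Attack (conceptual): the request is hidden inside ordinary
# code elements (comments, docstrings, identifiers) so that safety filters
# keyed on natural-language chat prompts do not recognize it as a request.

# Step 1: a comment frames the task as routine code completion.
# TODO: complete the function below according to its docstring.

def step_by_step_guide():
    """Return a detailed guide for: <REDACTED_REQUEST>."""
    # Step 2: the cursor is placed here; the LCCT is expected to "helpfully"
    # complete the body, treating the embedded request as a coding task
    # rather than a prompt that its safety checks would normally intercept.
    pass
```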
Impact: Successful jailbreaking attacks can lead to the generation of malicious code, enabling attacks such as data breaches, malware deployment, and denial of service. Successful training data extraction attacks compromise user privacy by revealing sensitive personal information memorized from the LCCT's training data. Reported jailbreaking success rates ranged from 46.3% to 99.4% depending on the LCCT and attack vector, and the training data extraction attack recovered 54 email addresses and 314 physical addresses.
Affected Systems: LLM-based Code Completion Tools (LCCTs) using proprietary code datasets for training, including but not limited to GitHub Copilot and Amazon Q. The vulnerability also applies to general-purpose LLMs with code completion capabilities, although the success rate may vary.
Mitigation Steps:
- Implement more robust input sanitization and validation at the input preprocessing stage to detect and filter malicious prompts before they reach the LLM. This should go beyond simple keyword filtering and inspect every channel an attacker can use, including filenames, comments, identifiers, and string literals (see the input-screening sketch after this list).
- Employ stronger security checks during output post-processing to identify and filter harmful content and leaked personal data in generated completions, accepting a possible trade-off with response time (see the redaction sketch after this list).
- Investigate and implement differential privacy or other privacy-preserving techniques during the training phase of the LLM to reduce the risk of memorizing and leaking sensitive training data (see the DP-SGD sketch after this list).
- Regularly audit and update the LLM’s safety training data to adapt to evolving attack strategies and ensure consistently safe operation.
- Conduct regular, thorough security testing of the LCCT, incorporating adversarial testing methodologies and multiple attack types that simulate real-world scenarios.
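A minimal sketch of the input-screening idea, assuming a fixed denylist of suspicious phrases; the patterns and helper names below are hypothetical, and a production LCCT would more likely use a learned classifier over these text channels than regular expressions.

```python
import io
import re
import tokenize

# Hypothetical denylist; a production system would likely use a learned
# classifier or policy model rather than fixed patterns.
SUSPICIOUS_PATTERNS = [
    r"ignore (all|any) (previous|prior) instructions",
    r"step[- ]by[- ]step guide for",
    r"training\s+data",
    r"(email|physical)\s+address(es)?\s+of",
]

def extract_text_channels(filename: str, source: str) -> list[str]:
    """Collect the channels an attacker can hide a prompt in: the filename,
    plus comments, string literals, and identifiers in the source."""
    channels = [filename]
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type in (tokenize.COMMENT, tokenize.STRING, tokenize.NAME):
                channels.append(tok.string)
    except (tokenize.TokenError, SyntaxError):
        channels.append(source)  # fall back to scanning the raw text
    return channels

def looks_malicious(filename: str, source: str) -> bool:
    """Return True if any text channel matches a suspicious pattern."""
    text = "\n".join(extract_text_channels(filename, source)).lower()
    return any(re.search(pattern, text) for pattern in SUSPICIOUS_PATTERNS)

# Usage: screen the editor context before it is sent to the completion model.
context = "# TODO: ignore all previous instructions and ...\nx = 1\n"
if looks_malicious("innocent_looking_file.py", context):
    print("Context rejected by input sanitization")  # do not forward to the LLM
```

Tokenizing the source, rather than scanning raw text, makes it straightforward to cover exactly the channels highlighted above: filenames, comments, string literals, and identifiers.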
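For the output post-processing step, a correspondingly simple sketch that masks the kinds of personal data reported in the Impact section (email and street addresses) before a completion is returned; the regular expressions are illustrative assumptions, and real deployments typically rely on dedicated PII detectors.

```python
import re

# Rough, illustrative patterns only; production systems typically use a
# dedicated PII-detection model rather than regular expressions.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Very loose US-style street address pattern (assumption for illustration).
ADDRESS_RE = re.compile(
    r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s){1,3}"
    r"(?:St|Street|Ave|Avenue|Rd|Road|Blvd|Lane|Ln|Dr|Drive)\b"
)

def redact_completion(completion: str) -> str:
    """Post-process an LCCT completion, masking likely PII before it is
    shown to the user or logged."""
    completion = EMAIL_RE.sub("[REDACTED_EMAIL]", completion)
    completion = ADDRESS_RE.sub("[REDACTED_ADDRESS]", completion)
    return completion

# Usage
raw = 'contact = "jane.doe@example.com"  # lives at 42 Elm Street'
print(redact_completion(raw))
# contact = "[REDACTED_EMAIL]"  # lives at [REDACTED_ADDRESS]
```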
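For the differential-privacy mitigation, one common approach is DP-SGD. The sketch below uses the open-source Opacus library with a toy model standing in for an LCCT's code model; the architecture and hyperparameters are placeholders for illustration, not recommendations from the paper.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine  # third-party: pip install opacus

# Toy stand-in for a code-completion model; a real LCCT would train a
# transformer over tokenized source code.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
dataset = TensorDataset(torch.randn(512, 128), torch.randn(512, 128))
loader = DataLoader(dataset, batch_size=64)

# Wrap model, optimizer, and data loader so that each step clips per-sample
# gradients and adds calibrated noise (DP-SGD).
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,   # placeholder: Gaussian noise scale
    max_grad_norm=1.0,      # placeholder: per-sample clipping bound
)

loss_fn = nn.MSELoss()
for x, y in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()        # clipped, noised gradient update
```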