LMVD-ID: 0bd762b7
Published December 1, 2023

Adversarial Code Generation

Affected Models:code llama 7b, starchat-15b, wizardcoder-3b, wizardcoder 15b

Research Paper

Deceptprompt: Exploiting llm-driven code generation via adversarial natural language instructions

View Paper

Description: Large Language Models (LLMs) used for code generation are vulnerable to adversarial natural language instructions that preserve semantic meaning but induce the generation of functionally correct code containing specific vulnerabilities. The attack leverages a novel algorithm, DeceptPrompt, to generate adversarial prompts that manipulate the LLM's output, resulting in vulnerable code without altering the intended functionality.

Examples: See DeceptPrompt paper for examples demonstrating successful attacks targeting various Common Weakness Enumerations (CWEs) including CWE-20 (Improper Input Validation), CWE-119 (Improper Restriction of Operations within the Bounds of a Memory Buffer), CWE-89 (Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')), CWE-476 (Null Pointer Dereference), and CWE-416 (Use After Free). Specific examples are shown in Figures 2, 4, 5, 6, 7, and 8 within the DeceptPrompt paper.

Impact: Successful exploitation can lead to the generation of vulnerable software applications with security flaws such as buffer overflows, SQL injection, improper input validation, null pointer dereferences, and use-after-free vulnerabilities. This allows attackers to compromise the system's security and potentially gain unauthorized access or control.

Affected Systems: LLM-driven code generation systems using models such as Code Llama, StarCoder, and WizardCoder, and potentially others.

Mitigation Steps:

  • Improved LLM training data which includes adversarial examples to enhance model robustness.
  • Implement robust input sanitization and validation within the generated code.
  • Develop and deploy static and dynamic code analysis tools to detect vulnerabilities.
  • Regular security audits of the LLM and its generated code.
  • Further research on techniques to make LLMs more resistant to adversarial attacks.

© 2025 Promptfoo. All rights reserved.