LMVD-ID: 8ee72e81
Published November 1, 2024

SQL Injection Jailbreak

Affected Models:vicuna-7b-v1.5, llama-2-7b-chat-hf, llama-3.1-8b-instruct, mistral-7b-instruct-v0.2, deepseek-llm-7b-chat

Research Paper

SQL Injection Jailbreak: a structural disaster of large language models

View Paper

Description: A novel SQL Injection Jailbreak (SIJ) vulnerability allows attackers to bypass safety mechanisms in Large Language Models (LLMs) by manipulating the structure of input prompts. The attack leverages the model's processing of system prompts, user prefixes, user prompts, and assistant prefixes to effectively "comment out" the expected response prefix and inject harmful instructions, causing the LLM to generate unsafe content. This vulnerability exploits the external properties of the LLM, specifically how it parses input prompts, rather than inherent model weaknesses.

Examples: See the paper's repository https://github.com/weiyezhimeng/SQLInjection-Jailbreak for detailed examples and attack prompts. The paper demonstrates successful attacks across multiple open-source LLMs.

Impact: Successful exploitation of this vulnerability allows attackers to elicit harmful and unsafe content from LLMs, potentially leading to the generation of harmful instructions, biased outputs, personal information disclosure, or other malicious activities. The attack achieves near 100% success rates in the described experiments.

Affected Systems: Open-source LLMs including Vicuna-7b-v1.5, Llama-2-7b-chat-hf, Llama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.2, and DeepSeek-LLM-7B-Chat. The vulnerability potentially affects other LLMs with similar prompt parsing mechanisms.

Mitigation Steps:

  • Implement input sanitization and validation techniques to prevent malicious prompt structures.
  • Develop robust parsing mechanisms that are less susceptible to manipulation through structural changes in the input.
  • Employ defense methods that incorporate random strings or keys after ethical prompts, making it harder for attackers to reliably predict response prefixes. (as suggested by the Self-Reminder-Key in the paper)
  • Regularly update and patch LLMs with security fixes addressing newly discovered vulnerabilities.
  • Conduct thorough security testing and penetration testing of LLMs to identify and mitigate potential vulnerabilities.

© 2025 Promptfoo. All rights reserved.