LangChain Poisoning Jailbreak

Description: A vulnerability in Retrieval-Augmented Generation (RAG) systems utilizing LangChain allows for indirect jailbreaks of Large Language Models (LLMs). By poisoning the external knowledge base accessed by the LLM through LangChain, attackers can manipulate the LLM's responses, causing it to generate malicious or inappropriate content. The attack exploits the LLM's reliance on the external knowledge base and bypasses direct prompt-based jailbreak defenses.

Examples: See https://github.com/CAM-FSS/jailbreak-langchain. The repository contains examples of poisoned knowledge bases and trigger prompts used to successfully jailbreak multiple LLMs. Specific examples include using encoded keywords (Morse code, Base64) within disguised malicious content embedded in PDF files to avoid keyword filtering. The trigger prompt then directs the LLM to the malicious content within the knowledge base.

Impact: Successful exploitation leads to the generation of harmful content by the LLM, including but not limited to: inciting violence, promoting self-harm, spreading misinformation, and generating discriminatory statements. This undermines the safety mechanisms of the LLM and could have severe consequences depending on the context of its deployment.

Affected Systems: LLM applications that utilize LangChain for RAG and rely on external knowledge bases are vulnerable. Specific models mentioned in the research include ChatGLM2, ChatGLM3, Llama2, Qwen, Xinghuo 3.5, and Ernie-3.5 (and likely others using similar architectures).

Mitigation Steps:

Implement robust input sanitization and validation for all data sources accessed by the LLM, including external knowledge bases.
Employ more advanced content filtering techniques that go beyond simple keyword matching, such as semantic analysis and contextual understanding.
Continuously monitor and update the knowledge base for malicious content. Regularly audit and update the content filtering mechanisms.
Implement mechanisms to detect and block attempts to inject malicious data into the knowledge base. This can involve analyzing the content for suspicious patterns or characteristics.
Consider using multiple independent knowledge sources or diversify data sources to reduce the impact of poisoning a single source.

LangChain Poisoning Jailbreak

Research Paper