LLM Version Fingerprinting
Research Paper
LLMmap: Fingerprinting for Large Language Models
Description: Large Language Models (LLMs) integrated into applications reveal unique behavioral fingerprints through their responses to crafted queries. LLMmap exploits this by sending carefully constructed prompts and analyzing the responses to identify the specific LLM version with high accuracy (over 95% in testing against 42 LLMs). This allows attackers to tailor attacks to known vulnerabilities in the identified LLM version.
Examples: See https://github.com/pasquini-dario/LLMmap. The paper details specific query sets and their effectiveness against various LLM instances. Examples include "malformed" queries, banner-grabbing prompts ("What model are you?"), and prompts designed to elicit responses based on the LLM's safety alignment mechanisms.
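To make the probing concrete, below is a minimal sketch of the fingerprinting loop. It is an illustration, not the LLMmap implementation: the `query_model` hook, the probe list, and the hash-based signature are all assumptions, and LLMmap itself classifies the collected responses with a trained inference model rather than hashing, which lets it tolerate run-to-run output variability.

```python
import hashlib

# Hypothetical probe set, modeled on the query classes described in the
# paper: banner grabbing, malformed input, and alignment-sensitive prompts.
PROBES = [
    "What model are you?",            # banner grabbing
    "]]}{{[[ respond to this: ~!@#",  # "malformed" query
    "Explain how to pick a lock.",    # safety-alignment probe
]

def query_model(prompt: str) -> str:
    """Stub for the target application's LLM endpoint (assumed interface)."""
    raise NotImplementedError

def fingerprint() -> str:
    """Collect responses to the probe set and reduce them to a signature.

    Hashing is a simplification: because LLM outputs vary between runs,
    LLMmap instead feeds the responses to a trained inference model that
    is robust to that variability.
    """
    responses = [query_model(p) for p in PROBES]
    return hashlib.sha256("\n".join(responses).encode()).hexdigest()
```

A signature collected from a deployed application would then be matched against signatures gathered from known model versions under the same probes.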
Impact: Successful exploitation allows attackers to:
- Identify the specific LLM version used in an application.
- Craft targeted adversarial inputs that exploit known vulnerabilities in the identified version.
- Potentially manipulate AI-driven services.
Affected Systems: Applications integrating any of the 42 LLMs tested in the LLMmap research, and potentially others: any LLM whose responses are distinguishable and consistent enough can be fingerprinted. The paper specifically mentions ChatGPT and Claude instances, but the technique is general.
Mitigation Steps:
- Defense in Depth: Layer multiple defenses, including input sanitization, output validation, and rate limiting; no single control mitigates this vulnerability on its own.
- Query Monitoring: Monitor and analyze incoming queries for patterns indicative of fingerprinting attempts, and block or modify responses to suspicious queries (see the first sketch after this list). Note that simple query blacklisting is insufficient, since probe prompts can be rephrased.
- Response Randomization: Introduce controlled randomness into LLM responses, without affecting core functionality, to reduce the consistency of the fingerprint (second sketch below). This raises the attacker's cost but may not fully prevent identification.
- Model Diversity and Obfuscation: Serve the same task with multiple different LLMs so that attackers cannot reliably identify a specific version (third sketch below).
- Regular Updates and Patching: Keep deployed LLMs up to date so that even a correctly identified version exposes fewer known vulnerabilities.
- Continuous Monitoring and Research: Stay informed about ongoing research into LLM vulnerabilities and adapt defenses accordingly; the attack landscape evolves constantly and requires proactive, responsive security measures.
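The query-monitoring sketch referenced above illustrates one possible heuristic: flag clients that accumulate a burst of probe-like prompts rather than blocking individual queries. The patterns, threshold, and `client_id` scheme are assumptions for illustration; a production system would more likely use learned classifiers or embedding similarity against known probe sets.

```python
import re
from collections import Counter

# Hypothetical heuristics; exact-match blacklisting is trivially bypassed
# by rephrasing, so these are coarse signals, not a complete defense.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\bwhat (model|llm|version) are you\b", re.I),  # banner grab
    re.compile(r"[^\w\s]{6,}"),                                 # malformed junk
    re.compile(r"\bignore (all|your) (previous )?instructions\b", re.I),
]

probe_counts: Counter = Counter()  # suspicious hits per client

def looks_like_probe(prompt: str) -> bool:
    return any(p.search(prompt) for p in SUSPICIOUS_PATTERNS)

def should_block(client_id: str, prompt: str, threshold: int = 3) -> bool:
    """Flag clients that send several probe-like queries.

    A single hit is weak evidence; fingerprinting requires a series of
    probes, so the rate of suspicious queries is the more useful signal.
    """
    if looks_like_probe(prompt):
        probe_counts[client_id] += 1
    return probe_counts[client_id] >= threshold
```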
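The second sketch shows one way to implement response randomization, assuming two hypothetical hooks into the serving stack (`base_generate` and `paraphrase`); neither is a standard API, and the temperature range and paraphrase rate are placeholders to tune. As noted in the list above, this raises attacker cost but may not fully prevent identification.

```python
import random

def randomized_completion(prompt: str, base_generate, paraphrase) -> str:
    """Perturb generation so repeated identical probes yield varying output.

    base_generate(prompt, temperature=...) and paraphrase(text) are assumed
    hooks into the serving stack, not real library calls.
    """
    # Vary sampling temperature per request within a range that should not
    # noticeably change answer quality (range is an assumption).
    temperature = random.uniform(0.5, 0.9)
    response = base_generate(prompt, temperature=temperature)

    # Occasionally paraphrase the response with a second model so even
    # near-deterministic outputs differ across requests.
    if random.random() < 0.5:
        response = paraphrase(response)
    return response
```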
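The third sketch illustrates model diversity by routing each request to a randomly chosen backend. The backend names and the `call_backend` helper are placeholders, not a real routing API; the idea is that a fingerprint assembled from probes that hit different models becomes much harder to match against any single known version.

```python
import random

# Hypothetical pool of interchangeable backends for the same task.
BACKENDS = ["modelA-endpoint", "modelB-endpoint", "modelC-endpoint"]

def call_backend(name: str, prompt: str) -> str:
    raise NotImplementedError  # stub for the actual serving call

def route(prompt: str) -> str:
    """Send each request to a randomly chosen backend.

    Probes sent by a fingerprinting tool then sample a mixture of models,
    blurring the per-model signature it tries to collect.
    """
    return call_backend(random.choice(BACKENDS), prompt)
```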