Phantom Token User Deception

Description: Large Language Models (LLMs) that utilize byte-stream parsing or structural extraction to process PDF files—specifically the OpenAI GPT and Anthropic Claude families—are vulnerable to adversarial text injection via imperceptible "phantom tokens." This vulnerability exploits the disconnect between how PDF viewers render documents for humans (visual layer) and how LLMs extract text from the PDF operator stream (data layer). Attackers can manipulate standard PDF text-showing operators (TJ and Tj) to interleave adversarial content with legitimate text. By assigning these injected tokens attributes that render them invisible (e.g., font size 0), the text remains hidden from human users but is fully processed by the LLM. This allows for the injection of hallucinations, malicious instructions, or context distortions that alter the model's output while preserving the visual integrity of the source document.

Examples: The attack is executed by modifying the raw PDF stream. An attacker takes a standard text string, splits it into segments, and inserts adversarial tokens with a font size of 0 between the segments.

1. PDF Operator Stream Injection In a standard PDF, a text string might appear in the stream as:

(The conference attracted 500 participants) Tj

In a TrapDoc-perturbed PDF, the stream is altered to inject the "phantom" phrase "drew nearly 800 attendees" (rendered invisibly) while breaking up the visible text:

/F1 12 Tf   % Set font size to 12 (visible)
[(The ) 20 (con) 20 (fer) 20 (ence )] TJ
/F1 0 Tf    % Set font size to 0 (invisible to human, read by LLM)
(drew nearly 800 attendees ) Tj
/F1 12 Tf   % Restore visible font size
[(attracted ) 20 (500 ) 20 (parti) 20 (cipants)] TJ

Note: The TJ operator allows for kerning adjustments (numbers), which TrapDoc exploits to interleave content.

2. Resulting Interpretation

Human View: "The conference attracted 500 participants"
LLM View (Token Stream): "The drew nearly 800 attendees conference attracted 500 participants"

See the implementation and dataset at: https://github.com/jindong22/TrapDoc

Impact:

Integrity Compromise: The LLM generates responses based on false premises, leading to incorrect summarization, erroneous code generation, or factually inaccurate reasoning.
Prompt Injection: Attackers can override system instructions or introduce bias without visual detection.
Academic/Professional Dishonesty: Users blindly relying on LLMs for peer reviews or homework will unwittingly submit hallucinated or manipulated content, as the LLM output will align with the invisible phantom tokens rather than the visible text.

Affected Systems:

OpenAI: GPT-4 family (including GPT-4.1, GPT-4o, o4-mini) via file upload/parsing interfaces.
Anthropic: Claude family via file upload/parsing interfaces.
Note: Systems relying on OCR/Vision-based parsing (e.g., DeepSeek, Gemini, Grok) are naturally immune as they process the rendered image rather than the byte stream.

Mitigation Steps:

Switch to Vision-Based Parsing: Implement OCR (Optical Character Recognition) or vision-based document ingestion instead of direct byte-stream or PDF operator parsing. This ensures the model processes only what is visually rendered to the user.
Sanitize PDF Streams: Pre-process PDF files to detect and remove text objects with a font size of 0 or text rendered with invisible attributes (e.g., transparent color, off-canvas coordinates) before passing the stream to the LLM.
Adversarial Training: Incorporate training data that includes PDFs with mismatched visual and structural layers to improve model robustness against hidden text injections.

Phantom Token User Deception

Research Paper