LMVD-ID: f9eee064
Published June 1, 2025

Chain-of-Code Collapse

Affected Models: Claude 3, Claude 3.7, Llama 3.1 8B, Gemini 2, DeepSeek-R1 7B, Qwen 2.5

Research Paper

Break-The-Chain: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation


Description: Large Language Models (LLMs) used for code generation exhibit a vulnerability termed "Chain-of-Code Collapse" (CoCC): the models fail to generate correct code when presented with semantically faithful but adversarially structured prompts. By applying transformations such as domain shifting (renaming variables and contexts), adding distracting constraints (irrelevant but plausible rules), or inverting objectives (negation), an attacker can cause the model to produce functionally incorrect code, omit required logic, or revert to memorized solution templates that contradict the prompt. The vulnerability stems from the models' reliance on surface-level statistical patterns rather than robust logical reasoning, allowing benign linguistic changes to degrade performance by up to 68% in models such as Claude-3.7-Sonnet and Gemini-2.5-Flash.
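
Conceptually, each perturbation is a small rewrite that preserves the task's semantics while changing its surface form. A minimal sketch of the three transformation classes, assuming hypothetical helper names and rewrite rules (illustrative only, not the paper's pipeline):

```python
# Illustrative sketch of the three CoCC perturbation classes.
# Helper names and rewrite rules are hypothetical, not the paper's pipeline.
import re

def domain_shift(prompt: str, renames: dict[str, str]) -> str:
    """Rename variables/contexts (whole words only) while keeping the task identical."""
    for old, new in renames.items():
        prompt = re.sub(rf"\b{re.escape(old)}\b", new, prompt)
    return prompt

def add_distracting_constraint(prompt: str) -> str:
    """Append a plausible-sounding but logically inert rule."""
    return prompt + " Ensure the input array length is a palindrome."

def invert_objective(prompt: str) -> str:
    """Negate the optimization goal (minimize <-> maximize)."""
    return prompt.replace("minimum", "maximum")

base = ("Find the minimum number of extra characters remaining after "
        "optimally breaking a string s using words from a dictionary.")
print(invert_objective(base))                                     # negation
print(add_distracting_constraint(base))                           # distracting constraint
print(domain_shift(base, {"s": "log_line", "words": "tokens"}))   # domain shift
```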

Examples: The following examples demonstrate how semantic perturbations cause reasoning failures in state-of-the-art models.

  • Example 1: Archetype Override via Negation (The "Min/Max" Inversion)

  • Original Task: "Find the minimum number of extra characters remaining after optimally breaking a string s using words from a dictionary."

  • Adversarial Prompt: "Find the maximum number of extra characters." (The prompt requires the model to invert the optimization goal.)

  • Observed Failure: Gemini-2.5-Flash ignores the specific instruction to "maximize" and generates code solving the original "minimize" problem (returning dp[n] derived from minimization logic). The model overrides the explicit user prompt with the memorized "archetype" of the standard LeetCode problem.
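
The failure can be reconstructed as follows (a paraphrase of the memorized archetype, not Gemini's verbatim output): the generated code keeps the standard minimization recurrence, whereas a faithful answer to the adversarial prompt would at minimum flip the optimization direction.

```python
# Standard "extra characters" minimization archetype that the model falls
# back to despite being asked to maximize (reconstruction, not verbatim output).
def extra_chars(s: str, dictionary: list[str]) -> int:
    words = set(dictionary)
    n = len(s)
    dp = [0] * (n + 1)                     # dp[i] = MIN extra chars in s[:i]
    for i in range(1, n + 1):
        dp[i] = dp[i - 1] + 1              # treat s[i-1] as an extra character
        for j in range(i):
            if s[j:i] in words:
                dp[i] = min(dp[i], dp[j])  # memorized objective: minimize
    return dp[n]                           # returned even when asked to maximize
```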

  • Example 2: Logic Collapse via Domain Shift

  • Original Task: minIncrementOperations. Given nums and k, find the minimum number of operations so that every subarray of length 3 contains at least one element >= k.

  • Adversarial Prompt: "You are managing project tasks. task_priorities is an array... A project schedule is 'stable' if, for every sequence of 3 consecutive tasks, at least one task has a priority score >= min_threshold."

  • Observed Failure: Claude-3.7-Sonnet correctly restates the new domain textually but generates code that simplifies the logic incorrectly. Instead of implementing the sliding-window check (any of the 3 consecutive tasks >= threshold), it implements a loop that requires every single task to be >= threshold (flagging any task_priorities[i] < min_threshold), fundamentally altering the algorithm.
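
The collapse is easiest to see by contrasting the required feasibility check with the simplification the model produced (a reconstruction, not Claude's verbatim output):

```python
# Required check: every window of 3 consecutive tasks has at least one task
# with priority >= min_threshold.
def is_stable(task_priorities: list[int], min_threshold: int) -> bool:
    return all(
        any(p >= min_threshold for p in task_priorities[i:i + 3])
        for i in range(len(task_priorities) - 2)
    )

# Collapsed logic observed in the failure: it demands that EVERY task meet the
# threshold, a strictly stronger (and incorrect) condition.
def is_stable_collapsed(task_priorities: list[int], min_threshold: int) -> bool:
    return all(p >= min_threshold for p in task_priorities)
```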

  • Example 3: Distracting Constraints

  • Adversarial Prompt: Injecting a clause like "Ensure the input array length is a palindrome" into a standard sorting or search problem.

  • Observed Failure: Models attempting to process this logically inert constraint suffer significant increases in token-level entropy, leading to syntax errors or logic hallucinations in the resulting code.
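
One rough way to observe this effect locally is to compare the mean token-level entropy of a model's completion under the clean and perturbed prompts. A minimal sketch using an open model via Hugging Face transformers (the model choice, prompts, and measurement details are placeholders and may differ from the paper's setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/deepseek-coder-1.3b-instruct"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def mean_token_entropy(prompt: str, max_new_tokens: int = 128) -> float:
    """Average entropy of the next-token distribution over the generated tokens."""
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        output_scores=True,
        return_dict_in_generate=True,
    )
    entropies = []
    for scores in out.scores:  # one logits tensor per generated token
        probs = torch.softmax(scores[0], dim=-1)
        entropies.append(-(probs * torch.log(probs + 1e-12)).sum().item())
    return sum(entropies) / len(entropies)

clean = "Write a function that returns the k largest elements of an integer array."
perturbed = clean + " Ensure the input array length is a palindrome."
print(mean_token_entropy(clean), mean_token_entropy(perturbed))
```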

Impact:

  • Functional Incorrectness: Generated code fails to meet user specifications, leading to software bugs.
  • Security Vulnerabilities: In "Negation" scenarios, a developer asking for a specific security constraint (e.g., "exclude users with X") may receive code that does the opposite (e.g., "include users with X") because the model reverts to a more common, positive-pattern training example (see the sketch after this list).
  • Instruction Bypass: The vulnerability demonstrates that task framing can override instruction tuning; models may ignore explicit constraints if the problem resembles a memorized training sample.
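
A hypothetical illustration of the negation risk (hand-written for clarity, not actual model output): the developer asks to exclude flagged users, but a model reverting to the more common positive-filtering pattern returns the inverse.

```python
# Requested behavior: EXCLUDE users flagged as compromised.
def active_users_requested(users: list[dict]) -> list[dict]:
    return [u for u in users if not u.get("compromised", False)]

# Archetype-reverted behavior: silently returns exactly the users the
# developer asked to filter out.
def active_users_reverted(users: list[dict]) -> list[dict]:
    return [u for u in users if u.get("compromised", False)]
```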

Affected Systems:

  • Google Gemini-2.5-Flash / Gemini-2.0-Flash
  • Anthropic Claude-3.7-Sonnet / Claude-3-Haiku
  • DeepSeek-R1-Distill (Qwen-7B/14B) and DeepSeek-Coder-33B
  • Meta LLaMA-3.1-8B-Instruct
  • Alibaba Qwen2.5-Coder

Mitigation Steps:

  • Semantic Perturbation Robustness Testing: Evaluate models using the CoCC framework (Storytelling, Gamification, Domain Shift, Negation) and measure the standard deviation in performance across logically equivalent rewrites (a harness sketch follows this list).
  • Uncertainty-Aware Training: Tune models not only on correct answers but on uncertainty-aware reasoning paths to prevent attention diffusion when processing distracting constraints.
  • Enforced Comment Generation: Instruct or fine-tune models to generate natural language inline comments during code synthesis. Research indicates that the absence of comments (e.g., in Claude-3.7 under Gamification) correlates directly with brittle logic and reasoning collapse.
  • Contextual Scaffolding: Use structured context in prompts to make implicit assumptions explicit, as "distracting" constraints can sometimes stabilize reasoning if they clarify the domain.
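
A minimal harness sketch for the first mitigation, assuming hypothetical `run_model` and `passes_tests` hooks and a per-task dictionary of prompt rewrites (the data layout is illustrative, not the CoCC benchmark's API):

```python
import statistics

REWRITE_FAMILIES = ["original", "storytelling", "gamification", "domain_shift", "negation"]

def robustness_report(tasks, run_model, passes_tests):
    """Pass rate per rewrite family plus the spread (std dev) across families.

    `tasks` maps task id -> {family: prompt}; `run_model` generates code from a
    prompt and `passes_tests` runs the task's unit tests on that code.
    """
    pass_rates = {}
    for family in REWRITE_FAMILIES:
        results = [
            passes_tests(task_id, run_model(prompts[family]))
            for task_id, prompts in tasks.items()
        ]
        pass_rates[family] = sum(results) / len(results)
    return pass_rates, statistics.stdev(pass_rates.values())
```

A small spread with high pass rates indicates robustness; a large spread signals that performance depends on surface framing rather than task semantics.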

© 2026 Promptfoo. All rights reserved.