Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
- URL: http://arxiv.org/abs/2505.17225v1
- Date: Thu, 22 May 2025 19:00:01 GMT
- Title: Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
- Authors: Doohyuk Jang, Yoonjeon Kim, Chanjae Park, Hyun Ryu, Eunho Yang
- Abstract summary: Large language models frequently exhibit a problematic reliance on familiar reasoning patterns. Despite explicit instructions from users, these models often override clearly stated conditions and default to habitual reasoning trajectories. This behavior presents significant challenges, particularly in domains such as mathematics and logic puzzles.
- Score: 27.437685534830457
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models have demonstrated remarkable proficiency in long and complex reasoning tasks. However, they frequently exhibit a problematic reliance on familiar reasoning patterns, a phenomenon we term \textit{reasoning rigidity}. Despite explicit instructions from users, these models often override clearly stated conditions and default to habitual reasoning trajectories, leading to incorrect conclusions. This behavior presents significant challenges, particularly in domains such as mathematics and logic puzzles, where precise adherence to specified constraints is critical. To systematically investigate reasoning rigidity, a behavior largely unexplored in prior work, we introduce an expert-curated diagnostic set, \dataset{}. Our dataset includes specially modified variants of existing mathematical benchmarks, namely AIME and MATH500, as well as well-known puzzles deliberately redesigned to require deviation from familiar reasoning strategies. Using this dataset, we identify recurring contamination patterns that occur when models default to ingrained reasoning. Specifically, we categorize this contamination into three distinctive modes: (i) Interpretation Overload, (ii) Input Distrust, and (iii) Partial Instruction Attention, each causing models to ignore or distort provided instructions. We publicly release our diagnostic set to facilitate future research on mitigating reasoning rigidity in language models.
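To make the taxonomy above concrete, the sketch below shows one way such diagnostic items and failure labels could be organized. The schema, field names, and the example problem are illustrative assumptions, not the paper's released dataset format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ContaminationMode(Enum):
    """The three contamination modes named in the abstract."""
    INTERPRETATION_OVERLOAD = "interpretation overload"
    INPUT_DISTRUST = "input distrust"
    PARTIAL_INSTRUCTION_ATTENTION = "partial instruction attention"

@dataclass
class DiagnosticItem:
    source_benchmark: str      # e.g. "AIME" or "MATH500"
    original_problem: str      # the familiar, unmodified statement
    modified_condition: str    # the explicitly stated change the model must respect
    expected_answer: str       # correct answer under the modified condition
    observed_mode: Optional[ContaminationMode] = None  # labeled after inspecting a model's reasoning trace

# Made-up example item, purely for illustration:
item = DiagnosticItem(
    source_benchmark="MATH500",
    original_problem="Find the sum of the first 10 positive integers.",
    modified_condition="Sum only the even integers among them.",
    expected_answer="30",
    observed_mode=ContaminationMode.PARTIAL_INSTRUCTION_ATTENTION,
)
print(item.observed_mode.value)
```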
Related papers
- Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis [3.711555701154055]
Reasoning models and their integration into practical AI chatbots have led to breakthroughs in solving advanced math, deep search, and extractive question answering problems. Yet, a complete understanding of why these models hallucinate more than general-purpose language models is missing. In this study, we systematically explore reasoning failures of contemporary language models on multi-hop question answering tasks.
arXiv Detail & Related papers (2025-08-06T17:58:36Z)
- Inverse Scaling in Test-Time Compute [51.16323216811257]
Extending the reasoning length of Large Reasoning Models (LRMs) deteriorates performance. We identify five distinct failure modes when models reason for longer. These findings suggest that while test-time compute scaling remains promising for improving model capabilities, it may inadvertently reinforce problematic reasoning patterns.
arXiv Detail & Related papers (2025-07-19T00:06:13Z)
- Mathematical Proof as a Litmus Test: Revealing Failure Modes of Advanced Large Reasoning Models [11.250861762443801]
We introduce the RFMDataset (Reveal Failure Modes), a collection of 200 diverse mathematical proof problems. We thoroughly evaluate advanced models' performance on it. Our analysis uncovers 10 fine-grained error types, which reveal fundamental limitations in current large reasoning models.
arXiv Detail & Related papers (2025-06-20T16:14:18Z)
- Preference Learning for AI Alignment: a Causal Perspective [55.2480439325792]
We frame this problem in a causal paradigm, providing the rich toolbox of causality to identify persistent challenges. Drawing on the causal inference literature, we identify key assumptions necessary for reliable generalisation. We illustrate failure modes of naive reward models and demonstrate how causally-inspired approaches can improve model robustness.
arXiv Detail & Related papers (2025-06-06T10:45:42Z)
- Evaluating the Logical Reasoning Abilities of Large Reasoning Models [15.009205651973666]
We introduce LogiEval, a benchmark for evaluating logical reasoning in large reasoning models. LogiEval spans diverse reasoning types (deductive, inductive, analogical, and abductive) and task formats (e.g., logical sequence, argument analysis). Our experiments demonstrate that modern reasoning models excel at 4-choice argument analysis problems and analogical reasoning, surpassing human performance. Our analysis reveals that human performance does not mirror model failure distributions.
arXiv Detail & Related papers (2025-05-17T05:36:14Z)
- THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models [65.39456695678713]
We introduce approximate measures of problem-level difficulty and demonstrate that a clear relationship between problem difficulty and optimal token spend exists. We find that in general, reasoning models are poorly calibrated, particularly on easy problems. We introduce THOUGHTTERMINATOR, a training-free black box decoding technique that significantly improves reasoning model calibration.
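As a rough illustration of the calibration idea summarized above (not the THOUGHTTERMINATOR method or the paper's exact metric; the record fields, difficulty bins, and numbers are made up), one could compare tokens actually spent against the smallest budget that still yields a correct answer, per difficulty bin:

```python
# Illustrative-only sketch: mean "excess" token spend per difficulty bin.
from collections import defaultdict
from statistics import mean

records = [
    # (difficulty_bin, tokens_spent, minimal_correct_tokens) -- made-up data
    ("easy", 1200, 150),
    ("easy", 900, 120),
    ("hard", 3000, 2600),
    ("hard", 2800, 2500),
]

excess_by_bin = defaultdict(list)
for difficulty, spent, minimal in records:
    excess_by_bin[difficulty].append(spent - minimal)  # wasted tokens beyond the minimal budget

for difficulty, excesses in excess_by_bin.items():
    print(f"{difficulty}: mean excess tokens = {mean(excesses):.0f}")
```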
arXiv Detail & Related papers (2025-04-17T22:16:30Z)
- Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
- A Critical Assessment of Interpretable and Explainable Machine Learning for Intrusion Detection [0.0]
We study the use of overly complex and opaque ML models, unaccounted-for data imbalances and correlated features, inconsistent influential features across different explanation methods, and the implausible utility of explanations.
Specifically, we advise avoiding complex opaque models such as Deep Neural Networks and instead using interpretable ML models such as Decision Trees.
We find that feature-based model explanations are most often inconsistent across different settings.
arXiv Detail & Related papers (2024-07-04T15:35:42Z)
- Exposing Attention Glitches with Flip-Flop Language Modeling [55.0688535574859]
This work identifies and analyzes the phenomenon of attention glitches in large language models.
We introduce flip-flop language modeling (FFLM), a family of synthetic benchmarks designed to probe the extrapolative behavior of neural language models.
We find that Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which we can eliminate using various regularization techniques.
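The flip-flop task summarized above has a simple generative form. The following sketch is an illustration based on the summary, not code from the paper; the instruction symbols and parameters are assumptions. It builds one sequence in which every read token is followed by the most recently written bit, the long-range dependency that Transformer FFLMs are reported to violate sporadically.

```python
import random

def generate_flip_flop_sequence(num_instructions: int = 8, seed: int = 0) -> str:
    """Generate a write/ignore/read sequence where reads echo the last written bit."""
    rng = random.Random(seed)
    tokens = []
    last_written = str(rng.randint(0, 1))
    tokens += ["w", last_written]              # start with a write
    for _ in range(num_instructions - 1):
        op = rng.choice(["w", "i", "r"])
        if op == "w":                          # write: store a fresh bit
            last_written = str(rng.randint(0, 1))
            tokens += ["w", last_written]
        elif op == "i":                        # ignore: distractor bit
            tokens += ["i", str(rng.randint(0, 1))]
        else:                                  # read: must echo the last written bit
            tokens += ["r", last_written]
    return " ".join(tokens)

print(generate_flip_flop_sequence())           # prints one generated sequence
```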
arXiv Detail & Related papers (2023-06-01T17:44:35Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z)