Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
- URL: http://arxiv.org/abs/2512.19995v1
- Date: Tue, 23 Dec 2025 02:44:25 GMT
- Title: Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
- Authors: Ming Li, Chenrui Fan, Yize Cheng, Soheil Feizi, Tianyi Zhou
- Abstract summary: We adopt Schoenfeld's Episode Theory as an inductive, intermediate-scale lens and introduce ThinkARM (Anatomy of Reasoning in Models). ThinkARM explicitly abstracts reasoning traces into functional reasoning steps such as Analysis, Explore, Implement, Verify, etc. We show that episode-level representations make reasoning steps explicit, enabling systematic analysis of how reasoning is structured, stabilized, and altered in modern language models.
- Score: 56.656180566692946
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models increasingly expose reasoning traces, yet their underlying cognitive structure and steps remain difficult to identify and analyze beyond surface-level statistics. We adopt Schoenfeld's Episode Theory as an inductive, intermediate-scale lens and introduce ThinkARM (Anatomy of Reasoning in Models), a scalable framework that explicitly abstracts reasoning traces into functional reasoning steps such as Analysis, Explore, Implement, Verify, etc. When applied to mathematical problem solving by diverse models, this abstraction reveals reproducible thinking dynamics and structural differences between reasoning and non-reasoning models, which are not apparent from token-level views. We further present two diagnostic case studies showing that exploration functions as a critical branching step associated with correctness, and that efficiency-oriented methods selectively suppress evaluative feedback steps rather than uniformly shortening responses. Together, our results demonstrate that episode-level representations make reasoning steps explicit, enabling systematic analysis of how reasoning is structured, stabilized, and altered in modern language models.
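The framework is described here only at the abstract level; as a concrete illustration, the following is a minimal sketch of episode-level abstraction, assuming a simple keyword-heuristic labeler. The authors' actual annotation pipeline is presumably model-based, and every name and cue list below is a hypothetical placeholder, not ThinkARM itself.

    # Minimal sketch of episode-level abstraction in the spirit of ThinkARM.
    # Episode labels follow Schoenfeld's Episode Theory as named in the
    # abstract (Analysis, Explore, Implement, Verify); the keyword heuristic
    # is a hypothetical illustration, not the authors' annotation pipeline.
    import re

    EPISODE_KEYWORDS = {
        "Analysis":  ("the problem asks", "we need to", "given that", "note that"),
        "Explore":   ("what if", "alternatively", "let me try", "another approach"),
        "Implement": ("compute", "substitute", "solving", "plugging in"),
        "Verify":    ("check", "verify", "confirm", "double-check"),
    }

    def label_step(sentence: str) -> str:
        """Assign one Schoenfeld-style episode label to a reasoning step."""
        lowered = sentence.lower()
        for episode, cues in EPISODE_KEYWORDS.items():
            if any(cue in lowered for cue in cues):
                return episode
        return "Other"

    def abstract_trace(trace: str) -> list[tuple[str, str]]:
        """Split a raw reasoning trace into sentences and label each one."""
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", trace) if s.strip()]
        return [(label_step(s), s) for s in sentences]

    if __name__ == "__main__":
        trace = ("The problem asks for the sum of the roots. "
                 "Let me try Vieta's formulas. "
                 "Plugging in the coefficients gives -b/a = 5. "
                 "Let me double-check by expanding the factored form.")
        for episode, sentence in abstract_trace(trace):
            print(f"{episode:>10}: {sentence}")

The resulting episode sequences are the kind of representation on which the transition and branching analyses mentioned in the abstract would operate.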
Related papers
- Fluid Representations in Reasoning Models [91.77876704697779]
We present a mechanistic analysis of how QwQ-32B processes abstract structural information. We find that QwQ-32B gradually improves its internal representation of actions and concepts during reasoning.
arXiv Detail & Related papers (2026-02-04T18:34:50Z)
- Reasoning as State Transition: A Representational Analysis of Reasoning Evolution in Large Language Models [50.39102836928242]
We introduce a representational perspective to investigate the dynamics of the model's internal states. We discover that post-training yields only limited improvement in static initial representation quality.
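Representational analyses of this kind are commonly operationalized as probing classifiers; below is a minimal sketch under that assumption, with synthetic arrays standing in for real per-step hidden states and correctness tags. This illustrates the general methodology, not the paper's exact protocol.

    # Sketch of a representational probe: how linearly decodable a target
    # property is from internal states. Synthetic data stands in for real
    # model activations extracted at each reasoning step.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_steps, hidden_dim = 500, 64
    hidden_states = rng.normal(size=(n_steps, hidden_dim))  # placeholder activations
    labels = rng.integers(0, 2, size=n_steps)               # placeholder correctness tags

    probe = LogisticRegression(max_iter=1000)
    scores = cross_val_score(probe, hidden_states, labels, cv=5)
    print(f"probe accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")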
arXiv Detail & Related papers (2026-01-31T15:23:33Z)
- Modeling Hierarchical Thinking in Large Reasoning Models [2.429493364781869]
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities when they generate step-by-step solutions. When trained on chain-of-thought reasoning examples, the resulting models appear to learn hierarchical thinking strategies similar to those used by humans. In this paper, we adopt a memoryless Finite State Machine formulation to approximate an LRM's emerging hierarchical reasoning dynamics as a structured, interpretable abstraction.
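As a concrete illustration of a memoryless FSM abstraction over reasoning dynamics, here is a minimal sketch; the state names, events, and transition table are hypothetical placeholders, not the paper's fitted machine.

    # Minimal memoryless finite state machine over reasoning states.
    # "Memoryless" means the next state depends only on the current state
    # and the observed event, not on earlier history.
    TRANSITIONS = {
        ("Plan",    "subgoal_found"):  "Execute",
        ("Execute", "step_done"):      "Execute",
        ("Execute", "doubt"):          "Reflect",
        ("Reflect", "confirmed"):      "Execute",
        ("Reflect", "contradiction"):  "Plan",
        ("Execute", "answer_ready"):   "Answer",
    }

    def run_fsm(events, state="Plan"):
        """Replay a sequence of observed events through the machine."""
        trajectory = [state]
        for event in events:
            state = TRANSITIONS.get((state, event), state)  # stay put on unknown events
            trajectory.append(state)
        return trajectory

    print(run_fsm(["subgoal_found", "step_done", "doubt", "confirmed", "answer_ready"]))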
arXiv Detail & Related papers (2025-10-25T21:25:30Z)
- REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model [29.40036398095681]
We define the Reasoning Manifold, a latent low-dimensional geometric structure formed by the internal representations corresponding to all correctly reasoned generations. We build REMA, a framework that explains the origins of failures by quantitatively comparing the spatial relationships of internal model representations corresponding to both erroneous and correct reasoning samples. Our experiments on diverse language and multimodal models and tasks demonstrate the low-dimensional nature of the reasoning manifold and the high separability between erroneous and correct reasoning representations.
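A REMA-style comparison can be sketched as follows, assuming per-sample internal representations are available: fit a low-dimensional subspace to correct-reasoning representations, then measure how far erroneous samples fall from it. Synthetic data stands in for real activations, and the PCA residual is one simple stand-in for distance to the manifold, not the paper's exact estimator.

    # Sketch: low-dimensional structure of "correct" representations and
    # separability of erroneous ones, measured by residual distance to a
    # fitted subspace. All data here is synthetic.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    dim, k = 128, 8
    basis = rng.normal(size=(k, dim))                # hidden low-dim structure
    correct = rng.normal(size=(300, k)) @ basis      # samples near the "manifold"
    wrong = rng.normal(size=(100, dim)) * 3.0        # off-manifold samples

    pca = PCA(n_components=k).fit(correct)

    def residual(x):
        """Distance from samples to the fitted low-dimensional subspace."""
        recon = pca.inverse_transform(pca.transform(x))
        return np.linalg.norm(x - recon, axis=1)

    print("correct residual:", residual(correct).mean())
    print("wrong residual:  ", residual(wrong).mean())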
arXiv Detail & Related papers (2025-09-26T16:02:27Z)
- Explainable Chain-of-Thought Reasoning: An Empirical Analysis on State-Aware Reasoning Dynamics [69.00587226225232]
We introduce a state-aware transition framework that abstracts CoT trajectories into structured latent dynamics. To characterize the global structure of reasoning, we model their progression as a Markov chain. This abstraction supports a range of analyses, including semantic role identification, temporal pattern visualization, and consistency evaluation.
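The Markov-chain view amounts to estimating transition probabilities between abstract reasoning states from labeled trajectories; a minimal sketch with hypothetical state labels and example sequences:

    # Sketch: estimate a Markov transition matrix P(next | current) from
    # sequences of abstract reasoning states. Labels and trajectories are
    # hypothetical illustrations.
    from collections import Counter

    STATES = ["Analyze", "Explore", "Implement", "Verify"]
    trajectories = [
        ["Analyze", "Explore", "Implement", "Verify"],
        ["Analyze", "Implement", "Verify", "Explore", "Implement", "Verify"],
    ]

    counts = Counter()
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[(a, b)] += 1

    # Row-normalize counts into transition probabilities.
    for a in STATES:
        row_total = sum(counts[(a, b)] for b in STATES)
        if row_total:
            probs = {b: counts[(a, b)] / row_total for b in STATES if counts[(a, b)]}
            print(a, "->", probs)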
arXiv Detail & Related papers (2025-08-29T18:53:31Z)
- Evaluating the Logical Reasoning Abilities of Large Reasoning Models [15.009205651973666]
We introduce LogiEval, a benchmark for evaluating logical reasoning in large reasoning models. LogiEval spans diverse reasoning types (deductive, inductive, analogical, and abductive) and task formats (e.g., logical sequence, argument analysis). Our experiments demonstrate that modern reasoning models excel at 4-choice argument analysis problems and analogical reasoning, surpassing human performance. Our analysis reveals that human performance does not mirror model failure distributions.
arXiv Detail & Related papers (2025-05-17T05:36:14Z)
- Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
- Understanding the Language Model to Solve the Symbolic Multi-Step Reasoning Problem from the Perspective of Buffer Mechanism [68.05754701230039]
We construct a symbolic multi-step reasoning task to investigate the information propagation mechanisms in Transformer models. We propose a random matrix-based algorithm to enhance the model's reasoning ability.
arXiv Detail & Related papers (2024-05-24T07:41:26Z)
- Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus [4.569421189811511]
We introduce a novel approach to evaluate the inference and contextual understanding abilities of Large Language Models (LLMs).
We focus on three key components from the Language of Thought Hypothesis (LoTH): Logical Coherence, Compositionality, and Productivity.
Our experiments reveal that while LLMs demonstrate some inference capabilities, they still significantly lag behind human-level reasoning in these three aspects.
arXiv Detail & Related papers (2024-03-18T13:50:50Z)
- Learning a Structural Causal Model for Intuition Reasoning in Conversation [20.243323155177766]
Reasoning, a crucial aspect of NLP research, has not been adequately addressed by prevailing models.
We develop a conversation cognitive model (CCM) that explains how each utterance receives and activates channels of information.
By leveraging variational inference, it explores substitutes for implicit causes, addresses the issue of their unobservability, and reconstructs the causal representations of utterances through the evidence lower bound.
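For reference, variational inference of this kind maximizes an evidence lower bound; the generic form is given below (the paper's exact objective over utterance representations may differ):

    \log p(x) \;\ge\; \mathcal{L}(q) \;=\; \mathbb{E}_{q(z \mid x)}\!\left[\log p(x \mid z)\right] \;-\; \mathrm{KL}\!\left(q(z \mid x)\,\|\,p(z)\right)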
arXiv Detail & Related papers (2023-05-28T13:54:09Z)
- Case-Based Reasoning with Language Models for Classification of Logical Fallacies [3.511369967593153]
We propose a Case-Based Reasoning method that classifies new cases of logical fallacy.
Our experiments indicate that Case-Based Reasoning improves the accuracy and generalizability of language models.
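The core Case-Based Reasoning loop (retrieve the most similar stored case, reuse its label) can be sketched as follows; the case base and TF-IDF retriever are hypothetical stand-ins for the paper's actual retrieval setup, where retrieved cases would typically be fed to a language model as in-context examples rather than reused directly.

    # Sketch of Case-Based Reasoning for fallacy classification: retrieve
    # the nearest labeled case and reuse its label. The case base is a
    # hypothetical illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    case_base = [
        ("Everyone believes it, so it must be true.",        "ad populum"),
        ("You're wrong because you're not an expert.",       "ad hominem"),
        ("If we allow this, soon everything will collapse.", "slippery slope"),
    ]
    texts, labels = zip(*case_base)

    vectorizer = TfidfVectorizer().fit(texts)
    case_vecs = vectorizer.transform(texts)

    def classify(query: str) -> str:
        """Nearest-neighbor reuse: label of the most similar stored case."""
        sims = cosine_similarity(vectorizer.transform([query]), case_vecs)[0]
        return labels[sims.argmax()]

    print(classify("Millions of people agree, so it can't be false."))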
arXiv Detail & Related papers (2023-01-27T17:49:16Z)
- MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure [129.8481568648651]
We propose a benchmark to investigate models' logical reasoning capabilities in complex real-life scenarios.
Based on the multi-hop chain of reasoning, the explanation form includes three main components.
We evaluate the current best models' performance on this new explanation form.
arXiv Detail & Related papers (2022-10-22T16:01:13Z)
- Learning to Reason With Relational Abstractions [65.89553417442049]
We study how to build stronger reasoning capability in language models using the idea of relational abstractions.
We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy.
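A minimal sketch of what supplying a relational abstraction as part of the prompt might look like; the wording and relation format are hypothetical illustrations of the general idea, not the paper's templates.

    # Sketch: prepend an abstract relational description of a word problem
    # to the prompt before asking a model to solve it.
    problem = "Tom has 3 more apples than Jane. Jane has 5 apples. How many does Tom have?"
    abstraction = "Relations: count(Tom) = count(Jane) + 3; count(Jane) = 5."

    prompt = (
        f"{abstraction}\n"
        f"Problem: {problem}\n"
        "Using the relations above, solve step by step."
    )
    print(prompt)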
arXiv Detail & Related papers (2022-10-06T00:27:50Z)