REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model
- URL: http://arxiv.org/abs/2509.22518v1
- Date: Fri, 26 Sep 2025 16:02:27 GMT
- Title: REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model
- Authors: Bo Li, Guanzhi Deng, Ronghao Chen, Junrong Yue, Shuo Zhang, Qinghua Zhao, Linqi Song, Lijie Wen
- Abstract summary: We define the Reasoning Manifold, a latent low-dimensional geometric structure formed by the internal representations corresponding to all correctly reasoned generations. We build REMA, a framework that explains the origins of failures by quantitatively comparing the spatial relationships of internal model representations corresponding to both erroneous and correct reasoning samples. Our experiments on diverse language and multimodal models and tasks demonstrate the low-dimensional nature of the reasoning manifold and the high separability between erroneous and correct reasoning representations.
- Score: 29.40036398095681
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Understanding how Large Language Models (LLMs) perform complex reasoning and their failure mechanisms is a challenge in interpretability research. To provide a measurable geometric analysis perspective, we define the concept of the Reasoning Manifold, a latent low-dimensional geometric structure formed by the internal representations corresponding to all correctly reasoned generations. This structure can be conceptualized as the embodiment of the effective thinking paths that the model has learned to successfully solve a given task. Based on this concept, we build REMA, a framework that explains the origins of failures by quantitatively comparing the spatial relationships of internal model representations corresponding to both erroneous and correct reasoning samples. Specifically, REMA first quantifies the geometric deviation of each erroneous representation by calculating its k-nearest neighbors distance to the approximated manifold formed by correct representations, thereby providing a unified failure signal. It then localizes the divergence points where these deviations first become significant by tracking this deviation metric across the model's layers and comparing it against a baseline of internal fluctuations from correct representations, thus identifying where the reasoning chain begins to go off-track. Our extensive experiments on diverse language and multimodal models and tasks demonstrate the low-dimensional nature of the reasoning manifold and the high separability between erroneous and correct reasoning representations. The results also validate the effectiveness of the REMA framework in analyzing the origins of reasoning failures. This research connects abstract reasoning failures to measurable geometric deviations in representations, providing new avenues for in-depth understanding and diagnosis of the internal computational processes of black-box models.
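Reference sketch: the abstract describes two concrete measurements: a k-nearest-neighbor distance from each erroneous representation to the manifold approximated by correct representations, and a layer-wise scan that flags where this deviation first exceeds the fluctuations observed among correct representations. The snippet below is a minimal illustration of that idea, not the authors' implementation; the function names, the z-score threshold, and the toy data are assumptions made for the example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def knn_deviation(reference, queries, k=10, exclude_self=False):
    """Mean distance from each query point to its k nearest neighbors in `reference`."""
    nn = NearestNeighbors(n_neighbors=k + 1 if exclude_self else k).fit(reference)
    dists, _ = nn.kneighbors(queries)
    if exclude_self:
        dists = dists[:, 1:]  # drop the zero distance to the point itself
    return dists.mean(axis=1)


def localize_divergence(correct_by_layer, erroneous_by_layer, k=10, z_thresh=2.0):
    """Return the first layer index where erroneous representations deviate from the
    correct-representation manifold by more than `z_thresh` standard deviations of
    the baseline fluctuations measured on correct representations themselves."""
    for layer, (correct, erroneous) in enumerate(zip(correct_by_layer, erroneous_by_layer)):
        baseline = knn_deviation(correct, correct, k=k, exclude_self=True)
        deviation = knn_deviation(correct, erroneous, k=k)
        z = (deviation.mean() - baseline.mean()) / (baseline.std() + 1e-8)
        if z > z_thresh:
            return layer
    return None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_layers, d = 4, 32
    # Toy hidden states: correct samples cluster tightly; erroneous ones drift away
    # from that cluster starting at layer 2.
    correct = [rng.normal(0, 1, size=(200, d)) for _ in range(n_layers)]
    erroneous = [rng.normal(0, 1, size=(50, d)) + (3.0 if layer >= 2 else 0.0)
                 for layer in range(n_layers)]
    print(localize_divergence(correct, erroneous))  # expected: 2
```

In this toy setup the per-layer representations are already extracted (e.g., hidden states pooled per generation); the paper's exact manifold approximation and significance test may differ from the simple z-score comparison used here.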
Related papers
- How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning [17.18771466838129]
Solving complex geometric problems inherently requires interleaved reasoning. We argue that Supervised Fine-Tuning (SFT) on interleaved plot-solution data leads to a substantial degradation in reasoning performance. We propose Faire, a reinforcement learning framework that enforces three causal constraints to move beyond superficial imitation.
arXiv Detail & Related papers (2026-03-01T12:18:12Z) - Interpreting and Controlling LLM Reasoning through Integrated Policy Gradient [27.26870804635122]
Large language models (LLMs) demonstrate strong reasoning abilities in solving complex real-world problems. However, the internal mechanisms driving these complex reasoning behaviors remain opaque. We propose Integrated Policy Gradient (IPG), a novel framework that attributes reasoning behaviors to the model's inner components.
arXiv Detail & Related papers (2026-02-02T16:43:09Z) - Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models [72.4149653187766]
We propose a Reasoner-Verifier framework named Adversarial Reasoning RAG (ARR). The Reasoner and Verifier reason over retrieved evidence and critique each other's logic while being guided by a process-aware advantage. Experiments on multiple benchmarks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2026-01-08T06:57:03Z) - Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts [74.47786985522762]
We identify a critical failure mode termed textual inertia, where models tend to blindly adhere to erroneous text while neglecting conflicting visual evidence. We propose the LogicGraph Perturbation Protocol, which structurally injects perturbations into the reasoning chains of diverse LMMs. Results reveal that models successfully self-correct in less than 10% of cases and predominantly succumb to blind textual error propagation.
arXiv Detail & Related papers (2026-01-07T16:39:34Z) - Schoenfeld's Anatomy of Mathematical Reasoning by Language Models [56.656180566692946]
We adopt Schoenfeld's Episode Theory as an inductive, intermediate-scale lens and introduce ThinkARM (Anatomy of Reasoning in Models). ThinkARM explicitly abstracts reasoning traces into functional reasoning steps such as Analysis, Explore, Implement, and Verify. We show that episode-level representations make reasoning steps explicit, enabling systematic analysis of how reasoning is structured, stabilized, and altered in modern language models.
arXiv Detail & Related papers (2025-12-23T02:44:25Z) - Identifiable Multi-View Causal Discovery Without Non-Gaussianity [63.217175519436125]
We propose a novel approach to linear causal discovery in the framework of multi-view Structural Equation Models (SEMs). We prove the identifiability of all the parameters of the model without any further assumptions on the structure of the SEM other than it being acyclic. The proposed methodology is validated through simulations and application to real data, where it enables the estimation of causal graphs between brain regions.
arXiv Detail & Related papers (2025-02-27T14:06:14Z) - Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning [9.795934690403374]
It is still unclear which multi-step reasoning mechanisms are used by language models to solve such tasks. We employ circuit analysis and self-influence functions to evaluate the changing importance of each token throughout the reasoning process. We demonstrate that the underlying circuits reveal a human-interpretable reasoning process used by the model.
arXiv Detail & Related papers (2025-02-13T07:19:05Z) - Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z) - Calibrating Reasoning in Language Models with Internal Consistency [18.24350001344488]
Large language models (LLMs) have demonstrated impressive capabilities in various reasoning tasks. Yet LLMs often generate text with obvious mistakes and contradictions. In this work, we investigate reasoning in LLMs through the lens of internal representations.
arXiv Detail & Related papers (2024-05-29T02:44:12Z) - Understanding the Language Model to Solve the Symbolic Multi-Step Reasoning Problem from the Perspective of Buffer Mechanism [68.05754701230039]
We construct a symbolic multi-step reasoning task to investigate the information propagation mechanisms in Transformer models. We propose a random matrix-based algorithm to enhance the model's reasoning ability.
arXiv Detail & Related papers (2024-05-24T07:41:26Z) - Learning a Structural Causal Model for Intuition Reasoning in Conversation [20.243323155177766]
Reasoning, a crucial aspect of NLP research, has not been adequately addressed by prevailing models.
We develop a conversation cognitive model (CCM) that explains how each utterance receives and activates channels of information.
By leveraging variational inference, it explores substitutes for implicit causes, addresses the issue of their unobservability, and reconstructs the causal representations of utterances through the evidence lower bounds.
arXiv Detail & Related papers (2023-05-28T13:54:09Z) - Structural Causal Models Are (Solvable by) Credal Networks [70.45873402967297]
Causal inferences can be obtained by standard algorithms for the updating of credal nets.
This contribution should be regarded as a systematic approach to represent structural causal models by credal networks.
Experiments show that approximate algorithms for credal networks can immediately be used to do causal inference in real-size problems.
arXiv Detail & Related papers (2020-08-02T11:19:36Z) - A Critical View of the Structural Causal Model [89.43277111586258]
We show that one can identify the cause and the effect without considering their interaction at all.
We propose a new adversarial training method that mimics the disentangled structure of the causal model.
Our multidimensional method outperforms the literature methods on both synthetic and real world datasets.
arXiv Detail & Related papers (2020-02-23T22:52:28Z)