Correlation or Causation: Analyzing the Causal Structures of LLM and LRM Reasoning Process
- URL: http://arxiv.org/abs/2509.17380v1
- Date: Mon, 22 Sep 2025 06:44:44 GMT
- Title: Correlation or Causation: Analyzing the Causal Structures of LLM and LRM Reasoning Process
- Authors: Zhizhang FU, Guangsheng Bao, Hongbo Zhang, Chenkai Hu, Yue Zhang
- Abstract summary: Our findings reveal that RLVR-trained LRMs exhibit enhanced causal reasoning capabilities, aligning more closely with ideal causal structures. This study contributes to the understanding of causality in reasoning models and provides insights for designing future AI systems with stronger causal foundations.
- Score: 15.790468401903874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) suffer from critical reasoning issues such as unfaithfulness, bias, and inconsistency: lacking robust causal underpinnings, they may rely on superficial correlations rather than genuine understanding. Large reasoning models (LRMs) have since emerged as a promising alternative, leveraging advanced training techniques such as reinforcement learning (RL) and distillation to improve task accuracy. However, the impact of these training methods on causality remains largely unexplored. In this study, we conduct a systematic causal analysis of LLMs and LRMs, examining structural causal models (SCMs) over four key variables: problem instruction (Z), thinking process (T), reasoning steps (X), and answer (Y). Our findings reveal that LRMs trained with reinforcement learning from verifiable rewards (RLVR) exhibit enhanced causal reasoning capabilities, aligning more closely with ideal causal structures, while LLMs and distilled LRMs fail to address causality-related deficiencies. Further investigation indicates that RLVR reduces spurious correlations and strengthens genuine causal patterns, thereby mitigating unfaithfulness and bias. In addition, our inspection of the RLVR training dynamics reveals a strong correlation between reduced spurious features and improved causal structures, with the causal relationships improving consistently over the course of training. This study contributes to the understanding of causality in reasoning models, highlights the critical role of RLVR in enhancing causal reasoning, and provides insights for designing future AI systems with stronger causal foundations. We release our code and data at https://github.com/Harryking1999/CoT_Causal_Analysis.
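The four-variable setup lends itself to a compact illustration. Below is a minimal sketch (my own, not the authors' released analysis code) contrasting the ideal causal structure with a spurious-shortcut structure; the variable names Z, T, X, Y come from the abstract, while the edge sets and the faithfulness check are illustrative assumptions:

```python
# Minimal sketch of the four-variable causal structure the abstract describes:
# problem instruction (Z), thinking process (T), reasoning steps (X), answer (Y).
# The variable names come from the paper; the edge sets and the check below
# are illustrative assumptions, not the authors' released analysis code.

IDEAL_EDGES = {("Z", "T"), ("T", "X"), ("X", "Y")}   # Z -> T -> X -> Y
SHORTCUT_EDGES = IDEAL_EDGES | {("Z", "Y")}          # plus a spurious Z -> Y shortcut

def parents(node, edges):
    """Direct causal parents of `node` under the given edge set."""
    return {src for src, dst in edges if dst == node}

def answer_follows_reasoning(edges):
    """Under the ideal structure, intervening on the reasoning steps, do(X),
    should change the answer; a Z -> Y shortcut lets the model answer while
    ignoring its own reasoning (unfaithfulness)."""
    return parents("Y", edges) == {"X"}

if __name__ == "__main__":
    print("ideal:   ", answer_follows_reasoning(IDEAL_EDGES))     # True
    print("shortcut:", answer_follows_reasoning(SHORTCUT_EDGES))  # False
```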
Related papers
- CausalFlip: A Benchmark for LLM Causal Judgment Beyond Semantic Matching [50.65932158912512]
We propose a new causal reasoning benchmark, CausalFlip, to encourage the development of LLMs that judge causality rather than match semantics. CausalFlip consists of causal judgment questions built over event triples that can form confounder, chain, and collider relations. We evaluate LLMs under multiple training paradigms, including answer-only training, explicit chain-of-thought supervision, and a proposed internalized causal reasoning approach.
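The three relation types named here are standard causal-graph structures. The following sketch (my own encoding, not code from the CausalFlip release) shows the d-separation rule that tells them apart over an event triple (A, B, C):

```python
# The three structures over an event triple (A, B, C), plus the textbook
# d-separation rule that distinguishes them. This encoding is my own
# illustration; it is not taken from the CausalFlip benchmark code.
CHAIN      = {("A", "B"), ("B", "C")}   # A -> B -> C
CONFOUNDER = {("B", "A"), ("B", "C")}   # A <- B -> C
COLLIDER   = {("A", "B"), ("C", "B")}   # A -> B <- C

def a_c_dependent(edges, condition_on_b):
    """d-separation on the single path A - B - C: a non-collider B blocks the
    path exactly when conditioned on; a collider B opens it only when
    conditioned on (conditioning on a common effect induces dependence)."""
    b_is_collider = {("A", "B"), ("C", "B")} <= edges
    return condition_on_b == b_is_collider

if __name__ == "__main__":
    for name, g in [("chain", CHAIN), ("confounder", CONFOUNDER), ("collider", COLLIDER)]:
        print(f"{name:10s} A-C dependent: marginal={a_c_dependent(g, False)}, "
              f"given B={a_c_dependent(g, True)}")
```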
arXiv Detail & Related papers (2026-02-23T18:06:15Z) - Are Reasoning LLMs Robust to Interventions on Their Chain-of-Thought? [79.86483056611105]
Reasoning LLMs generate step-by-step chains of thought before giving an answer. How robust are these reasoning traces to disruptions that occur within them? We introduce a controlled evaluation framework that perturbs a model's own CoT at fixed timesteps.
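A hedged sketch of that kind of fixed-timestep intervention; the function and the example trace are placeholders of my own, not the paper's framework:

```python
# Sketch of a fixed-timestep intervention on a chain of thought: keep the
# trace up to step t, splice in a perturbation, discard the rest, and let the
# model continue decoding from the modified prefix. Names are placeholders,
# not the paper's actual framework.
def perturb_cot(cot_steps: list[str], t: int, perturbation: str) -> list[str]:
    """Truncate the model's own reasoning trace at step `t` and insert a
    controlled disruption; downstream decoding resumes from this prefix."""
    return cot_steps[:t] + [perturbation]

trace = ["Let x = 3.", "Then 2x = 6.", "So the answer is 6."]
# Inject a contradictory step at timestep 2 and observe whether the model
# recovers, propagates the error, or ignores its own trace.
print(perturb_cot(trace, 2, "Wait, actually 2x = 7."))
```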
arXiv Detail & Related papers (2026-02-07T10:02:58Z) - How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns [51.02752099869218]
Large Language Models (LLMs) display strikingly different generalization behaviors. We introduce a novel benchmark that decomposes reasoning into atomic core skills. We show that RL-tuned models maintain more stable behavioral profiles and resist collapse in reasoning skills, whereas SFT models exhibit sharper drift and overfit to surface patterns.
arXiv Detail & Related papers (2025-12-30T08:16:20Z) - Unifying Causal Reinforcement Learning: Survey, Taxonomy, Algorithms and Applications [35.74838344207327]
Causal reinforcement learning (CRL) offers promising solutions to challenges of robustness, generalization, and interpretability by explicitly modeling cause-and-effect relationships. We categorize existing approaches into causal representation learning, counterfactual policy optimization, offline causal RL, causal transfer learning, and causal explainability. We provide future research directions, underscoring the potential of CRL for developing robust, generalizable, and interpretable artificial intelligence systems.
arXiv Detail & Related papers (2025-12-19T23:37:22Z) - RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization [86.30192066451256]
We propose RL-PLUS, a novel hybrid-policy optimization approach for Large Language Models (LLMs). RL-PLUS synergizes internal exploitation with external data to achieve stronger reasoning capabilities and surpass the boundaries of base models. We provide both theoretical analysis and extensive experiments to demonstrate the superiority and generalizability of our approach.
arXiv Detail & Related papers (2025-07-31T23:55:29Z) - Learning Nonlinear Causal Reductions to Explain Reinforcement Learning Policies [50.30741668990102]
We take a causal perspective on explaining the behavior of reinforcement learning policies. We learn a simplified high-level causal model that explains these relationships. We prove that for a class of nonlinear causal models, there exists a unique solution.
arXiv Detail & Related papers (2025-07-20T10:25:24Z) - Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning [82.43575191712726]
We introduce a fine-grained analytic framework to dissect the impact of reinforcement learning (RL) on reasoning. Our framework specifically investigates key elements that have been hypothesized to benefit from RL training.
arXiv Detail & Related papers (2025-06-05T07:53:59Z) - Concise Reasoning via Reinforcement Learning [13.657506042120167]
We revisit the core principles of reinforcement learning (RL) and uncover a natural correlation between conciseness and accuracy that has been largely overlooked. We show that introducing a secondary phase of RL training, using a very small set of problems, can significantly shorten chains of thought.
arXiv Detail & Related papers (2025-04-07T15:35:54Z) - Failure Modes of LLMs for Causal Reasoning on Narratives [51.19592551510628]
We investigate the interaction between world knowledge and logical reasoning. We find that state-of-the-art large language models (LLMs) often rely on superficial generalizations. We show that simple reformulations of the task can elicit more robust reasoning behavior.
arXiv Detail & Related papers (2024-10-31T12:48:58Z) - On the Reliability of Large Language Models for Causal Discovery [55.94868919310357]
This study investigates the efficacy of Large Language Models (LLMs) in causal discovery. We use newly available open-source LLMs, OLMo and BLOOM, which provide access to their pre-training corpora. We examine: (i) the impact of memorization on accurate causal relation prediction, (ii) the influence of incorrect causal relations in pre-training data, and (iii) the contextual nuances that influence LLMs' understanding of causal relations.
arXiv Detail & Related papers (2024-07-29T01:45:05Z) - How Likely Do LLMs with CoT Mimic Human Reasoning? [31.86489714330338]
Chain-of-thought emerges as a promising technique for eliciting reasoning capabilities from Large Language Models (LLMs). We use causal analysis to understand the relationships between the problem instruction, reasoning, and the answer in LLMs.
arXiv Detail & Related papers (2024-02-25T10:13:04Z) - Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation [57.351098530477124]
We consider one critical type of robustness against spurious correlation, where different portions of the state share no causal link yet are correlated through unobserved confounders (illustrated in the sketch below).
A model that learns such useless or even harmful correlations could catastrophically fail when the confounder in the test case deviates from the one seen in training.
Existing robust algorithms that assume simple and unstructured uncertainty sets are therefore inadequate to address this challenge.
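As a toy illustration of this failure mode (my own sketch, not the paper's benchmark): a policy keyed on a nuisance feature looks optimal while a confounder couples it to the rewarded signal, and it collapses once that coupling shifts at test time.

```python
# Toy illustration (mine, not the paper's setup) of a spurious state
# correlation: during training a confounder ties a nuisance feature to the
# rewarded signal, so a policy keyed on the nuisance looks optimal -- until
# the confounder shifts at test time and the correlation breaks.
import random

def rollout(policy, confounded: bool, episodes: int = 10_000) -> float:
    total = 0.0
    for _ in range(episodes):
        signal = random.randint(0, 1)  # causally determines the reward
        nuisance = signal if confounded else random.randint(0, 1)
        state = (signal, nuisance)
        total += 1.0 if policy(state) == signal else 0.0
    return total / episodes

spurious_policy = lambda state: state[1]  # keys on the nuisance feature
causal_policy   = lambda state: state[0]  # keys on the true signal

print("spurious, train:", rollout(spurious_policy, confounded=True))   # ~1.0
print("spurious, test: ", rollout(spurious_policy, confounded=False))  # ~0.5
print("causal,   test: ", rollout(causal_policy, confounded=False))    # ~1.0
```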
arXiv Detail & Related papers (2023-07-15T23:53:37Z) - Integrating Large Language Model for Improved Causal Discovery [25.50313039584238]
Large Language Models (LLMs) have been used for causal analysis across various domain-specific scenarios. We propose an error-tolerant LLM-driven causal discovery framework.
arXiv Detail & Related papers (2023-06-29T12:48:00Z) - Causal Inference Q-Network: Toward Resilient Reinforcement Learning [57.96312207429202]
We consider a resilient DRL framework with observational interferences.
Under this framework, we propose a causal-inference-based DRL algorithm, the causal inference Q-network (CIQ).
Our experimental results show that the proposed CIQ method could achieve higher performance and greater resilience against observational interferences.
arXiv Detail & Related papers (2021-02-18T23:50:20Z)