Related papers: Unveiling the Mechanisms of Explicit CoT Training: How CoT Enhances Reasoning Generalization

Unveiling the Mechanisms of Explicit CoT Training: How CoT Enhances Reasoning Generalization

URL: http://arxiv.org/abs/2502.04667v2
Date: Mon, 05 May 2025 09:01:06 GMT
Title: Unveiling the Mechanisms of Explicit CoT Training: How CoT Enhances Reasoning Generalization
Authors: Xinhao Yao, Ruifeng Ren, Yun Liao, Yong Liu,
Abstract summary: The integration of explicit Chain-of-Thought (CoT) reasoning into training large language models has advanced their reasoning capabilities, yet the mechanisms by which CoT enhances generalization remain poorly understood.<n>This work investigates (1) textithow CoT training reshapes internal model representations and (2) textitwhy it improves both in-distribution (ID) and out-of-distribution (OOD) reasoning generalization.
Score: 9.191236388401226
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The integration of explicit Chain-of-Thought (CoT) reasoning into training large language models (LLMs) has advanced their reasoning capabilities, yet the mechanisms by which CoT enhances generalization remain poorly understood. This work investigates (1) \textit{how} CoT training reshapes internal model representations and (2) \textit{why} it improves both in-distribution (ID) and out-of-distribution (OOD) reasoning generalization. Through controlled experiments and theoretical analysis, we derive the following key insights. \textbf{1)} Structural Advantage: CoT training internalizes reasoning into a two-stage generalizing circuit, where the number of stages corresponds to the explicit reasoning steps during training. Notably, CoT-trained models resolve intermediate results at shallower layers compared to non-CoT counterparts, freeing up deeper layers to specialize in subsequent reasoning steps. \textbf{2)} Theoretical Analysis: the information-theoretic generalization bounds via distributional divergence can be decomposed into ID and OOD components. While ID error diminishes with sufficient training regardless of CoT, OOD error critically depends on CoT: Non-CoT training fails to generalize to OOD samples due to unseen reasoning patterns, whereas CoT training achieves near-perfect OOD generalization by mastering subtasks and reasoning compositions during training. The identified mechanisms explain our experimental results: CoT training accelerates convergence and enhances generalization from ID to both ID and OOD scenarios while maintaining robust performance even with tolerable noise. These findings are further validated on complex real-world datasets. This paper offers valuable insights for designing CoT strategies to enhance LLM reasoning robustness.

Related papers

CTRLS: Chain-of-Thought Reasoning via Latent State-Transition [57.51370433303236]
Chain-of-thought (CoT) reasoning enables large language models to break down complex problems into interpretable intermediate steps.<n>We introduce groundingS, a framework that formulates CoT reasoning as a Markov decision process (MDP) with latent state transitions.<n>We show improvements in reasoning accuracy, diversity, and exploration efficiency across benchmark reasoning tasks.
arXiv Detail & Related papers (2025-07-10T21:32:18Z)
CC-LEARN: Cohort-based Consistency Learning [5.7716971260066]
Large language models struggle with consistent, robust reasoning.<n>We introduce cohort-based Consistency Learning (CC-Learn)<n>Experiments show that CC-Learn boosts both accuracy and reasoning stability over pretrained and SFT baselines.
arXiv Detail & Related papers (2025-06-18T17:41:28Z)
Reinforced Latent Reasoning for LLM-based Recommendation [83.18146814163308]
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities in complex problem-solving tasks.<n>Existing methods typically rely on fine-tuning with explicit chain-of-thought (CoT) data.<n>In this work, we explore an alternative approach that shifts from explicit CoT reasoning to compact, information-dense latent reasoning.
arXiv Detail & Related papers (2025-05-25T11:03:45Z)
Chain-of-Thought Prompting for Out-of-Distribution Samples: A Latent-Variable Study [5.236910203359897]
Chain-of-Thought (CoT) prompting has emerged as a powerful technique to improve in-context learning in large language models. We extend a latent-variable framework for CoT prompting and study its behavior on two prototypical out-of-distribution (OOD) scenarios. Our experiments demonstrate that CoT inference generalizes effectively to OOD samples whose latent variables closely resemble those seen during training, but its performance degrades as this similarity decreases.
arXiv Detail & Related papers (2025-04-17T14:59:29Z)
The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning [39.613595533503144]
Chain-of-Thought (CoT) prompting has been widely recognized for its ability to enhance reasoning capabilities in large language models. We show that CoT consistently underperforms direct answering across varying model scales and benchmark complexities. Our analysis uncovers a fundamental explicit-implicit duality driving CoT's performance in pattern-based ICL.
arXiv Detail & Related papers (2025-04-07T13:51:06Z)
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 [53.894789613838654]
We introduce SEED-Bench-R1, a benchmark designed to evaluate post-training methods for MLLMs in video understanding. It includes intricate real-world videos and complex everyday planning tasks in the format of multiple-choice questions. Using Qwen2-VL-Instruct-7B as a base model, we compare RL with supervised fine-tuning (SFT) Our detailed analysis reveals that RL enhances visual perception but often produces less coherent reasoning chains.
arXiv Detail & Related papers (2025-03-31T17:55:23Z)
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
arXiv Detail & Related papers (2025-03-20T17:59:38Z)
Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization [35.16980045900664]
Generalization to novel compound tasks under distribution shift is important for deploying transformer-based language models (LMs) This work investigates Chain-of-Thought (CoT) reasoning as a means to enhance OOD generalization.
arXiv Detail & Related papers (2025-02-25T15:04:17Z)
When More is Less: Understanding Chain-of-Thought Length in LLMs [51.631483479081645]
Large Language Models (LLMs) employ Chain-of-Thought (CoT) reasoning to deconstruct complex problems.<n>This paper argues that longer CoTs are often presumed superior, arguing that longer is not always better.
arXiv Detail & Related papers (2025-02-11T05:28:59Z)
Demystifying Long Chain-of-Thought Reasoning in LLMs [46.352406501403465]
Long chains-of-thought (CoTs) enable strategies like backtracking and error correction.<n>Reinforcement learning (RL) has emerged as a crucial method for developing these capabilities.<n>We identify the key factors that enable models to generate long CoT trajectories.
arXiv Detail & Related papers (2025-02-05T17:13:32Z)
Rethinking Chain-of-Thought from the Perspective of Self-Training [10.722453877596998]
Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent capabilities in LLMs.<n>We propose a novel CoT framework to improve reasoning performance.<n>Our framework integrates two key components: (i) a task-specific prompt module that optimize the initial reasoning process, and (ii) an adaptive reasoning module that dynamically refines the reasoning process.
arXiv Detail & Related papers (2024-12-14T13:12:50Z)
RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner [2.779063752888881]
Self-taught reasoner (STaR) framework uses reinforcement learning to automatically generate reasoning steps. STaR and its variants have demonstrated empirical success, but a theoretical foundation explaining these improvements is lacking. This work provides a theoretical framework for understanding the effectiveness of reinforcement learning on CoT reasoning and STaR.
arXiv Detail & Related papers (2024-10-31T13:17:53Z)
Improve Vision Language Model Chain-of-thought Reasoning [86.83335752119741]
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. We show that training VLM on short answers does not generalize well to reasoning tasks that require more detailed responses.
arXiv Detail & Related papers (2024-10-21T17:00:06Z)
Sparse Mixture-of-Experts for Compositional Generalization: Empirical Evidence and Theoretical Foundations of Optimal Sparsity [89.81738321188391]
This study investigates the relationship between task complexity and optimal sparsity in SMoE models.<n>We show that the optimal sparsity lies between minimal activation (1-2 experts) and full activation, with the exact number scaling proportionally to task complexity.
arXiv Detail & Related papers (2024-10-17T18:40:48Z)
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis [82.51626700527837]
Chain-of-shift (CoT) is an efficient method that enables the reasoning ability of large language models by augmenting the query using examples with multiple intermediate steps. We show that despite the theoretical success of CoT, it fails to provide an accurate generalization when CoT does.
arXiv Detail & Related papers (2024-10-03T03:12:51Z)
The mechanistic basis of data dependence and abrupt learning in an in-context classification task [0.3626013617212666]
We show that specific distributional properties inherent in language control the trade-off or simultaneous appearance of two forms of learning. In-context learning is driven by the abrupt emergence of an induction head, which subsequently competes with in-weights learning. We propose that the sharp transitions in attention-based networks arise due to a specific chain of multi-layer operations necessary to achieve ICL.
arXiv Detail & Related papers (2023-12-03T20:53:41Z)
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters [82.84696222087396]
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs) We show that CoT reasoning is possible even with invalid demonstrations.
arXiv Detail & Related papers (2022-12-20T05:20:54Z)
Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues. We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders. We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.