Related papers: Decoupling the Effect of Chain-of-Thought Reasoning: A Human Label Variation Perspective

Decoupling the Effect of Chain-of-Thought Reasoning: A Human Label Variation Perspective

URL: http://arxiv.org/abs/2601.03154v1
Date: Tue, 06 Jan 2026 16:26:40 GMT
Title: Decoupling the Effect of Chain-of-Thought Reasoning: A Human Label Variation Perspective
Authors: Beiduo Chen, Tiancheng Hu, Caiqi Zhang, Robert Litschko, Anna Korhonen, Barbara Plank,
Abstract summary: We show that long Chain-of-Thought (CoT) serves as a decisive decision-maker for the top option but fails to function as a granular distribution calibrator for ambiguous tasks.<n>We observe a distinct "decoupled mechanism": while CoT improves distributional alignment, final accuracy is dictated by CoT content.
Score: 60.45433515408158
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reasoning-tuned LLMs utilizing long Chain-of-Thought (CoT) excel at single-answer tasks, yet their ability to model Human Label Variation--which requires capturing probabilistic ambiguity rather than resolving it--remains underexplored. We investigate this through systematic disentanglement experiments on distribution-based tasks, employing Cross-CoT experiments to isolate the effect of reasoning text from intrinsic model priors. We observe a distinct "decoupled mechanism": while CoT improves distributional alignment, final accuracy is dictated by CoT content (99% variance contribution), whereas distributional ranking is governed by model priors (over 80%). Step-wise analysis further shows that while CoT's influence on accuracy grows monotonically during the reasoning process, distributional structure is largely determined by LLM's intrinsic priors. These findings suggest that long CoT serves as a decisive LLM decision-maker for the top option but fails to function as a granular distribution calibrator for ambiguous tasks.

Related papers

Divergence-Minimization for Latent-Structure Models: Monotone Operators, Contraction Guarantees, and Robust Inference [5.373905622325275]
We develop a divergence-minimization (DM) framework for robust and efficient inference in latent-mixture models.<n>By optimizing a residual-adjusted divergence, the DM approach recovers EM as a special case and yields robust alternatives.
arXiv Detail & Related papers (2025-11-22T08:25:29Z)
CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks [96.64597365827046]
We present the first unified framework that jointly handles three operationally heterogeneous saliency tasks.<n>We introduce a Chain-of-Thought (CoT) reasoning process in a Vision-Language Model (VLM) to bridge task heterogeneity.<n>We show our model matches or outperforms specialized SOTA methods and strong closed-source VLMs across all tasks.
arXiv Detail & Related papers (2025-11-01T04:37:01Z)
In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning [51.56484100374058]
We introduce a principled risk decomposition that separates the total ICL risk into two components: Bayes Gap and Posterior Variance.<n>For a uniform-attention Transformer, we derive a non-asymptotic upper bound on this gap, which explicitly clarifies the dependence on the number of pretraining prompts.<n>The Posterior Variance is a model-independent risk representing the intrinsic task uncertainty.
arXiv Detail & Related papers (2025-10-13T03:42:31Z)
Token Signature: Predicting Chain-of-Thought Gains with Token Decoding Feature in Large Language Models [9.282278040339138]
Chain-of-Thought (CoT) technique has proven effective in improving the performance of large language models (LLMs) on complex reasoning tasks.<n>We make a preliminary observation that the monotonicity of token probability distributions may be correlated with the gains achieved through CoT reasoning.<n>We propose two indicators based on the token probability distribution to assess CoT effectiveness across different tasks.
arXiv Detail & Related papers (2025-06-06T11:53:27Z)
Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems. We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z)
Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs [63.36637269634553]
We introduce a novel approach where LLMs are fine-tuned to generate a sequence of Diverse Chains of Thought (DCoT) within a single inference step.<n>We show that fine-tuning on DCoT improves performance over the CoT baseline across model families and scales.<n>Our work is also significant because both quantitative analyses and manual evaluations reveal the observed gains stem from the models' ability to refine an initial reasoning chain.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
Invariant Anomaly Detection under Distribution Shifts: A Causal Perspective [6.845698872290768]
Anomaly detection (AD) is the machine learning task of identifying highly discrepant abnormal samples. Under the constraints of a distribution shift, the assumption that training samples and test samples are drawn from the same distribution breaks down. We attempt to increase the resilience of anomaly detection models to different kinds of distribution shifts.
arXiv Detail & Related papers (2023-12-21T23:20:47Z)
Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles [104.60508550106618]
We propose DiffDiv, an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs)<n>We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features.<n>We show that DPM-guided diversification is sufficient to remove dependence on shortcut cues, without a need for additional supervised signals.
arXiv Detail & Related papers (2023-11-23T15:47:33Z)
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Uncertainty [58.144520501201995]
Bi-Lipschitz regularization of neural network layers preserve relative distances between data instances in the feature spaces of each layer. With the use of an attentive set encoder, we propose to meta learn either diagonal or diagonal plus low-rank factors to efficiently construct task specific covariance matrices. We also propose an inference procedure which utilizes scaled energy to achieve a final predictive distribution.
arXiv Detail & Related papers (2021-10-12T22:04:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.