FuguReport

Unveiling the Entropy Dynamics of Chain-of-Thought Reasoning

Authors: Ting Xu, Xu He, Yupu Lu, Jiankai Sun, Dong Li, Wai Lam, Jianye Hao
Affiliations: Huawei / The Chinese University of Hong Kong / The University of Hong Kong
Categories: Evaluation / Model Behavior Analysis / Entropy characteristics in reasoning, Method / Inference Strategy / Efficient and reliable reasoning strategies, Task / Reasoning / Chain-of-thought output generation
License: CC BY 4.0

Abstract Overview

This paper studies how predictive entropy changes during chain-of-thought generation and argues that reasoning typically follows a two-phase pattern: a high-entropy uncertainty region followed by a sharp transition to a low-entropy confidence region. The authors report that answers become more accurate and stable once a trajectory enters the confidence region, while generation often continues with redundant tokens beyond the point where the correct answer has effectively been reached. They formulate confidence-region detection as an online sequential change-point detection problem and implement it with the training-free CUSUM algorithm. The resulting framework is then applied to both early exit and test-time scaling across several open-source reasoning models and benchmarks.
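
To make "predictive entropy" concrete, the following is a minimal sketch that computes the Shannon entropy of a next-token distribution at each generation step. It assumes the per-step logits are available; the function names and the toy 4-token vocabulary are illustrative and do not come from the paper.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw logits to a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    # Skip zero-probability entries, which contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_trace(step_logits: Sequence[Sequence[float]]) -> list[float]:
    """Entropy at each generation step of one trajectory."""
    return [predictive_entropy(logits) for logits in step_logits]


# Toy trajectory (hypothetical data): two diffuse steps, then two peaked ones.
toy_logits = [
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.2, 0.1],
    [8.0, 0.0, 0.0, 0.0],
    [9.0, 0.0, 0.0, 0.0],
]
trace = entropy_trace(toy_logits)
```

In this toy trace the first step has the maximum possible entropy for four tokens (log 4 nats), while the last two steps are close to zero, which mimics the uncertainty-to-confidence pattern described above.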

Novelty

The distinctive contribution is to analyze chain-of-thought reasoning through entropy dynamics over the full trajectory rather than through local stepwise heuristics, and to identify an abrupt uncertainty-to-confidence transition as a recurring structure. The paper also appears to be the first to operationalize this transition with classical change-point detection, specifically CUSUM, for real-time control of LLM reasoning.
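
For reference, a textbook one-sided CUSUM statistic for detecting a downward shift in entropy can be written as follows. This is a sketch of the classical form only; the symbols \mu_0 (reference entropy of the uncertainty region), k (slack) and h (alarm threshold) are assumptions introduced here, and the paper's exact statistic and parameter choices may differ.

```latex
% Predictive entropy at step t over vocabulary V
H_t = -\sum_{v \in V} p_\theta(v \mid x, y_{<t}) \log p_\theta(v \mid x, y_{<t})

% One-sided CUSUM accumulating evidence that H_t has dropped below \mu_0
S_0 = 0, \qquad S_t = \max\bigl(0,\; S_{t-1} + (\mu_0 - H_t) - k\bigr)

% Detected transition point: first step at which the statistic exceeds h
\tau = \min\{\, t : S_t > h \,\}
```

The reset to zero keeps the statistic from accumulating noise while entropy stays high, so the alarm at \tau fires only after a sustained drop.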

Results

Empirically, the method yields a stronger early-exit efficiency-accuracy trade-off than the compared baselines. The paper reports an average accuracy of 63.06% with an 11.1% token reduction, outperforming DEER and Dynasor by 3.28 and 4.36 percentage points in accuracy, respectively. It also shows that CUSUM-weighted voting consistently surpasses self-consistency in test-time scaling, with larger gains as more trajectories are sampled.
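
The difference between self-consistency and weighted voting can be sketched as below. The summary does not state how the paper derives per-trajectory weights from CUSUM, so the weighting used in the usage note (full weight if a detector fired, reduced weight otherwise) is a hypothetical placeholder, as are the function names.

```python
from collections import defaultdict
from typing import Sequence


def weighted_vote(answers: Sequence[str], weights: Sequence[float]) -> str:
    """Return the answer with the largest total weight.

    Ties are broken in favor of the answer seen first.
    """
    if len(answers) != len(weights):
        raise ValueError("answers and weights must have the same length")
    if not answers:
        raise ValueError("need at least one answer")
    totals: dict[str, float] = defaultdict(float)
    for answer, weight in zip(answers, weights):
        totals[answer] += weight
    return max(totals, key=totals.get)


def majority_vote(answers: Sequence[str]) -> str:
    """Plain self-consistency: each trajectory gets one vote."""
    return weighted_vote(answers, [1.0] * len(answers))


# Hypothetical sampled answers and detector outcomes for five trajectories.
answers = ["17", "17", "17", "42", "42"]
detected = [False, False, False, True, True]
# Assumed weighting scheme, NOT taken from the paper.
weights = [1.0 if fired else 0.25 for fired in detected]

plain = majority_vote(answers)
reweighted = weighted_vote(answers, weights)
```

With these made-up inputs, `plain` is "17" and `reweighted` is "42". The example is constructed only to show how reweighting can change the selected answer; it is not a reproduction of any reported result.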

Key Points

  1. Predictive entropy across chain-of-thought trajectories exhibits a consistent two-region structure with an abrupt transition from exploration to convergence.
  2. The confidence region is characterized by both higher answer reliability and substantial redundancy, motivating early termination and trajectory reweighting.
  3. A training-free CUSUM detector enables online identification of this transition and improves both early exit and test-time scaling relative to the reported baselines.
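
Key point 3 can be illustrated with a minimal online detector that consumes one entropy value per generated token and signals when to stop. This is a sketch of classical one-sided CUSUM under assumed parameters; `reference`, `slack`, `threshold` and the synthetic trace are illustrative values, not settings from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class DownwardCusum:
    """One-sided CUSUM that accumulates evidence of a drop below `reference`."""

    reference: float  # assumed typical entropy in the uncertainty region
    slack: float      # drift allowance; drops smaller than this are ignored
    threshold: float  # alarm level for the cumulative statistic
    statistic: float = 0.0

    def update(self, entropy: float) -> bool:
        """Consume one entropy value; return True if the alarm level is exceeded."""
        drop = self.reference - entropy - self.slack
        self.statistic = max(0.0, self.statistic + drop)
        return self.statistic > self.threshold


def first_alarm(entropies: Iterable[float], detector: DownwardCusum) -> Optional[int]:
    """Index of the step at which the detector first fires, or None."""
    for step, entropy in enumerate(entropies):
        if detector.update(entropy):
            return step
    return None


# Synthetic trace (hypothetical): six high-entropy steps, then a sharp drop.
trace = [2.1, 1.9, 2.2, 2.0, 1.8, 2.0, 0.3, 0.2, 0.1, 0.2, 0.1, 0.1]
detector = DownwardCusum(reference=2.0, slack=0.5, threshold=3.0)
exit_step = first_alarm(trace, detector)
```

On this synthetic trace the statistic stays at zero through the first six steps and crosses the threshold at index 8, three steps after the drop begins. In an actual decoding loop, the same `update` call would run after each token, and generation would be stopped (or an answer elicited) once it returns True.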
