Unveiling the Entropy Dynamics of Chain-of-Thought Reasoning
Abstract Overview
This paper studies how predictive entropy changes during chain-of-thought generation and argues that reasoning typically follows a two-phase pattern: a high-entropy uncertainty region followed by a sharp transition to a low-entropy confidence region. The authors report that answers become more accurate and stable after entering the confidence region, while generation often continues with redundant tokens beyond the point where the correct answer has effectively been reached. They formulate confidence-region detection as an online sequential change-point detection problem and implement it with the training-free CUSUM algorithm. The resulting framework is then used for both early exit and test-time scaling across several open-source reasoning models and benchmarks.
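As a rough illustration of the detection step described above, the sketch below computes per-token predictive entropy and runs a one-sided CUSUM statistic that flags a sustained drop in entropy. It is not the authors' implementation. The one-sided form, the function names `token_entropy` and `cusum_detect_drop`, and the parameters `ref_mean`, `drift`, and `threshold` are assumptions chosen for illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cusum_detect_drop(entropies, ref_mean, drift=0.1, threshold=2.0):
    """Online one-sided CUSUM for a downward shift in entropy.

    ref_mean, drift, and threshold are hypothetical tuning parameters:
    ref_mean is the entropy level assumed for the uncertainty region,
    drift is the slack tolerated before evidence accumulates, and
    threshold is the alarm level. Returns the first index at which the
    statistic exceeds threshold, or None if it never does.
    """
    s = 0.0
    for t, h in enumerate(entropies):
        # Accumulate evidence that entropy sits below ref_mean by more
        # than drift; reset to zero whenever the evidence turns negative.
        s = max(0.0, s + (ref_mean - h - drift))
        if s > threshold:
            return t
    return None


# Synthetic trajectory: five high-entropy steps, then ten low-entropy steps.
trajectory = [2.0] * 5 + [0.2] * 10
change_index = cusum_detect_drop(trajectory, ref_mean=2.0)
```

On this synthetic trajectory the statistic stays at zero through the high-entropy steps and crosses the threshold on the second low-entropy step, so `change_index` is 6. Because the statistic is updated one token at a time, a detector of this kind can run during generation without any training.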
Novelty
The distinctive contribution is to analyze chain-of-thought reasoning through entropy dynamics over the full trajectory rather than through local stepwise heuristics, and to identify an abrupt uncertainty-to-confidence transition as a recurring structure. The paper also appears to be the first to operationalize this transition with classical change-point detection, specifically CUSUM, for real-time control of LLM reasoning.
Results
Empirically, the method yields a stronger early-exit efficiency-accuracy trade-off than the compared baselines. The paper reports an average accuracy of 63.06% with an 11.1% token reduction, exceeding DEER and Dynasor by 3.28 and 4.36 percentage points in accuracy, respectively. In test-time scaling, CUSUM-weighted voting consistently surpasses self-consistency, with larger gains as more trajectories are sampled.
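To make the contrast between self-consistency and weighted voting concrete, here is a minimal sketch. The summary does not state how the paper derives per-trajectory weights from CUSUM, so the weights below are hypothetical placeholders, and the helper names `majority_vote` and `weighted_vote` are illustrative only.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Self-consistency: return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


def weighted_vote(answers, weights):
    """Return the answer with the largest total weight.

    weights are assumed to be non-negative per-trajectory confidence
    scores; how they would be derived from a CUSUM detector is an
    assumption left open here.
    """
    totals = defaultdict(float)
    for answer, weight in zip(answers, weights):
        totals[answer] += weight
    return max(totals, key=totals.get)


# Three sampled trajectories with hypothetical confidence weights.
answers = ["A", "B", "B"]
weights = [0.9, 0.2, 0.3]

plain_choice = majority_vote(answers)               # "B" (two of three votes)
weighted_choice = weighted_vote(answers, weights)   # "A" (0.9 versus 0.5)
```

The example shows only the mechanism: a single high-confidence trajectory can outweigh a larger number of low-confidence ones, which plain majority voting cannot express.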
Key Points
- Predictive entropy across chain-of-thought trajectories exhibits a consistent two-region structure with an abrupt transition from exploration to convergence.
- The confidence region is characterized by both higher answer reliability and substantial redundancy, motivating early termination and trajectory reweighting.
- A training-free CUSUM detector enables online identification of this transition and improves both early exit and test-time scaling relative to the reported baselines.