Entropy-Guided Reasoning Compression
- URL: http://arxiv.org/abs/2511.14258v2
- Date: Mon, 24 Nov 2025 10:36:50 GMT
- Title: Entropy-Guided Reasoning Compression
- Authors: Hourun Zhu, Yang Gao, Wenlong Fei, Jiawei Li, Huashan Sun,
- Abstract summary: We develop an entropy-guided training framework for large reasoning models. As entropy descends, the model is guided toward efficient reasoning by encouraging concise thought steps. Our method compresses reasoning length to 20% of the original while maintaining or even surpassing baseline accuracy.
- Score: 11.181525993239115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large reasoning models have demonstrated remarkable performance on complex reasoning tasks, yet the excessive length of their chain-of-thought outputs remains a major practical bottleneck due to high computation cost and poor deployability. Existing compression methods have achieved partial success but overlook a crucial phenomenon in the training process -- the entropy conflict. During compression training, entropy decreases, leading to shorter reasoning but limited exploration, while accuracy-oriented objectives increase entropy, lengthening reasoning chains. This can cause the model to get stuck in a local dilemma. Our analysis further reveals the origin of the entropy conflict: many high-entropy tokens are logical connectors that receive larger gradients and are encouraged under the performance objective, while the compression objective simultaneously penalizes these potentially redundant connectors. This opposing pressure creates a direct source of entropy conflict. To address these issues, we adopt an entropy-guided training framework. As entropy descends, the model is guided toward efficient reasoning by encouraging concise thought steps; as entropy rises, exploration is reinforced under the compact reasoning mode to improve robustness. Experiments on six mathematical benchmarks show that our method compresses reasoning length to 20% of the original while maintaining or even surpassing baseline accuracy. Code and models will be released publicly.
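The paper releases no code here, but the entropy-dependent trade-off the abstract describes can be sketched roughly. In this illustration the reward shape, `entropy_ref`, `alpha`, and `max_len` are all invented for the example, not taken from the paper:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one token's next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_guided_reward(correct, length, mean_entropy,
                          entropy_ref=1.0, max_len=512, alpha=0.5):
    """Hypothetical reward combining accuracy with an entropy-dependent
    brevity term: when mean entropy has descended below the reference,
    shorter chains earn a larger bonus; when entropy rises above it,
    the length pressure vanishes so the policy is free to explore."""
    accuracy_term = 1.0 if correct else 0.0
    # weight the brevity bonus by how far entropy has descended
    brevity_weight = alpha * max(0.0, entropy_ref - mean_entropy)
    brevity_term = brevity_weight * (1.0 - min(length, max_len) / max_len)
    return accuracy_term + brevity_term
```

A correct 256-token answer produced at low entropy (0.5) thus scores above plain accuracy, while the same answer at high entropy keeps only the accuracy term.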
Related papers
- Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning [39.72119774004103]
Chain-of-Thought (CoT) has substantially empowered Large Language Models (LLMs) to tackle complex reasoning tasks. The verbose nature of explicit reasoning steps incurs prohibitive inference latency and computational costs, limiting real-world deployment. We propose Compress responses for Easy questions and Explore Hard ones (CEEH), a difficulty-aware approach to RL-based efficient reasoning.
arXiv Detail & Related papers (2026-02-26T05:47:30Z)
- ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought [49.203970812338916]
Explicit reasoning chains introduce substantial computational redundancy. Recent latent reasoning methods attempt to mitigate this by compressing reasoning processes into latent space. We propose Rendered CoT-Guided variational Latent Reasoning (ReGuLaR).
arXiv Detail & Related papers (2026-01-30T17:08:06Z)
- DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference [68.05879215304641]
Recent reasoning Large Language Models (LLMs) demonstrate remarkable problem-solving abilities but often generate long thinking traces whose utility is unclear. Our work aims to improve their efficiency, enabling them to reach high performance without overthinking. We introduce DiffAdapt, a lightweight framework that selects an Easy/Normal/Hard inference strategy per question based on its difficulty and reasoning trace entropy.
arXiv Detail & Related papers (2025-10-22T15:16:06Z)
- PEAR: Phase Entropy Aware Reward for Efficient Reasoning [23.381346604897246]
This paper introduces Phase Entropy Aware Reward (PEAR), a reward mechanism that incorporates phase-dependent entropy into the reward design. Experiments across four benchmarks demonstrate that PEAR consistently reduces response length while sustaining competitive accuracy across model scales.
arXiv Detail & Related papers (2025-10-09T10:04:31Z)
- Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation [82.62935304152239]
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities on complex problems using long Chain-of-Thought (CoT) reasoning. They often suffer from overthinking, i.e., generating unnecessarily lengthy reasoning steps for simpler problems. We introduce a novel metric, Token Entropy Cumulative Average (TECA), which measures the extent of exploration throughout the reasoning process.
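The abstract describes TECA only as a cumulative average of token entropy over the reasoning trace; a minimal sketch under that reading (the function names and the list-of-distributions input format are assumptions) might be:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one token's next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def teca(per_token_probs):
    """Token Entropy Cumulative Average: teca[t] is the mean entropy of
    tokens 0..t, so the trajectory shows how exploration evolves as the
    reasoning trace unfolds."""
    out, running = [], 0.0
    for t, probs in enumerate(per_token_probs, start=1):
        running += token_entropy(probs)
        out.append(running / t)
    return out
```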
arXiv Detail & Related papers (2025-10-02T17:36:50Z)
- Measuring Reasoning Utility in LLMs via Conditional Entropy Reduction [3.9481110638616617]
We measure the model's uncertainty on the answer span Y at each reasoning step using conditional entropy. We also corroborate that incorrect reasoning paths tend to be longer than correct ones, suggesting that longer reasoning does not necessarily yield better outcomes.
arXiv Detail & Related papers (2025-08-28T03:43:38Z)
- The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models [99.98293908799731]
This paper aims to overcome a major obstacle in scaling RL for reasoning with LLMs, namely the collapse of policy entropy. In practice, we establish a transformation equation R = -a * e^H + b between entropy H and downstream performance R. We propose two simple yet effective techniques, namely Clip-Cov and KL-Cov, which respectively clip and apply a KL penalty to tokens with high covariances.
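The reported relation R = -a * e^H + b has two free constants, so it can be evaluated directly and recovered exactly from any two (H, R) observations. The constants below are illustrative, not values from the paper:

```python
import math

def predicted_performance(H, a, b):
    """Empirical transformation reported in the abstract:
    R = -a * exp(H) + b, linking policy entropy H to performance R."""
    return -a * math.exp(H) + b

def fit_ab(H1, R1, H2, R2):
    """Solve for (a, b) from two observed (entropy, performance) pairs
    by eliminating b between the two copies of R = -a * exp(H) + b."""
    a = (R2 - R1) / (math.exp(H1) - math.exp(H2))
    b = R1 + a * math.exp(H1)
    return a, b
```

With a = 0.1 and b = 0.9, for instance, performance falls from 0.8 at H = 0 toward b - a * e as entropy grows, matching the monotone trade-off the equation encodes.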
arXiv Detail & Related papers (2025-05-28T17:38:45Z)
- Adaptive Deep Reasoning: Triggering Deep Thinking When Needed [28.575411507835973]
Large language models (LLMs) have shown impressive capabilities in handling complex tasks through long-chain reasoning. We propose a novel approach that autonomously switches between short-chain and long-chain reasoning based on problem complexity. This advancement enhances the practicality of reasoning in large language models for real-world applications.
arXiv Detail & Related papers (2025-05-26T15:08:51Z)
- Entropy-Based Block Pruning for Efficient Large Language Models [81.18339597023187]
We propose an entropy-based pruning strategy to enhance efficiency while maintaining performance. Empirical analysis reveals that the entropy of hidden representations decreases in the early blocks but progressively increases across most subsequent blocks.
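The abstract does not state the selection rule, so the following is only a plausible sketch: score each transformer block by the entropy of its hidden representations and keep the highest-entropy ones. The function names, the keep-highest criterion, and `keep_ratio` are all assumptions for illustration:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a normalized distribution over a
    block's hidden-representation statistics."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def prune_blocks(block_entropies, keep_ratio=0.75):
    """Keep the keep_ratio fraction of blocks with the highest
    representation entropy and drop the rest, returning the sorted
    indices of surviving blocks (a guess at the rule; the paper's
    exact criterion may differ)."""
    n_keep = max(1, round(keep_ratio * len(block_entropies)))
    ranked = sorted(range(len(block_entropies)),
                    key=lambda i: block_entropies[i], reverse=True)
    return sorted(ranked[:n_keep])
```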
arXiv Detail & Related papers (2025-04-04T03:42:34Z)
- Action Redundancy in Reinforcement Learning [54.291331971813364]
We show that transition entropy can be described by two terms: model-dependent transition entropy and action redundancy.
Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
arXiv Detail & Related papers (2021-02-22T19:47:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.