DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
- URL: http://arxiv.org/abs/2510.19669v2
- Date: Thu, 30 Oct 2025 19:39:41 GMT
- Title: DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
- Authors: Xiang Liu, Xuming Hu, Xiaowen Chu, Eunsol Choi
- Abstract summary: Recent reasoning Large Language Models (LLMs) demonstrate remarkable problem-solving abilities but often generate long thinking traces whose utility is unclear. Our work aims to improve their efficiency, enabling them to reach high performance without overthinking. We introduce DiffAdapt, a lightweight framework that selects Easy/Normal/Hard inference strategies per question based on their difficulty and reasoning trace entropy.
- Score: 68.05879215304641
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent reasoning Large Language Models (LLMs) demonstrate remarkable problem-solving abilities but often generate long thinking traces whose utility is unclear. Our work aims to improve their efficiency, enabling them to reach high performance without overthinking. First, we analyze the entropy of token probabilities in reasoning traces. Across three models, we observe a consistent U-shaped entropy pattern: high entropy on easy problems despite high accuracy, low entropy on problems of medium difficulty, and high entropy on hard problems, reflecting uncertainty. Specifically, we observe a 22-25% entropy reduction from the easy to the medium difficulty region, suggesting an overthinking phenomenon on easy instances. Building on these insights, we introduce DiffAdapt, a lightweight framework that selects Easy/Normal/Hard inference strategies per question based on the question's difficulty and reasoning trace entropy. Each inference strategy consists of a fixed prompt, temperature, and maximum token length. In contrast to existing efficiency optimization methods, our approach does not fine-tune the base LLM; it instead trains a small probe that classifies the LLM's final hidden state, allowing inexpensive adaptation. We comprehensively evaluate our method on five models and eight benchmarks. Our method achieves comparable or improved accuracy while reducing token usage by up to 22.4%, establishing a practical path toward compute-efficient reasoning.
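The two mechanisms the abstract describes, measuring token-level entropy over a reasoning trace and routing each question to an Easy/Normal/Hard strategy through a small probe on the final hidden state, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not give the probe architecture (a linear probe is used below), the prompts, the temperatures, or the token limits, and all function and variable names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """One inference strategy: a fixed prompt, temperature, and token limit."""
    prompt: str
    temperature: float
    max_tokens: int


# Hypothetical values; the abstract only says each strategy fixes these fields.
STRATEGIES = {
    "easy": Strategy("Answer concisely.", 0.3, 1024),
    "normal": Strategy("Think step by step.", 0.6, 4096),
    "hard": Strategy("Reason carefully and verify each step.", 0.8, 8192),
}
LABELS = ("easy", "normal", "hard")


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_trace_entropy(step_probs):
    """Average per-token entropy over all steps of a reasoning trace."""
    if not step_probs:
        raise ValueError("trace must contain at least one step")
    return sum(token_entropy(p) for p in step_probs) / len(step_probs)


def probe_predict(hidden_state, weights, biases):
    """Assumed linear probe: pick the class with the largest logit W h + b."""
    logits = [
        sum(w * h for w, h in zip(row, hidden_state)) + b
        for row, b in zip(weights, biases)
    ]
    best = max(range(len(logits)), key=logits.__getitem__)
    return LABELS[best]


def select_strategy(hidden_state, weights, biases):
    """Map the probe's difficulty label to its fixed inference strategy."""
    return STRATEGIES[probe_predict(hidden_state, weights, biases)]


if __name__ == "__main__":
    # Toy 2-dimensional hidden state and hand-picked probe weights.
    toy_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    toy_biases = [0.0, 0.0, 0.0]
    print(select_strategy([-1.0, -1.0], toy_weights, toy_biases))
    print(mean_trace_entropy([[0.5, 0.5], [1.0, 0.0]]))
```

In this sketch the base model is never updated: only the probe weights would be trained, which matches the abstract's point that adaptation is inexpensive. In practice the hidden state would come from the LLM itself, and the entropy statistic would be used to analyze or label difficulty regions rather than computed from toy distributions.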
Related papers
- Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning [39.72119774004103]
Chain-of-Thought (CoT) has substantially empowered Large Language Models (LLMs) to tackle complex reasoning tasks. The verbose nature of explicit reasoning steps incurs prohibitive inference latency and computational costs, limiting real-world deployment. We propose Compress responses for Easy questions and Explore Hard ones (CEEH), a difficulty-aware approach to RL-based efficient reasoning.
arXiv Detail & Related papers (2026-02-26T05:47:30Z)
- Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning [30.889495810312624]
We propose an efficient reinforcement learning framework that leverages entropy signals at both the semantic and token levels to improve reasoning. By jointly optimizing data organization and algorithmic design, our method effectively mitigates entropy collapse and enhances reasoning.
arXiv Detail & Related papers (2025-12-04T01:09:17Z)
- ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping [54.37497695483689]
We propose ARES, a unified framework for adaptive reasoning that dynamically allocates exploration effort based on task difficulty. Our approach is motivated by two key empirical findings: (i) while single-token entropy is noisy, high window-entropy (HWE) tokens can reliably capture reasoning-critical moments. In the Adaptive Cold-Start stage, we curate multimodal and textual data paired with reasoning traces of length proportional to problem difficulty, equipping the model with initial difficulty awareness. In the second stage, we develop Adaptive Entropy Policy Optimization (AEPO), which uses HWE tokens as exploration triggers
arXiv Detail & Related papers (2025-10-09T17:03:28Z)
- Probing the Difficulty Perception Mechanism of Large Language Models [31.945071671041465]
We investigate whether large language models implicitly encode problem difficulty in their internal representations. We locate the specific attention heads of the final Transformer layer. Experiments provide practical support for using LLMs as automatic difficulty annotators.
arXiv Detail & Related papers (2025-10-07T14:24:32Z)
- Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation [82.62935304152239]
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities on complex problems using long Chain-of-Thought (CoT) reasoning. They often suffer from overthinking, meaning generating unnecessarily lengthy reasoning steps for simpler problems. We introduce a novel metric, Token Entropy Cumulative Average (TECA), which measures the extent of exploration throughout the reasoning process.
arXiv Detail & Related papers (2025-10-02T17:36:50Z)
- Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding [59.60915947702282]
Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in enhancing the reasoning capabilities of large language models (LLMs). Existing RLVR methods often suffer from exploration inefficiency due to mismatches between the training data's difficulty and the model's capability. We propose SEELE, a novel supervision-aided RLVR framework that dynamically adjusts problem difficulty to stay within the high-efficiency region.
arXiv Detail & Related papers (2025-09-08T17:36:21Z)
- Less is More Tokens: Efficient Math Reasoning via Difficulty-Aware Chain-of-Thought Distillation [82.2288581878096]
We present a framework for difficulty-aware reasoning that teaches models to dynamically adjust reasoning depth based on problem complexity. We show that models can be endowed with such dynamic inference pathways without any architectural modifications.
arXiv Detail & Related papers (2025-09-05T16:40:13Z)
- The Price of a Second Thought: On the Evaluation of Reasoning Efficiency in Large Language Models [54.88805865447848]
We show that instruct models achieve higher efficiency overall, and problem difficulty affects efficiency. We propose COTHINK, a simple two-stage pipeline: an instruct model drafts a brief outline, and a thinking model expands it. On GSM8K, MATH500, and AIME24, COTHINK cuts token usage by 21.1% while keeping accuracy on four thinking models, and remains competitive with strong efficiency baselines.
arXiv Detail & Related papers (2025-05-28T06:24:45Z)
- Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT? [59.418994222096885]
We conduct a detailed analysis of model performance on the AIME24 dataset. We categorize questions into four tiers (Easy, Medium, Hard, and Extremely Hard). We find that progression from Easy to Medium tier requires adopting an R1 reasoning style with minimal SFT-1K instances. Exh-level questions present a fundamentally different challenge; they require unconventional problem-solving skills.
arXiv Detail & Related papers (2025-04-16T03:39:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.