Adaptive Dual Reasoner: Large Reasoning Models Can Think Efficiently by Hybrid Reasoning
- URL: http://arxiv.org/abs/2510.10207v2
- Date: Tue, 14 Oct 2025 03:51:57 GMT
- Title: Adaptive Dual Reasoner: Large Reasoning Models Can Think Efficiently by Hybrid Reasoning
- Authors: Yujian Zhang, Keyu Chen, Zhifeng Shen, Ruizhi Qiao, Xing Sun
- Abstract summary: We propose Adaptive Dual Reasoner (ADR), which supports two reasoning modes: fast thinking and slow thinking. ADR alternates between these modes based on the contextual complexity during reasoning. It achieves an effective balance between reasoning performance and efficiency among state-of-the-art approaches.
- Score: 24.84164221980507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although Long Reasoning Models (LRMs) have achieved superior performance on various reasoning scenarios, they often suffer from increased computational costs and inference latency caused by overthinking. To address these limitations, we propose Adaptive Dual Reasoner (ADR), which supports two reasoning modes: fast thinking and slow thinking. ADR dynamically alternates between these modes based on the contextual complexity during reasoning. ADR is trained in two stages: (1) A cold-start stage using supervised fine-tuning (SFT) to equip the model with the ability to integrate both fast and slow reasoning modes, in which we construct a hybrid reasoning dataset through a dedicated pipeline to provide large-scale supervision. (2) A reinforcement learning stage for optimizing reasoning effort, where we introduce Entropy-guided Hybrid Policy Optimization (EHPO), an RL training framework employing an entropy-guided dynamic rollout strategy for branching at high-entropy units and a difficulty-aware penalty to balance fast and slow reasoning. Across challenging mathematical reasoning benchmarks, ADR achieves an effective balance between reasoning performance and efficiency among state-of-the-art approaches. Specifically, ADR yields a performance gain of up to 6.1%, while reducing the reasoning output length by 49.5% to 59.3%.
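A minimal sketch may help make the abstract's two EHPO ingredients concrete. The Python snippet below is an illustrative assumption, not the authors' implementation: it treats per-token softmax entropy as the signal for selecting branch points (the "high-entropy units") and uses a length penalty whose weight shrinks with an estimated problem difficulty (the "difficulty-aware penalty"). The top-k branch count, the 0.5 penalty weight, the token budget, and the difficulty proxy are hypothetical values chosen for the example.

```python
import numpy as np


def token_entropies(logits: np.ndarray) -> np.ndarray:
    """Per-position softmax entropy of a (seq_len, vocab_size) logit matrix."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)


def branch_points(logits: np.ndarray, top_k: int = 4) -> list[int]:
    """Return the top_k highest-entropy positions as candidate points for
    spawning extra rollouts (the entropy-guided dynamic rollout idea)."""
    entropies = token_entropies(logits)
    return sorted(np.argsort(entropies)[-top_k:].tolist())


def difficulty_aware_reward(correct: bool, length: int,
                            difficulty: float, budget: int = 2048) -> float:
    """Accuracy reward minus a length penalty whose weight shrinks with the
    estimated difficulty in [0, 1], so harder problems are allowed longer
    slow-thinking segments than easy ones (assumed penalty schedule)."""
    penalty_weight = 0.5 * (1.0 - difficulty)
    length_penalty = penalty_weight * min(length / budget, 1.0)
    return (1.0 if correct else 0.0) - length_penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_logits = rng.normal(size=(32, 100))  # stand-in for per-token model logits
    print("branch at positions:", branch_points(fake_logits))
    print("easy, short, correct:", difficulty_aware_reward(True, 300, difficulty=0.2))
    print("hard, long, correct: ", difficulty_aware_reward(True, 1800, difficulty=0.9))
```

Under the abstract's description, such branch points would seed additional rollouts during RL training, while the reward term discourages long slow-thinking segments on problems estimated to be easy.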
Related papers
- Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation [14.501114943020589]
Large reasoning models (LRMs) achieve strong performance through extended reasoning traces. LRMs often exhibit overthinking behavior for low-complexity queries. We propose a two-stage framework for stable adaptive thinking in LRMs.
arXiv Detail & Related papers (2026-02-26T02:49:36Z) - BARD: budget-aware reasoning distillation [25.725960386304646]
Long Chain-of-Thought (CoT) distillation effectively transfers reasoning capability to smaller language models. We propose Budget-Aware Reasoning Distillation (BARD), a novel framework that simultaneously distills reasoning capability and enables fine-grained control over the reasoning length.
arXiv Detail & Related papers (2025-11-03T11:30:18Z) - DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models [36.962276192354174]
DART adjusts thinking length according to problem difficulty. The Difficulty-Adaptive Reasoning Truncation framework learns when to "stop thinking".
arXiv Detail & Related papers (2025-11-03T02:41:20Z) - ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping [54.37497695483689]
We propose ARES, a unified framework for adaptive reasoning that dynamically allocates exploration effort based on task difficulty. Our approach is motivated by two key empirical findings: (i) while single-token entropy is noisy, high window-entropy (HWE) tokens can reliably capture reasoning-critical moments. In the Adaptive Cold-Start stage, we curate multimodal and textual data paired with reasoning traces of length proportional to problem difficulty, equipping the model with initial difficulty awareness. In the second stage, we develop Adaptive Entropy Policy Optimization (AEPO), which uses HWE tokens as exploration triggers.
arXiv Detail & Related papers (2025-10-09T17:03:28Z) - Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models [49.598776427454176]
Large Reasoning Models (LRMs) have gradually become a research hotspot due to their outstanding performance in handling complex tasks. However, with the widespread application of these models, the problem of overthinking has gradually emerged. Various efficient reasoning methods have been proposed, aiming to reduce the length of reasoning paths without compromising model performance and reasoning capability.
arXiv Detail & Related papers (2025-08-04T06:54:31Z) - Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens [51.90059610606049]
This paper revisits the efficiency of such reasoning processes through an information-theoretic lens. We propose two metrics, InfoBias and InfoGain, to quantify divergence from ideal reasoning paths and stepwise information contribution. Motivated by these findings, we introduce an entropy-based Adaptive Think strategy that dynamically halts reasoning once confidence is sufficiently high (a minimal sketch of such a halting rule appears after this list).
arXiv Detail & Related papers (2025-05-23T13:38:56Z) - Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning [75.04643265875072]
Large reasoning models (LRMs) have demonstrated strong performance on complex reasoning tasks, but often suffer from overthinking. Inspired by the dual process theory in cognitive science, we propose Adaptive Cognition Policy Optimization (ACPO). ACPO enables LRMs to achieve efficient reasoning through adaptive cognitive allocation and dynamic system switch.
arXiv Detail & Related papers (2025-05-22T07:15:08Z) - Learn to Reason Efficiently with Adaptive Length-based Reward Shaping [23.626013831589212]
Large Reasoning Models (LRMs) have shown remarkable capabilities in solving complex problems through reinforcement learning (RL). We present a unified framework that formulates various efficient reasoning methods through the lens of length-based reward shaping. Experiments on DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, and DeepSeek-R1-Distill-Qwen-32B show that our approach significantly enhances both reasoning performance and response length efficiency.
arXiv Detail & Related papers (2025-05-21T15:03:26Z) - Ada-R1: Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization [86.56120216550232]
We propose a novel two-stage framework for adaptive and efficient reasoning. First, we construct a hybrid reasoning model by merging long and short CoT models. Second, we apply bi-level preference training to guide the model to select suitable reasoning styles.
arXiv Detail & Related papers (2025-04-30T14:01:45Z) - ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning [1.0416697066889342]
We propose a simple yet effective reinforcement learning method that enables reasoning models to learn their own optimal CoT lengths without manual supervision. ShorterBetter achieves 50%-80% reduction in output lengths in both in-domain and out-of-domain reasoning tasks. Our reasoning trace analysis shows that ShorterBetter refines the structure of the reasoning traces by reducing unnecessary repetition, excessive self-verification, and over-exploration of alternatives.
arXiv Detail & Related papers (2025-04-30T07:04:19Z) - Dynamic Parallel Tree Search for Efficient LLM Reasoning [102.16694475391665]
Tree of Thoughts (ToT) enhances Large Language Model (LLM) reasoning by structuring problem-solving as a spanning tree. We propose Dynamic Parallel Tree Search (DPTS), a novel parallelism framework that aims to dynamically optimize the reasoning path in inference. Experiments on Qwen-2.5 and Llama-3 with Math500 and GSM8K datasets show that DPTS significantly improves efficiency by 2-4x on average.
arXiv Detail & Related papers (2025-02-22T14:13:37Z)
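For contrast with ADR's branching strategy, the entropy-based Adaptive Think halting rule summarized in the "Think or Not?" entry above can be sketched in a few lines. The snippet below is a rough illustration under assumed values (a window of 8 steps and a threshold of 0.5 nats), not the cited paper's code: it stops extending a reasoning trace once the mean entropy over a recent window, used as a confidence proxy, drops below the threshold.

```python
import numpy as np


def step_entropy(step_logits: np.ndarray) -> float:
    """Softmax entropy of a single decoding step's logits."""
    shifted = step_logits - step_logits.max()
    probs = np.exp(shifted)
    probs /= probs.sum()
    return float(-(probs * np.log(probs + 1e-12)).sum())


def should_halt(entropy_history: list[float],
                window: int = 8, threshold: float = 0.5) -> bool:
    """Halt once the mean entropy of the last `window` steps (used here as a
    confidence proxy) falls below `threshold`; both values are assumptions."""
    if len(entropy_history) < window:
        return False
    return float(np.mean(entropy_history[-window:])) < threshold


if __name__ == "__main__":
    print("entropy of a sharp step:",
          round(step_entropy(np.array([4.0, 1.0, 0.5, 0.2])), 3))
    # Synthetic entropy trace that decays as the model grows more confident.
    trace = [4.0 * np.exp(-0.1 * t) for t in range(64)]
    for step in range(len(trace)):
        if should_halt(trace[: step + 1]):
            print(f"halting reasoning at step {step}")
            break
```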