ARM: Adaptive Reasoning Model
- URL: http://arxiv.org/abs/2505.20258v1
- Date: Mon, 26 May 2025 17:38:50 GMT
- Title: ARM: Adaptive Reasoning Model
- Authors: Siye Wu, Jian Xie, Yikai Zhang, Aili Chen, Kai Zhang, Yu Su, Yanghua Xiao
- Abstract summary: We propose Adaptive Reasoning Model (ARM), a reasoning model capable of adaptively selecting appropriate formats based on the task at hand. Ada-GRPO enables ARM to achieve high token efficiency, reducing tokens by an average of 30%, and up to 70%, while maintaining performance comparable to the model that relies solely on Long CoT.
- Score: 36.53965139929349
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large reasoning models demonstrate strong performance on complex tasks, they lack the ability to adjust reasoning token usage based on task difficulty. This often leads to the "overthinking" problem -- excessive and unnecessary reasoning -- which, although potentially mitigated by human intervention to control the token budget, still fundamentally contradicts the goal of achieving fully autonomous AI. In this work, we propose Adaptive Reasoning Model (ARM), a reasoning model capable of adaptively selecting appropriate reasoning formats based on the task at hand. These formats include three efficient ones -- Direct Answer, Short CoT, and Code -- as well as a more elaborate format, Long CoT. To train ARM, we introduce Ada-GRPO, an adaptation of Group Relative Policy Optimization (GRPO), which addresses the format collapse issue in traditional GRPO. Ada-GRPO enables ARM to achieve high token efficiency, reducing tokens by an average of 30%, and up to 70%, while maintaining performance comparable to the model that relies solely on Long CoT. Furthermore, not only does it improve inference efficiency through reduced token generation, but it also brings a 2x speedup in training. In addition to the default Adaptive Mode, ARM supports two additional reasoning modes: 1) Instruction-Guided Mode, which allows users to explicitly specify the reasoning format via special tokens -- ideal when the appropriate format is known for a batch of tasks. 2) Consensus-Guided Mode, which aggregates the outputs of the three efficient formats and resorts to Long CoT in case of disagreement, prioritizing performance with higher token usage.
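To make the Consensus-Guided Mode concrete, the sketch below follows the abstract's description literally: query the three efficient formats, return their answer when they agree, and escalate to Long CoT otherwise. The `generate` helper, the format names, and the unanimity rule are illustrative assumptions, not the paper's actual interface.

```python
from typing import Callable

# The three efficient reasoning formats named in the abstract; the exact
# identifiers used by ARM's special tokens are not given there, so these
# strings are placeholders.
EFFICIENT_FORMATS = ("direct_answer", "short_cot", "code")

def consensus_guided_answer(question: str,
                            generate: Callable[[str, str], str]) -> str:
    """Consensus-Guided Mode sketch: generate(question, fmt) is assumed to
    return the final answer produced under reasoning format `fmt`."""
    answers = [generate(question, fmt) for fmt in EFFICIENT_FORMATS]
    if len(set(answers)) == 1:
        # The cheap formats agree, so their shared answer is accepted
        # without paying for a long reasoning trace.
        return answers[0]
    # Disagreement: fall back to the elaborate Long CoT format,
    # trading extra tokens for reliability.
    return generate(question, "long_cot")
```

Read this way, easy inputs where the cheap formats converge cost few tokens, and the expensive Long CoT pass is spent only on inputs where they diverge, matching the abstract's note that this mode prioritizes performance at higher token usage.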
Related papers
- DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models [36.962276192354174]
DART adjusts thinking length according to problem difficulty. Its truncation framework learns when to "stop thinking".
arXiv Detail & Related papers (2025-11-03T02:41:20Z)
- DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains [56.708381920156256]
Large Reasoning Models (LRMs) have demonstrated impressive capabilities but suffer from cognitive inefficiencies like "overthinking" simple problems and "underthinking" complex ones. This paper introduces DeepCompress, a novel framework that simultaneously enhances both the accuracy and efficiency of LRMs.
arXiv Detail & Related papers (2025-10-31T12:13:11Z)
- e1: Learning Adaptive Control of Reasoning Effort [88.51897900019485]
Increasing the thinking budget of AI models can significantly improve accuracy, but not all questions warrant the same amount of reasoning. Users may prefer to allocate different amounts of reasoning effort depending on how they value output quality versus latency and cost. We propose Adaptive Effort Control, a self-adaptive reinforcement learning method that trains models to use a user-specified fraction of tokens.
arXiv Detail & Related papers (2025-10-30T23:12:21Z)
- ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code [43.934586432132456]
Large Reasoning Models (LRMs) often suffer from the "over-thinking" problem, generating unnecessarily long reasoning on simple tasks. We present ARM2, a unified model that adaptively balances reasoning performance and efficiency across multiple formats.
arXiv Detail & Related papers (2025-10-09T12:49:34Z)
- HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs [54.16300997612526]
Large Language Models (LLMs) increasingly rely on Chain-of-Thought (CoT) reasoning to improve accuracy on complex tasks. This paper introduces Hybrid Policy Optimization (HiPO), a framework for adaptive reasoning control. Experiments across mathematics and coding benchmarks demonstrate that HiPO can substantially reduce token length while maintaining or improving accuracy.
arXiv Detail & Related papers (2025-09-28T16:46:12Z)
- Adaptive Overclocking: Dynamic Control of Thinking Path Length via Real-Time Reasoning Signals [8.264189366042675]
We propose Adaptive Overclocking, a method that makes the hyperparameter $\alpha$ dynamic and context-aware. Our method adjusts reasoning speed in real time through two complementary signals. Experiments on GSM8K, MATH, and SVAMP show that HAC achieves superior accuracy-latency trade-offs.
arXiv Detail & Related papers (2025-09-21T09:40:27Z)
- Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model [7.8354921036790275]
Large Reasoning Models (LRMs) excel at solving complex problems but face an overthinking dilemma. When handling simple tasks, they often produce verbose responses overloaded with thinking tokens. These tokens trigger unnecessary high-level reasoning behaviors like reflection and backtracking, reducing efficiency.
arXiv Detail & Related papers (2025-06-30T13:30:33Z)
- ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation [53.149817480019834]
Recent advancements in large reasoning models (LRMs) have achieved notable performance enhancements on complex reasoning tasks by scaling up the Chain-of-Thought (CoT) generation length. We propose a framework dubbed ConciseHint, which continuously encourages the reasoning model to speak concisely by injecting textual hints during token generation of the reasoning process. Experiments on state-of-the-art LRMs, including the DeepSeek-R1 and Qwen-3 series, demonstrate that our method can effectively produce concise reasoning processes while maintaining performance.
arXiv Detail & Related papers (2025-06-23T16:20:44Z)
- CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models [56.40065909544213]
Large language models (LLMs) benefit from increased test-time compute, a phenomenon known as test-time scaling. However, reasoning-optimized models often overthink even simple problems, producing excessively verbose outputs and leading to low token efficiency. We identify two key causes of this verbosity: (1) reinforcement learning reduces the information density of forward reasoning, and (2) backward chain-of-thought training encourages redundant and often unnecessary verification steps.
arXiv Detail & Related papers (2025-05-28T06:24:45Z)
- Adaptive Deep Reasoning: Triggering Deep Thinking When Needed [28.575411507835973]
Large language models (LLMs) have shown impressive capabilities in handling complex tasks through long-chain reasoning. We propose a novel approach that autonomously switches between short-chain and long-chain reasoning based on problem complexity. This advancement enhances the practicality of reasoning in large language models for real-world applications.
arXiv Detail & Related papers (2025-05-26T15:08:51Z)
- Thinkless: LLM Learns When to Think [57.857534644932194]
Reasoning Language Models, capable of extended chain-of-thought reasoning, have demonstrated remarkable performance on tasks requiring complex logical inference. We propose Thinkless, a learnable framework that empowers an LLM to adaptively select between short-form and long-form reasoning. On several benchmarks such as Minerva Algebra, MATH-500, and GSM8K, Thinkless is able to reduce the usage of long-chain thinking by 50%-90%.
arXiv Detail & Related papers (2025-05-19T17:24:16Z)
- Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models [86.88657425848547]
Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning. We explicitly align models with three meta-abilities: deduction, induction, and abduction, using automatically generated, self-verifiable tasks. Our three-stage pipeline of individual alignment, parameter-space merging, and domain-specific reinforcement learning boosts performance by over 10% relative to instruction-tuned baselines.
arXiv Detail & Related papers (2025-05-15T17:58:33Z)
- Scalable Chain of Thoughts via Elastic Reasoning [61.75753924952059]
Elastic Reasoning is a novel framework for scalable chain of thoughts. It separates reasoning into two phases, thinking and solution, with independently allocated budgets. Our approach produces more concise and efficient reasoning even in unconstrained settings.
arXiv Detail & Related papers (2025-05-08T15:01:06Z)
- Ada-R1: Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization [86.56120216550232]
We propose a novel two-stage framework for adaptive and efficient reasoning. First, we construct a hybrid reasoning model by merging long and short CoT models. Second, we apply bi-level preference training to guide the model to select suitable reasoning styles.
arXiv Detail & Related papers (2025-04-30T14:01:45Z)
- DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models [31.189242663680695]
This paper introduces Difficulty-Adaptive Slow-Thinking (DAST), a novel framework that enables models to autonomously adjust the length of Chain-of-Thought (CoT) based on problem difficulty. Experiments on diverse datasets and model scales demonstrate that DAST effectively mitigates overthinking while preserving reasoning accuracy on complex problems.
arXiv Detail & Related papers (2025-03-06T14:23:06Z)