ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code
- URL: http://arxiv.org/abs/2510.08163v3
- Date: Tue, 14 Oct 2025 18:35:24 GMT
- Title: ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code
- Authors: Jian Xie, Zhendong Chu, Aoxiao Zhong, Kai Zhang, Mingzhe Han, Xing Fan, Jialie Shen, Qingsong Wen,
- Abstract summary: Large Reasoning Models (LRMs) often suffer from the "over-thinking" problem, generating unnecessarily long reasoning on simple tasks. We present ARM2, a unified model that adaptively balances reasoning performance and efficiency across multiple formats.
- Score: 43.934586432132456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Reasoning Models (LRMs) often suffer from the "over-thinking" problem, generating unnecessarily long reasoning on simple tasks. Strategies such as length penalties or routing mechanisms have been proposed to mitigate this issue, but they are typically heuristic and task-specific, lacking a general framework for adaptive reasoning. In this paper, we present ARM2, a unified model that adaptively balances reasoning performance and efficiency across multiple formats through a reinforcement learning framework augmented with length-aware optimization. Beyond conventional natural language inference, ARM2 integrates vision understanding, extending its applicability to multimodal tasks. Moreover, ARM2 integrates executable code into reasoning, substantially reducing token cost while preserving task performance compared with long chain-of-thought (CoT) reasoning. Experiments demonstrate that ARM2 achieves performance on par with traditional reasoning models trained with GRPO, while reducing token usage by over 70% on average. We further conduct extensive analyses to validate the effectiveness of ARM2 and the soundness of its design.
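The abstract does not specify ARM2's reward or the exact form of its length-aware term. As a rough illustration of the GRPO-style setup it is compared against, the sketch below computes group-relative advantages (each sampled response's reward normalized by the mean and standard deviation of its group, as in GRPO) over a hypothetical length-aware reward in which a correct answer loses part of its reward in proportion to its length. The function names `length_aware_rewards` and `group_relative_advantages`, the linear penalty, and `lam = 0.5` are illustrative assumptions, not ARM2's actual formulation.

```python
import statistics
from typing import List


def length_aware_rewards(correct: List[bool], lengths: List[int], lam: float = 0.5) -> List[float]:
    """Hypothetical length-aware reward (an assumption, not ARM2's definition).

    Correct responses earn 1.0 minus a penalty proportional to their length
    relative to the longest response in the group; incorrect responses earn 0.0.
    Assumes all lengths are positive.
    """
    max_len = max(lengths)
    return [
        (1.0 - lam * n / max_len) if ok else 0.0
        for ok, n in zip(correct, lengths)
    ]


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps guards against std == 0
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to one prompt: correctness flags and token lengths.
rewards = length_aware_rewards([True, True, False, True], [200, 800, 300, 1000])
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

With this shaping, the shortest correct response receives the largest advantage and the incorrect one the smallest, so policy updates favor concise correct answers. Population standard deviation is used here; some GRPO implementations use the sample estimate instead.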
Related papers
- Mitigating Overthinking in Large Reasoning Models via Difficulty-aware Reinforcement Learning [13.096138112729358]
Large Reasoning Models (LRMs) achieve explicit chain-of-thought expansion by imitating deep thinking behaviors of humans. However, the deep-thinking mode often leads to unnecessarily lengthy reasoning and resource inefficiency when handling simple tasks. This paper proposes Difficulty-aware Policy Optimization (DiPO), a reinforcement learning-based LRM training framework.
Date: 2026-01-29T08:56:45Z
- Structured Reasoning for Large Language Models [59.215789462977206]
We propose Structured Reasoning (SCR), a framework that decouples reasoning trajectories into explicit, evaluable, and trainable components. SCR substantially improves reasoning efficiency and self-verification. Compared with existing reasoning paradigms, it reduces output token length by up to 50%.
Date: 2026-01-12T04:04:01Z
- BARD: budget-aware reasoning distillation [25.725960386304646]
Long Chain-of-Thought (CoT) distillation effectively transfers reasoning capability to smaller language models. We propose Budget-Aware Reasoning Distillation (BARD), a novel framework that simultaneously distills reasoning capability and enables fine-grained control over the reasoning length.
Date: 2025-11-03T11:30:18Z
- Tiny-R1V: Lightweight Multimodal Unified Reasoning Model via Model Merging [34.0419616643477]
Tiny-R1V is a novel lightweight 3B model that achieves faster inference and higher accuracy via a two-stage optimization. In the first stage, Tiny-R1V introduces Length-Informed Relative Policy Optimization (LIPO), a novel reinforcement learning method. In the second stage, we propose Adaptive Model Merging (AMM), a training-free model merging method.
Date: 2025-10-10T04:14:57Z
- Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories [58.988535279557546]
We introduce SMART (Sycophancy Mitigation through Adaptive Reasoning Trajectories). We show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs.
Date: 2025-09-20T17:09:14Z
- Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models [49.598776427454176]
Large Reasoning Models (LRMs) have gradually become a research hotspot due to their outstanding performance in handling complex tasks. However, with the widespread application of these models, the problem of overthinking has gradually emerged. Various efficient reasoning methods have been proposed, aiming to reduce the length of reasoning paths without compromising model performance and reasoning capability.
Date: 2025-08-04T06:54:31Z
- PixelThink: Towards Efficient Chain-of-Pixel Reasoning [70.32510083790069]
PixelThink is a simple yet effective scheme that integrates externally estimated task difficulty and internally measured model uncertainty. It learns to compress reasoning length in accordance with scene complexity and predictive confidence. Experimental results demonstrate that the proposed approach improves both reasoning efficiency and overall segmentation performance.
Date: 2025-05-29T17:55:49Z
- ARM: Adaptive Reasoning Model [36.53965139929349]
We propose Adaptive Reasoning Model (ARM), a reasoning model capable of adaptively selecting appropriate formats based on the task at hand. Its training method, Ada-GRPO, enables ARM to achieve high token efficiency, reducing tokens by an average of 30%, and up to 70%, while maintaining performance comparable to the model that relies solely on Long CoT.
Date: 2025-05-26T17:38:50Z
- Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models [50.4652276723694]
Think-RM generates flexible, self-guided reasoning traces that support advanced capabilities. Think-RM achieves state-of-the-art results on RM-Bench, outperforming both BT RM and vertically scaled GenRM by 8%.
Date: 2025-05-22T05:56:11Z
- When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning [20.233873556056487]
Large reasoning models (LRMs) achieve remarkable performance via long reasoning chains, but often incur excessive computational overhead due to redundant reasoning. We propose Adaptive Self-Recovery Reasoning (ASRR), a framework that suppresses unnecessary reasoning and enables implicit recovery. Our results highlight the potential of ASRR for enabling efficient, adaptive, and safer reasoning in LRMs.
Date: 2025-05-21T11:41:39Z
- Activation-Guided Consensus Merging for Large Language Models [25.68958388022476]
We present Activation-Guided Consensus Merging (ACM), a plug-and-play merging framework that determines layer-specific merging coefficients. Experiments on Long-to-Short (L2S) and general merging tasks demonstrate that ACM consistently outperforms all baseline methods.
Date: 2025-05-20T07:04:01Z
- Scalable Chain of Thoughts via Elastic Reasoning [61.75753924952059]
Elastic Reasoning is a novel framework for scalable chain of thoughts. It separates reasoning into two phases, thinking and solution, with independently allocated budgets. Our approach produces more concise and efficient reasoning even in unconstrained settings.
Date: 2025-05-08T15:01:06Z
- Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models [21.88436406884943]
We introduce EED, which diversifies the compression of a single base model based on different pruning importance scores and enhances ensemble diversity to achieve high adversarial robustness and resource efficiency. EED demonstrated state-of-the-art performance compared to existing adversarial pruning techniques, along with an inference speed improvement of up to 1.86 times.
Date: 2025-04-07T05:41:35Z