Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning
- URL: http://arxiv.org/abs/2512.03783v2
- Date: Thu, 04 Dec 2025 13:48:37 GMT
- Title: Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning
- Authors: Dongchao Yang, Songxiang Liu, Disong Wang, Yuanyuan Wang, Guanglu Wan, Helen Meng,
- Abstract summary: We propose a novel adaptive reasoning framework that dynamically adjusts the model's reasoning depth according to task difficulty. Our framework comprises two stages: (1) an Adaptive Supervised Fine-Tuning stage, which endows the Omni model with fundamental reasoning capability using large-scale reasoning-augmented data, and (2) an Adaptive Reinforcement Learning stage, which optimizes reasoning behaviors based on task complexity and reward feedback.
- Score: 57.96134674544638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Omni models have enabled unified multimodal perception and generation. However, most existing systems still exhibit rigid reasoning behaviors, either overthinking simple problems or failing to reason when necessary. To address this limitation, we propose Omni-AutoThink, a novel adaptive reasoning framework that dynamically adjusts the model's reasoning depth according to task difficulty. Our framework comprises two stages: (1) an Adaptive Supervised Fine-Tuning (Adaptive SFT) stage, which endows the Omni model with fundamental reasoning capability using large-scale reasoning-augmented data, and (2) an Adaptive Reinforcement Learning (Adaptive GRPO) stage, which optimizes reasoning behaviors based on task complexity and reward feedback. We further construct a comprehensive adaptive reasoning benchmark that spans text-only, text-audio, text-visual, and text-audio-visual modalities, providing both training and evaluation splits for multimodal reasoning assessment. Experimental results demonstrate that our proposed framework significantly improves adaptive reasoning performance compared to previous baselines. All benchmark data and code will be publicly released.
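The abstract describes an RL stage (Adaptive GRPO) that shapes reasoning depth by task difficulty. The paper does not give the reward in this listing, so the following is a minimal illustrative sketch of one way such an adaptive reward could look: correctness is rewarded, and reasoning length is penalized more heavily on easy tasks than on hard ones. All names, weights, and the penalty form are assumptions, not the authors' actual formulation.

```python
# Hypothetical adaptive reward sketch: accuracy term minus a
# difficulty-scaled length penalty. Easy tasks (low difficulty)
# incur the full penalty, so long chains of thought only pay off
# on hard tasks. Illustrative only; not from the paper.

def adaptive_reward(correct: bool, num_reasoning_tokens: int,
                    task_difficulty: float, budget: int = 512,
                    length_weight: float = 0.5) -> float:
    """task_difficulty in [0, 1]: 0 = trivial, 1 = very hard."""
    accuracy = 1.0 if correct else 0.0
    # Fraction of the reasoning budget consumed, capped at 1.
    overuse = min(num_reasoning_tokens / budget, 1.0)
    # Penalty shrinks as difficulty grows: hard tasks may "think" freely.
    penalty = length_weight * (1.0 - task_difficulty) * overuse
    return accuracy - penalty

# A correct answer with long reasoning on an easy task scores lower
# than the same correct answer with short reasoning.
long_easy = adaptive_reward(True, 512, task_difficulty=0.1)
short_easy = adaptive_reward(True, 32, task_difficulty=0.1)
```

Plugged into a GRPO-style trainer, a reward of this shape would push the group-relative advantage toward short answers on easy prompts while leaving deep reasoning profitable on hard ones.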
Related papers
- ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding [65.16833684071715]
ThinkOmni is a training-free and data-free framework that lifts textual reasoning to omni-modal scenarios. Experiments on six multi-modal reasoning benchmarks demonstrate that ThinkOmni consistently delivers performance improvements.
arXiv Detail & Related papers (2026-02-26T18:10:41Z) - AdaptMMBench: Benchmarking Adaptive Multimodal Reasoning for Mode Selection and Reasoning Process [35.95284812390557]
We propose AdaptMMBench, a benchmark for adaptive multimodal reasoning across five domains: real-world, OCR, GUI, knowledge, and math. Our evaluation reveals that while adaptive mode selection scales with model capacity, it notably decouples from final accuracy.
arXiv Detail & Related papers (2026-02-02T19:00:27Z) - Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories [58.988535279557546]
We introduce SMART (Sycophancy Mitigation through Adaptive Reasoning Trajectories). We show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs.
arXiv Detail & Related papers (2025-09-20T17:09:14Z) - Audio-Thinker: Guiding Audio Language Model When and How to Think via Reinforcement Learning [46.89219923892907]
We propose Audio-Thinker, a reinforcement learning framework designed to enhance the reasoning capabilities of large audio language models (LALMs). Our approach introduces an adaptive think accuracy reward, enabling the model to adjust its reasoning strategies dynamically based on the task. Experimental results demonstrate that our Audio-Thinker model outperforms existing reasoning-oriented LALMs across various benchmark tasks.
arXiv Detail & Related papers (2025-08-11T14:41:10Z) - AdapThink: Adaptive Thinking Preferences for Reasoning Language Model [32.47427081297578]
Reinforcement Learning (RL)-based post-training has significantly advanced the complex reasoning capabilities of language models. However, this "slow thinking" paradigm presents a critical challenge to reasoning efficiency. We propose AdapThink, an adaptive post-training framework designed to induce more efficient thinking.
arXiv Detail & Related papers (2025-06-23T02:06:04Z) - LARES: Latent Reasoning for Sequential Recommendation [96.26996622771593]
We present LARES, a novel and scalable LAtent REasoning framework for Sequential recommendation. Our proposed approach employs a recurrent architecture that allows flexible expansion of reasoning depth without increasing parameter complexity. Our framework exhibits seamless compatibility with existing advanced models, further improving their recommendation performance.
arXiv Detail & Related papers (2025-05-22T16:22:54Z) - Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning [75.04643265875072]
Large reasoning models (LRMs) have demonstrated strong performance on complex reasoning tasks, but often suffer from overthinking. Inspired by the dual process theory in cognitive science, we propose Adaptive Cognition Policy Optimization (ACPO). ACPO enables LRMs to achieve efficient reasoning through adaptive cognitive allocation and dynamic system switch.
arXiv Detail & Related papers (2025-05-22T07:15:08Z) - Adaptive Thinking via Mode Policy Optimization for Social Language Agents [75.3092060637826]
We propose a framework to improve the adaptive thinking ability of language agents in dynamic social interactions. Our framework advances existing research in three key aspects: (1) multi-granular thinking mode design, (2) context-aware mode switching across social interactions, and (3) token-efficient reasoning via depth-adaptive processing.
arXiv Detail & Related papers (2025-05-04T15:39:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.