Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization
- URL: http://arxiv.org/abs/2601.04442v1
- Date: Wed, 07 Jan 2026 23:05:17 GMT
- Title: Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization
- Authors: Xingjian Diao, Zheyuan Liu, Chunhui Zhang, Weiyi Wu, Keyi Kong, Lin Shi, Kaize Ding, Soroush Vosoughi, Jiang Gui,
- Abstract summary: Gated Perception-Reasoning Optimization (GPRO) is a meta-reasoning controller that dynamically routes computation among three decision paths.<n>GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods.
- Score: 56.59356959631999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Vision-Language Models (LVLMs) have exhibited strong reasoning capabilities through chain-of-thought mechanisms that generate step-by-step rationales. However, such slow-thinking approaches often lead to overthinking, where models produce excessively verbose responses even for simple queries, resulting in test-time inefficiency and even degraded accuracy. Prior work has attempted to mitigate this issue via adaptive reasoning strategies, but these methods largely overlook a fundamental bottleneck: visual perception failures. We argue that stable reasoning critically depends on low-level visual grounding, and that reasoning errors often originate from imperfect perception rather than insufficient deliberation. To address this limitation, we propose Gated Perception-Reasoning Optimization (GPRO), a meta-reasoning controller that dynamically routes computation among three decision paths at each generation step: a lightweight fast path, a slow perception path for re-examining visual inputs, and a slow reasoning path for internal self-reflection. To learn this distinction, we derive large-scale failure attribution supervision from approximately 790k samples, using teacher models to distinguish perceptual hallucinations from reasoning errors. We then train the controller with multi-objective reinforcement learning to optimize the trade-off between task accuracy and computational cost under uncertainty. Experiments on five benchmarks demonstrate that GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods while generating significantly shorter responses.
Related papers
- Seeing Farther and Smarter: Value-Guided Multi-Path Reflection for VLM Policy Optimization [41.15414881730464]
Vision-Language Models (VLMs) offer a general perceive-reason-act framework for this goal.<n>Previous approaches rely on inefficient and often inaccurate implicit learning of state-values from noisy foresight predictions.<n>We propose a novel test-time computation framework that decouples state evaluation from action generation.
arXiv Detail & Related papers (2026-02-22T22:53:16Z) - Mitigating Cognitive Inertia in Large Reasoning Models via Latent Spike Steering [12.332146893333949]
Large Reasoning Models (LRMs) have achieved remarkable performance by scaling test-time compute.<n>LRMs frequently suffer from Cognitive Inertia, a failure pattern manifesting as either overthinking (inertia of motion) or rigidity (inertia of direction)
arXiv Detail & Related papers (2026-01-30T02:47:12Z) - Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time [22.9491443902816]
We study the structure of reasoning trajectories and uncover specialized attention heads that correlate with distinct cognitive behaviors.<n>We propose CREST, a training-free method for Cognitive REasoning Steering at Test-time.<n>CREST adaptively suppresses unproductive reasoning behaviors, yielding both higher accuracy and lower computational cost.
arXiv Detail & Related papers (2025-12-31T02:46:04Z) - Reflective Confidence: Correcting Reasoning Flaws via Online Self-Correction [14.164508061248775]
Large language models (LLMs) have achieved strong performance on complex reasoning tasks using techniques such as chain-of-thought and self-consistency.<n>We propose reflective confidence, a novel reasoning framework that transforms low-confidence signals from termination indicators into reflection triggers.<n> Experiments on mathematical reasoning benchmarks, including AIME 2025, demonstrate significant accuracy improvements over advanced early-stopping baselines at comparable computational cost.
arXiv Detail & Related papers (2025-12-21T05:35:07Z) - Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing [70.35701681177655]
Self-improvement has emerged as a mainstream paradigm for advancing the reasoning capabilities of large vision-language models.<n>We introduce four efficient strategies to achieve head-tail re-balancing during the exploration-and-learning self-improvement process.<n>Our methods consistently improve visual reasoning capabilities, outperforming vanilla self-improvement by 3.86 points on average.
arXiv Detail & Related papers (2025-10-30T13:26:58Z) - Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs [52.663816303997194]
A key factor influencing answer quality is the length of the thinking stage.<n>This paper explores and exploits the mechanisms by which LLMs understand and regulate the length of their reasoning.<n>Our results demonstrate that this "overclocking" method mitigates overthinking, improves answer accuracy, and reduces inference latency.
arXiv Detail & Related papers (2025-06-08T17:54:33Z) - Does Thinking More always Help? Mirage of Test-Time Scaling in Reasoning Models [130.5487886246353]
Extending thinking traces using prompts like "Wait" or "Let me rethink" can improve performance.<n>This raises a natural question: Does thinking more at test-time truly lead to better reasoning?<n>We show a consistent pattern of initial performance improvements from additional thinking followed by a decline, due to "overthinking"
arXiv Detail & Related papers (2025-06-04T17:55:09Z) - Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models [68.96619605651155]
Large reasoning models (LRMs) may drastically increase the output length due to overthinking.<n>We propose a dynamic optimization framework that segments model-generated reasoning paths into distinct thinking patterns.<n>Our method achieves up to a 12% accuracy improvement and reducing token usage from approximately 5,000 to 3,000 tokens.
arXiv Detail & Related papers (2025-05-27T20:59:29Z) - Let LRMs Break Free from Overthinking via Self-Braking Tuning [68.93713497579853]
Large reasoning models (LRMs) have significantly enhanced their reasoning capabilities by generating longer chains of thought.<n>This performance gain comes at the cost of a substantial increase in redundant reasoning during the generation process.<n>We propose a novel framework, Self-Braking Tuning (SBT), which tackles overthinking from the perspective of allowing the model to regulate its own reasoning process.
arXiv Detail & Related papers (2025-05-20T16:53:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.