Fugu-MT Paper Translation (Abstract): Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization

Paper summary: Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization

arxiv url: http://arxiv.org/abs/2601.04442v1
Date: Wed, 07 Jan 2026 23:05:17 GMT
Status: Translation complete
Last updated in system: 2026-01-09 17:01:52.949597
Title: Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization
Authors: Xingjian Diao, Zheyuan Liu, Chunhui Zhang, Weiyi Wu, Keyi Kong, Lin Shi, Kaize Ding, Soroush Vosoughi, Jiang Gui
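Abstract summary: Gated Perception-Reasoning Optimization (GPRO) is a meta-reasoning controller that dynamically routes computation among three decision paths. GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods.
Reference score (attention score computed independently by this site): 56.59356959631999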
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Vision-Language Models (LVLMs) have exhibited strong reasoning capabilities through chain-of-thought mechanisms that generate step-by-step rationales. However, such slow-thinking approaches often lead to overthinking, where models produce excessively verbose responses even for simple queries, resulting in test-time inefficiency and even degraded accuracy. Prior work has attempted to mitigate this issue via adaptive reasoning strategies, but these methods largely overlook a fundamental bottleneck: visual perception failures. We argue that stable reasoning critically depends on low-level visual grounding, and that reasoning errors often originate from imperfect perception rather than insufficient deliberation. To address this limitation, we propose Gated Perception-Reasoning Optimization (GPRO), a meta-reasoning controller that dynamically routes computation among three decision paths at each generation step: a lightweight fast path, a slow perception path for re-examining visual inputs, and a slow reasoning path for internal self-reflection. To learn this distinction, we derive large-scale failure attribution supervision from approximately 790k samples, using teacher models to distinguish perceptual hallucinations from reasoning errors. We then train the controller with multi-objective reinforcement learning to optimize the trade-off between task accuracy and computational cost under uncertainty. Experiments on five benchmarks demonstrate that GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods while generating significantly shorter responses.
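The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the gate's form, so the greedy softmax gate, the per-path cost values, the `cost_weight` parameter, and every function name below are assumptions made for illustration only. The sketch shows a controller picking one of the three paths at each generation step, and a scalarized reward that trades task accuracy against compute cost.

```python
import math
from typing import Dict, List

# Hypothetical path names and per-step compute costs (not taken from the paper).
PATHS: List[str] = ["fast", "slow_perception", "slow_reasoning"]
PATH_COST: Dict[str, float] = {
    "fast": 1.0,
    "slow_perception": 4.0,
    "slow_reasoning": 6.0,
}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_logits: List[float]) -> str:
    """Greedy routing: return the path with the highest gate probability."""
    probs = softmax(gate_logits)
    best = max(range(len(PATHS)), key=lambda i: probs[i])
    return PATHS[best]


def scalarized_reward(correct: bool, chosen_paths: List[str],
                      cost_weight: float = 0.01) -> float:
    """Assumed reward shape: accuracy term minus weighted total compute cost."""
    cost = sum(PATH_COST[p] for p in chosen_paths)
    return (1.0 if correct else 0.0) - cost_weight * cost


# One made-up set of gate logits per generation step.
steps = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.4], [0.1, 0.2, 2.2]]
chosen = [route(logits) for logits in steps]
print(chosen)
print(scalarized_reward(True, chosen))
```

A trained controller would produce the gate logits from model state and would likely sample from the gate distribution during reinforcement learning rather than take the argmax; the greedy choice here only keeps the example deterministic.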

Related papers