Fugu-MT Paper Translation (Summary): Think 360°: Evaluating the Width-centric Reasoning Capability of MLLMs Beyond Depth

Paper summary: Think 360°: Evaluating the Width-centric Reasoning Capability of MLLMs Beyond Depth

arxiv url: http://arxiv.org/abs/2603.22689v1
Date: Tue, 24 Mar 2026 01:29:24 GMT
Status: Translation completed
Last updated in system: 2026-03-25 19:53:37.237612
Title: Think 360°: Evaluating the Width-centric Reasoning Capability of MLLMs Beyond Depth
Title (reference translation): Think 360°: Evaluating the Width-centric Reasoning Capability of MLLMs Beyond Depth
Authors: Mingrui Chen, Hexiong Yang, Haogeng Liu, Huaibo Huang, Ran He,
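Abstract summary: Reasoning width focuses on a model's capacity for broad trial-and-error search and multi-constrained optimization. The authors evaluate 12 model families across difficulty tiers, question types, and required skills. The results show that while current models perform strongly on general and common-sense VQA tasks, they struggle to combine deep sequential chains of thought with wide exploratory search.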
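The summary describes reasoning width as trial-and-error search under multiple constraints. As a rough illustration of that idea only (not the benchmark's own method; the puzzle, the constraint rules, and the `backtrack` helper are all hypothetical), the Python sketch below runs a depth-first backtracking search that checks constraints on each partial answer, prunes any branch that already violates one, and counts how many nodes it visits and prunes.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical constraint type: inspects a partial assignment and returns
# False if the branch can already be ruled out (so it is pruned).
Constraint = Callable[[List[int]], bool]


def backtrack(
    domains: Sequence[Sequence[int]],
    constraints: Sequence[Constraint],
) -> Tuple[Optional[List[int]], int, int]:
    """Depth-first trial-and-error search with constraint pruning.

    Returns (solution or None, nodes_visited, branches_pruned).
    """
    visited = 0
    pruned = 0

    def search(partial: List[int]) -> Optional[List[int]]:
        nonlocal visited, pruned
        if len(partial) == len(domains):
            return list(partial)  # every position filled: a valid answer
        for value in domains[len(partial)]:
            partial.append(value)  # try one candidate branch
            visited += 1
            if all(rule(partial) for rule in constraints):
                result = search(partial)
                if result is not None:
                    return result
            else:
                pruned += 1  # a constraint failed: skip this whole subtree
            partial.pop()  # backtrack and try the next candidate
        return None

    solution = search([])
    return solution, visited, pruned


# Hypothetical puzzle: three distinct, strictly increasing digits summing to 15.
rules: List[Constraint] = [
    lambda p: len(set(p)) == len(p),
    lambda p: all(a < b for a, b in zip(p, p[1:])),
    lambda p: sum(p) <= 15 and (len(p) < 3 or sum(p) == 15),
]
solution, visited, pruned = backtrack([range(1, 10)] * 3, rules)
print(solution, visited, pruned)
```

On this toy puzzle the search returns `[1, 5, 9]` after pruning most of the branches it opens, which is the kind of broad, constraint-driven exploration the summary contrasts with a single long chain of reasoning.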
Reference score (site-computed attention score): 37.75493687006809
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we present a holistic multimodal benchmark that evaluates the reasoning capabilities of MLLMs with an explicit focus on reasoning width, a complementary dimension to the more commonly studied reasoning depth. Specifically, reasoning depth measures the model's ability to carry out long-chain, sequential reasoning in which each step is tightly and rigorously linked to the next. Reasoning width tends to focus more on the model's capacity for broad trial-and-error search or multi-constrained optimization: it must systematically traverse many possible and parallelized reasoning paths, apply diverse constraints to prune unpromising branches, and identify valid solution routes for efficient iteration or backtracking. To this end, we carefully curate 1200+ high-quality multimodal cases spanning heterogeneous domains, and propose a fine-grained tree-of-thought evaluation protocol that jointly quantifies reasoning width and depth. We evaluate 12 major model families (over 30 advanced MLLMs) across difficulty tiers, question types, and required skills. Results show that while current models exhibit strong performance on general or common-sense VQA tasks, they still struggle to combine deep sequential thought chains with wide exploratory search to perform genuine insight-based reasoning. Finally, we analyze characteristic failure modes to provide possible directions for building MLLMs that reason not only deeper but also wider.
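The abstract mentions a tree-of-thought protocol that jointly quantifies reasoning width and depth but does not define the metrics. Purely as an assumption for illustration, the sketch below stores a reasoning trace as a tree of hypothetical `ThoughtNode` objects and measures depth as the longest root-to-leaf chain, width as the number of nodes at each level, and explored paths as the leaf count. The paper's actual scoring may be defined quite differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThoughtNode:
    """One step in a hypothetical tree-of-thought trace."""
    label: str
    children: List["ThoughtNode"] = field(default_factory=list)


def depth(node: ThoughtNode) -> int:
    """Length, in edges, of the longest root-to-leaf chain (sequential depth)."""
    if not node.children:
        return 0
    return 1 + max(depth(child) for child in node.children)


def level_widths(root: ThoughtNode) -> List[int]:
    """Number of nodes at each level, via breadth-first traversal (breadth)."""
    widths: List[int] = []
    level = [root]
    while level:
        widths.append(len(level))
        level = [child for node in level for child in node.children]
    return widths


def leaf_count(node: ThoughtNode) -> int:
    """Number of distinct root-to-leaf paths that were explored."""
    if not node.children:
        return 1
    return sum(leaf_count(child) for child in node.children)


# Hypothetical trace: the root branches into three thoughts, and one of them
# is followed two steps further.
tree = ThoughtNode("root", [
    ThoughtNode("A", [ThoughtNode("A1"), ThoughtNode("A2")]),
    ThoughtNode("B", [ThoughtNode("B1", [ThoughtNode("B1a")])]),
    ThoughtNode("C"),
])
print(depth(tree), level_widths(tree), leaf_count(tree))
```

For this trace, depth is 3, the per-level widths are `[1, 3, 3, 1]`, and 4 paths were explored. Keeping depth and width as separate numbers reflects the abstract's point that a model can score well on one dimension while lagging on the other.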
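Abstract (reference translation): This paper proposes a holistic multimodal benchmark for evaluating the reasoning capabilities of MLLMs, focusing on reasoning width, a dimension complementary to the more commonly studied reasoning depth. Specifically, reasoning depth measures a model's ability to carry out long-chain, sequential reasoning in which each step is tightly and rigorously linked to the next. Reasoning width focuses on a model's capacity for broad trial-and-error search and multi-constrained optimization: the model must systematically traverse many possible, parallel reasoning paths, apply diverse constraints to prune unpromising branches, and identify valid solution paths for efficient iteration or backtracking. To this end, we carefully curate more than 1,200 high-quality multimodal cases spanning heterogeneous domains and propose a fine-grained tree-of-thought evaluation protocol that jointly quantifies reasoning width and depth. We evaluate 12 major model families (more than 30 advanced MLLMs) across difficulty tiers, question types, and required skills. The results show that while current models perform strongly on general and common-sense VQA tasks, they still struggle to combine deep sequential chains of thought with wide exploratory search to perform genuine insight-based reasoning. Finally, we analyze characteristic failure modes to suggest directions for building MLLMs that reason not only deeper but also wider.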
