Fugu-MT Paper Translation (Abstract): Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

Paper overview: Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

arxiv url: http://arxiv.org/abs/2605.19852v1
Date: Tue, 19 May 2026 13:44:26 GMT
Status: Translation complete
Last updated in system: 2026-05-20 15:03:09.378325
Title: Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning
Title (reference translation): Are Tools Always Useful? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning
Authors: Qinghe Ma, Zhen Zhao, Yiming Wu, Jian Zhang, Lei Bai, Yinghuan Shi
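Abstract summary: This paper introduces AutoTool, a model that adaptively decides whether to invoke tools according to the characteristics of each query. Within a reinforcement learning framework, the authors design an explicit dual-mode reasoning strategy with mode-specific reward functions. Experiments show that AutoTool exhibits outstanding performance and high efficiency.
Reference score (independently computed attention score): 44.46498720264651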
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tool-augmented reasoning has emerged as a promising direction for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, existing studies mainly focus on enabling models to perform tool invocation, while neglecting the necessity of invoking tools. We argue that tool usage is not always beneficial, as redundant or inappropriate invocations largely increase reasoning overhead and even mislead model predictions. To address this issue, we introduce AutoTool, a model that adaptively decides whether to invoke tools according to the characteristics of each query. Within a reinforcement learning framework, we design an explicit dual-mode reasoning strategy with mode-specific reward functions to guide the model toward producing accurate responses. Moreover, to prevent premature bias toward a single reasoning mode, AutoTool jointly explores and balances tool-assisted and text-centric reasoning throughout training, and promotes free exploration in later stages. Extensive experiments demonstrate that AutoTool exhibits outstanding performance and high efficiency, yielding a 21.8% accuracy gain on V* benchmark compared to the base model, and a 44.9% improvement in efficiency over existing tool-augmented methods on POPE benchmark. Code is available at https://github.com/MQinghe/AutoTool.
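Abstract (reference translation): Tool-augmented reasoning has emerged as a promising direction for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, existing studies mainly focus on enabling models to perform tool invocation while neglecting whether invoking tools is necessary. We argue that tool usage is not always beneficial, because redundant or inappropriate invocations substantially increase reasoning overhead and can even mislead model predictions. To address this problem, we introduce AutoTool, a model that adaptively decides whether to invoke tools according to the characteristics of each query. Within a reinforcement learning framework, we design an explicit dual-mode reasoning strategy with mode-specific reward functions that guides the model toward producing accurate responses. In addition, to prevent premature bias toward a single reasoning mode, AutoTool jointly explores and balances tool-assisted and text-centric reasoning throughout training and promotes free exploration in later stages. Extensive experiments show that AutoTool exhibits outstanding performance and high efficiency, with a 21.8% accuracy gain over the base model on the V* benchmark and a 44.9% efficiency improvement over existing tool-augmented methods on the POPE benchmark. Code is available at https://github.com/MQinghe/AutoTool.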
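
The abstract mentions mode-specific reward functions and a training schedule that balances the two reasoning modes before allowing free exploration, but gives no formulas. The sketch below is only an illustration of how such a scheme could look, not the authors' implementation. The `Rollout` fields, the `tool_cost` penalty, the even mode split, and the `free_after` threshold are all hypothetical choices made for this example.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    mode: str        # "tool" (tool-assisted) or "text" (text-centric)
    correct: bool    # whether the final answer matched the reference
    tool_calls: int  # number of tool invocations in the trajectory


def mode_reward(r: Rollout, tool_cost: float = 0.1) -> float:
    """Hypothetical mode-specific reward; NOT the formula from the paper."""
    base = 1.0 if r.correct else 0.0
    if r.mode == "text":
        # Text-centric mode: a trajectory that calls a tool violates the mode.
        return base if r.tool_calls == 0 else 0.0
    # Tool-assisted mode: at least one call is required; extra calls are penalized.
    if r.tool_calls == 0:
        return 0.0
    return max(0.0, base - tool_cost * (r.tool_calls - 1))


def assign_modes(group_size: int, step: int, free_after: int,
                 rng: random.Random) -> List[str]:
    """Assumed schedule: balanced modes early, unconstrained ("free") later."""
    if step >= free_after:
        return ["free"] * group_size
    n_tool = group_size // 2
    modes = ["tool"] * n_tool + ["text"] * (group_size - n_tool)
    rng.shuffle(modes)
    return modes
```

Forcing an even split early is one simple way to keep either mode from dominating before the policy has seen both. Whether the paper uses a fixed split, a learned balance, or something else is not stated in the abstract.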

Related papers