Fugu-MT Paper Translation (Abstract): The Thinking Boundary: Quantifying Reasoning Suitability of Multimodal Tasks via Dual Tuning

Paper overview: The Thinking Boundary: Quantifying Reasoning Suitability of Multimodal Tasks via Dual Tuning

arxiv url: http://arxiv.org/abs/2603.04415v1
Date: Wed, 04 Feb 2026 04:13:39 GMT
Status: Translation complete
Last updated in system: 2026-03-09 01:20:08.192553
Title: The Thinking Boundary: Quantifying Reasoning Suitability of Multimodal Tasks via Dual Tuning
Authors: Ruobing Zheng, Tianqi Li, Jianing Li, Qingpei Guo, Yi Yuan, Jingdong Chen
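Abstract summary: The proposed Dual Tuning is a framework for assessing whether reasoning yields positive gains for a target task. It establishes the "Thinking Boundary" to evaluate the suitability of reasoning training across diverse multimodal tasks. The findings challenge the "reasoning-for-all" paradigm and provide practical guidance for identifying appropriate data and training strategies.
Reference score (independently computed attention score): 46.61419294791218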
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While reasoning-enhanced Large Language Models (LLMs) have demonstrated remarkable advances in complex tasks such as mathematics and coding, their effectiveness across universal multimodal scenarios remains uncertain. The trend of releasing parallel "Instruct" and "Thinking" models by leading developers serves merely as a resource-intensive workaround, stemming from the lack of a criterion for determining when reasoning is truly beneficial. In this paper, we propose Dual Tuning, a framework designed to assess whether reasoning yields positive gains for target tasks under given base models and datasets. By jointly fine-tuning on paired Chain-of-Thought (CoT) and Direct-Answer (DA) data under controlled prompts, we systematically quantify and compare the gains of both training modes using the proposed metrics, and establish the "Thinking Boundary" to evaluate the suitability of reasoning training across diverse multimodal tasks, including spatial, mathematical, and multi-disciplinary domains. We further explore the impact of reinforcement training and thinking patterns on reasoning suitability, and validate whether the "Thinking Boundary" can guide data refinement. Our findings challenge the "reasoning-for-all" paradigm, providing practical guidance for identifying appropriate data and training strategies, and motivating the development of resource-efficient, adaptive auto-think systems.

Related papers