Fugu-MT Paper Translation (Abstract): Can MLLMs Absorb Math Reasoning Abilities from LLMs as Free Lunch?

Paper overview: Can MLLMs Absorb Math Reasoning Abilities from LLMs as Free Lunch?

arxiv url: http://arxiv.org/abs/2510.14387v1
Date: Thu, 16 Oct 2025 07:38:16 GMT
Status: Translation complete
System update date: 2025-10-17 21:15:14.762023
Title: Can MLLMs Absorb Math Reasoning Abilities from LLMs as Free Lunch?
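Title (reference translation): Can MLLMs absorb math from LLMs as a free lunch?
Authors: Yijie Hu, Zihao Zhou, Kaizhu Huang, Xiaowei Huang, Qiufeng Wang
Abstract summary: Math reasoning is a crucial ability of large language models (LLMs). This paper proposes IP-Merging, which first identifies the reasoning-associated parameters in both the MLLM and the Math LLM and then projects them into the subspace of the MLLM. IP-Merging is a tuning-free approach because the parameters are adjusted directly.
Reference score (independently computed attention score): 45.99128235364487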
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Math reasoning has been one crucial ability of large language models (LLMs), where significant advancements have been achieved in recent years. However, most efforts focus on LLMs by curating high-quality annotation data and intricate training (or inference) paradigms, while the math reasoning performance of multi-modal LLMs (MLLMs) remains lagging behind. Since the MLLM typically consists of an LLM and a vision block, we wonder: Can MLLMs directly absorb math reasoning abilities from off-the-shelf math LLMs without tuning? Recent model-merging approaches may offer insights into this question. However, they overlook the alignment between the MLLM and LLM, where we find that there is a large gap between their parameter spaces, resulting in lower performance. Our empirical evidence reveals two key factors behind this issue: the identification of crucial reasoning-associated layers in the model and the mitigation of the gaps in parameter space. Based on the empirical insights, we propose IP-Merging that first identifies the reasoning-associated parameters in both MLLM and Math LLM, then projects them into the subspace of MLLM, aiming to maintain the alignment, and finally merges parameters in this subspace. IP-Merging is a tuning-free approach since parameters are directly adjusted. Extensive experiments demonstrate that our IP-Merging method can enhance the math reasoning ability of MLLMs directly from Math LLMs without compromising their other capabilities.
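Abstract (reference translation): Math reasoning is a crucial ability of large language models (LLMs) and has advanced significantly in recent years. However, most efforts focus on LLMs, through curated high-quality annotation data and intricate training (or inference) paradigms, while the math reasoning performance of multi-modal LLMs (MLLMs) lags behind. Since an MLLM typically consists of an LLM and a vision block, can MLLMs directly absorb the math reasoning abilities of off-the-shelf math LLMs without tuning? Recent model-merging approaches may offer insights into this question. However, they overlook the alignment between the MLLM and the LLM: there is a large gap between their parameter spaces, which lowers performance. Our empirical evidence reveals two key factors behind this issue: the identification of crucial reasoning-associated layers in the model and the mitigation of the gaps in parameter space. Based on these empirical insights, we propose IP-Merging, which first identifies the reasoning-associated parameters in both the MLLM and the Math LLM, then projects them into the subspace of the MLLM with the aim of maintaining alignment, and finally merges the parameters within this subspace. IP-Merging is a tuning-free approach because the parameters are adjusted directly. We show that our IP-Merging method can improve the math reasoning ability of MLLMs directly from Math LLMs.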
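
The three steps named in the abstract (identify, project, merge) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so everything below is an assumption made for illustration. In particular, it assumes that both models share a common base LLM, that each layer is a 2-D weight matrix, that "reasoning-associated" layers are those whose MLLM and Math LLM weight changes span similar top-k singular subspaces, and that merging means adding the projected Math LLM change to the MLLM weights. The function names and the parameters `k`, `sim_threshold`, and `alpha` are hypothetical.

```python
import numpy as np


def top_k_basis(delta, k):
    """Orthonormal basis (as columns) of the top-k left singular subspace of delta."""
    u, _, _ = np.linalg.svd(delta, full_matrices=False)
    return u[:, :k]


def subspace_similarity(basis_a, basis_b):
    """Mean squared cosine of the principal angles between two k-dim subspaces (0 to 1)."""
    k = basis_a.shape[1]
    return float(np.linalg.norm(basis_a.T @ basis_b, "fro") ** 2 / k)


def ip_merge_sketch(base, mllm, math_llm, k=2, sim_threshold=0.5, alpha=1.0):
    """Hypothetical identify-project-merge loop over dicts of layer name -> weight matrix.

    base, mllm, math_llm: weights of the shared base LLM, the MLLM's language
    model, and the Math LLM (assumed to have identical layer names and shapes).
    """
    merged = {}
    for name, w_base in base.items():
        # Weight changes of each model relative to the shared base (an assumption).
        d_mllm = mllm[name] - w_base
        d_math = math_llm[name] - w_base

        # Identify: keep only layers whose two subspaces are similar enough (assumed criterion).
        b_mllm = top_k_basis(d_mllm, k)
        b_math = top_k_basis(d_math, k)
        if subspace_similarity(b_mllm, b_math) < sim_threshold:
            merged[name] = mllm[name].copy()
            continue

        # Project: restrict the Math LLM change to the MLLM's top-k subspace.
        projected = b_mllm @ (b_mllm.T @ d_math)

        # Merge: add the projected change to the MLLM weights; no gradient step is involved.
        merged[name] = mllm[name] + alpha * projected
    return merged
```

Because the loop only reads and rewrites weight matrices, it needs no training data or optimizer, which is the sense in which such a procedure is tuning-free. The selection rule and thresholds here are placeholders; the paper itself should be consulted for the actual criteria.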


List of related papers