Fugu-MT Paper Translation (Abstract): ES-Merging: Biological MLLM Merging via Embedding Space Signals

Paper overview: ES-Merging: Biological MLLM Merging via Embedding Space Signals

arxiv url: http://arxiv.org/abs/2603.14405v1
Date: Sun, 15 Mar 2026 14:38:32 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.791607
Title: ES-Merging: Biological MLLM Merging via Embedding Space Signals
Title (reference translation): ES-Merging: Merging Biological MLLMs via Embedding Space Signals
Authors: Wonbin Lee, Dongki Kim, Sung Ju Hwang
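Abstract summary: We propose a representation-aware merging framework that estimates merging coefficients from embedding space signals. The proposed method outperforms existing merging methods and even surpasses task-specific fine-tuned models.
Reference score (independently computed attention level): 52.84455878597969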
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Biological multimodal large language models (MLLMs) have emerged as powerful foundation models for scientific discovery. However, existing models are specialized to a single modality, limiting their ability to solve inherently cross-modal scientific problems. While model merging is an efficient method to combine the different modalities into a unified MLLM, existing methods rely on input-agnostic parameter space heuristics that fail to faithfully capture modality specialization. To overcome this limitation, we propose a representation-aware merging framework that estimates merging coefficients from embedding space signals. We first design a probe input that consists of different modality tokens and forward it through each specialized MLLM to obtain layer-wise embedding responses that reflect modality-specific representation changes. We then estimate complementary merging coefficients at two granularities from the embedding space: layer-wise coefficients from coarse-grained signals and element-wise coefficients from fine-grained signals, which are jointly combined for robust coefficient estimation. Experiments on interactive effect prediction benchmarks show that our method outperforms existing merging methods and even surpasses task-specific fine-tuned models, establishing that embedding space signals provide a principled and effective foundation for cross-modal MLLM merging.
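Abstract (reference translation): Biological multimodal large language models (MLLMs) have emerged as powerful foundation models for scientific discovery. However, existing models are specialized to a single modality, which limits their ability to solve inherently cross-modal scientific problems. Model merging is an efficient way to combine different modalities into a unified MLLM, but existing methods rely on input-agnostic parameter-space heuristics that cannot faithfully capture modality specialization. To overcome this limitation, we propose a representation-aware merging framework that estimates merging coefficients from embedding space signals. We first design a probe input consisting of tokens from different modalities and forward it through each specialized MLLM to obtain layer-wise embedding responses that reflect modality-specific representation changes. We then estimate complementary merging coefficients at two granularities from the embedding space: layer-wise coefficients from coarse-grained signals and element-wise coefficients from fine-grained signals, which are combined jointly for robust coefficient estimation. Experiments on interactive effect prediction benchmarks show that our method outperforms existing merging methods and even surpasses task-specific fine-tuned models, confirming that embedding space signals provide a principled and effective foundation for cross-modal MLLM merging.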
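
The two-granularity coefficient estimation described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the formulas, so every concrete choice here is an assumption made for illustration. In particular, the sketch assumes a shared base model, measures the "embedding response" as the difference between each specialized model's probe embeddings and the base model's, normalizes coefficients across models by simple ratios, combines the two granularities by multiplication, and applies them to task vectors (specialized weights minus base weights). The function names `layer_coeffs`, `element_coeffs`, and `merge` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero when no model shifts a layer


def layer_coeffs(base_emb, spec_embs):
    """Coarse signal (assumed): one coefficient per (model, layer).

    base_emb:  (L, T, D) probe embeddings from the assumed shared base model
    spec_embs: (M, L, T, D) probe embeddings from M specialized models
    """
    # Frobenius norm of the embedding shift over tokens and hidden dims.
    shift = np.linalg.norm(spec_embs - base_emb, axis=(2, 3))  # (M, L)
    return shift / (shift.sum(axis=0, keepdims=True) + EPS)


def element_coeffs(base_emb, spec_embs):
    """Fine signal (assumed): one coefficient per (model, layer, hidden dim)."""
    shift = np.abs(spec_embs - base_emb).mean(axis=2)  # (M, L, D)
    return shift / (shift.sum(axis=0, keepdims=True) + EPS)


def merge(base_w, spec_ws, base_emb, spec_embs):
    """Merge per-layer weight matrices with jointly combined coefficients.

    base_w:  (L, D, D_in) base weights, rows indexed by hidden dimension
    spec_ws: (M, L, D, D_in) specialized weights
    """
    lam = layer_coeffs(base_emb, spec_embs)[:, :, None]  # (M, L, 1)
    mu = element_coeffs(base_emb, spec_embs)             # (M, L, D)
    joint = lam * mu                                     # assumed combination rule
    joint = joint / (joint.sum(axis=0, keepdims=True) + EPS)
    task_vectors = spec_ws - base_w                      # (M, L, D, D_in)
    return base_w + (joint[..., None] * task_vectors).sum(axis=0)
```

Under these assumptions, a specialized model whose probe embeddings do not differ from the base model's receives a coefficient of zero at that layer, so the merged weights follow the models that actually changed the representation of the probe input.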

Related papers