Fugu-MT Paper Translation (Abstract): Efficient Data Selection for Multimodal Models via Incremental Optimization Utility

Paper summary: Efficient Data Selection for Multimodal Models via Incremental Optimization Utility

arxiv url: http://arxiv.org/abs/2605.07488v1
Date: Fri, 08 May 2026 09:28:26 GMT
Status: Translation complete
System update date: 2026-05-11 19:43:38.95782
Title: Efficient Data Selection for Multimodal Models via Incremental Optimization Utility
Title (reference translation): Efficient Data Selection for Multimodal Models via Incremental Optimization Utility
Authors: Jinhao Jing, Qiannian Zhao, Chao Huang, Zhan Su,
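Abstract summary: This paper proposes One-Step-Train (OST), a framework that reformulates data selection as an incremental optimization utility ranking problem. By selecting the top-50 subset, OST reduces training costs by 43% (and total time consumption by 17) while surpassing the strong LLM-as-a-Judge baseline by 1.8 points.
Reference score (independently computed attention score): 6.698411108146732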
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The scaling of Large Multimodal Models (LMMs) is constrained by the quality-quantity trade-off inherent in synthetic data. Previous approaches, such as LLM-as-a-Judge, have proven their effectiveness in addressing this but suffer from prohibitive computational costs and lack of interpretability. To bridge this gap, we propose One-Step-Train (OST), a framework that reformulates data selection as an incremental optimization utility ranking problem. Instead of relying on semantic heuristics, OST estimates the marginal utility of each sample via a simulated single-step update on a lightweight proxy. Experiments on the Qwen series across multimodal mathematical reasoning benchmarks demonstrate that OST achieves Pareto-optimal efficiency. By selecting the top-50 subset, OST reduces training costs by 43% (and total time consumption by 17) while surpassing the strong LLM-as-a-Judge baseline by 1.8 points. Furthermore, under a fixed compute budget, our method using only the top-20 subset achieves a 5.6 point gain over LLM-as-a-Judge, improves upon heuristic scoring baselines like DEITA, and outperforms the Full-SFT baseline by 8.8 points. Notably, while Full-SFT suffers from performance degradation due to noise, our optimization-grounded approach effectively identifies toxic samples, successfully reversing the negative transfer frequently observed in complex reasoning tasks.
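Abstract (reference translation): The scaling of large multimodal models (LMMs) is constrained by the quality-quantity trade-off inherent in synthetic data. Earlier approaches such as LLM-as-a-Judge have proven effective at addressing this problem, but they suffer from prohibitive computational cost and a lack of interpretability. To bridge this gap, we propose One-Step-Train (OST), a framework that reformulates data selection as an incremental optimization utility ranking problem. Rather than relying on semantic heuristics, OST estimates the marginal utility of each sample through a simulated single-step update on a lightweight proxy. Experiments with the Qwen series on multimodal mathematical reasoning benchmarks show that OST achieves Pareto-optimal efficiency. By selecting the top-50 subset, OST reduces training costs by 43% (and total time consumption by 17) while surpassing the strong LLM-as-a-Judge baseline by 1.8 points. Furthermore, under a fixed compute budget, the method using only the top-20 subset gains 5.6 points over LLM-as-a-Judge, improves on heuristic scoring baselines such as DEITA, and outperforms the Full-SFT baseline by 8.8 points. Notably, while Full-SFT suffers performance degradation caused by noise, the optimization-grounded approach effectively identifies toxic samples and succeeds in reversing the negative transfer often observed in complex reasoning tasks.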
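The single-step utility estimate described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the utility function, the proxy, or the held-out data, so everything below is an assumption chosen for illustration. The proxy is a tiny logistic-regression model, the utility of a sample is assumed to be the drop in held-out loss after one simulated gradient step on that sample alone, and the names `one_step_utility` and `select_top_fraction` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def val_loss(w, val_set):
    """Mean logistic loss of weights w on held-out (features, label) pairs."""
    eps = 1e-12
    total = 0.0
    for x, y in val_set:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(val_set)


def one_step_utility(w, sample, val_set, lr=0.5):
    """Assumed utility: held-out loss before minus held-out loss after
    one simulated SGD step on `sample`. The proxy weights w are not modified."""
    x, y = sample
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    # Gradient of the logistic loss with respect to w is (p - y) * x.
    w_new = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return val_loss(w, val_set) - val_loss(w_new, val_set)


def select_top_fraction(w, candidates, val_set, fraction):
    """Rank candidates by assumed utility and keep the top `fraction`."""
    scored = [(one_step_utility(w, s, val_set), i) for i, s in enumerate(candidates)]
    scored.sort(key=lambda t: t[0], reverse=True)
    k = max(1, int(len(candidates) * fraction))
    return [i for _, i in scored[:k]], scored


# Toy data (hypothetical): the label is 1 when the first feature is positive.
val_set = [([1.0, 0.0], 1), ([-1.0, 0.0], 0), ([2.0, 0.5], 1), ([-2.0, -0.5], 0)]
candidates = [
    ([1.5, 0.0], 1),   # clean
    ([-1.0, 0.2], 0),  # clean
    ([1.0, 0.0], 0),   # flipped label (noisy)
    ([-1.5, 0.0], 1),  # flipped label (noisy)
]
proxy_weights = [0.0, 0.0]

selected, scored = select_top_fraction(proxy_weights, candidates, val_set, fraction=0.5)
print("selected indices:", selected)
for utility, idx in scored:
    print(f"sample {idx}: utility {utility:+.4f}")
```

In this toy setup the two mislabeled candidates receive negative utility because a step on them raises the held-out loss, so they fall below the cut. That mirrors, in a very reduced form, the abstract's claim that an optimization-grounded score can flag harmful samples. A real pipeline would replace the linear proxy with a lightweight multimodal model and would need a more efficient way to score many samples than recomputing the held-out loss for each one.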

