Fugu-MT Paper Translation (Abstract): Fine-tuning MLLMs Without Forgetting Is Easier Than You Think

Paper overview: Fine-tuning MLLMs Without Forgetting Is Easier Than You Think

arxiv url: http://arxiv.org/abs/2603.14493v1
Date: Sun, 15 Mar 2026 17:16:19 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.847681
Title: Fine-tuning MLLMs Without Forgetting Is Easier Than You Think
Authors: He Li, Yuhui Zhang, Xiaohan Wang, Kaifeng Lyu, Serena Yeung-Levy
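Abstract summary: We design a 2x2 experimental framework to assess model performance across in-distribution and out-of-distribution image and text inputs. The results suggest that appropriate regularization, such as constraining the number of trainable parameters or adopting a low learning rate, effectively prevents forgetting when dealing with out-of-distribution images. We attribute the forgetting that remains, with in-distribution images and out-of-distribution text, to task-specific overfitting and address it by introducing a data-hybrid training strategy.
Reference score (independently computed attention score): 72.59321247529975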
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: This paper demonstrates that simple adjustments to the fine-tuning recipes of multimodal large language models (MLLMs) are sufficient to mitigate catastrophic forgetting. On visual question answering, we design a 2x2 experimental framework to assess model performance across in-distribution and out-of-distribution image and text inputs. Our results show that appropriate regularization, such as constraining the number of trainable parameters or adopting a low learning rate, effectively prevents forgetting when dealing with out-of-distribution images. However, we uncover a distinct form of forgetting in settings with in-distribution images and out-of-distribution text. We attribute this forgetting to task-specific overfitting and address it by introducing a data-hybrid training strategy that combines datasets and tasks. Finally, we demonstrate that this approach naturally extends to continual learning, outperforming existing methods that rely on complex auxiliary mechanisms. Overall, our findings challenge prevailing assumptions by highlighting the inherent robustness of MLLMs and providing practical guidelines for adapting them while preserving their general capabilities.
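The 2x2 framework described in the abstract can be pictured as a grid of four evaluation cells, one per combination of image and text distribution. The following is a minimal sketch, not the authors' evaluation code: the split labels, the helper names `forgetting_grid` and `worst_cell`, and the use of a plain accuracy difference as the forgetting measure are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical labels for the two axes of the 2x2 grid (image x text).
IMAGE_SPLITS = ("in-distribution", "out-of-distribution")
TEXT_SPLITS = ("in-distribution", "out-of-distribution")


def forgetting_grid(before, after):
    """Return the accuracy change (after - before) for each grid cell.

    `before` and `after` map (image_split, text_split) -> accuracy,
    measured before and after fine-tuning. A negative value means the
    fine-tuned model is worse on that cell, i.e. it has forgotten.
    """
    return {
        cell: after[cell] - before[cell]
        for cell in product(IMAGE_SPLITS, TEXT_SPLITS)
    }


def worst_cell(grid):
    """Return the (image_split, text_split) cell with the largest drop."""
    return min(grid, key=grid.get)
```

Reading the abstract through this sketch, regularization is reported to keep the out-of-distribution-image cells from dropping, while the in-distribution-image, out-of-distribution-text cell is the one addressed by data-hybrid training.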


Related papers