Fugu-MT paper translation (abstract): FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning

Paper abstract: FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning

arxiv url: http://arxiv.org/abs/2605.09355v1
Date: Sun, 10 May 2026 06:09:32 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.208498
Title: FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning
Title (reference translation): FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning
Authors: Xing Han, Shravan Chaudhari, Tanvi Ranade, Rama Chellappa, Suchi Saria
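Abstract summary: Real-world model deployment across multiple domains requires multimodal models that operate under two complementary regimes. We propose a scalable MoE framework for multi-task pretraining and continual learning across flexible modality combinations.
Reference score (independently computed attention score): 31.686140342132745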
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Real-world model deployment across multiple domains requires multimodal models to operate under two complementary regimes: (1) multi-task pretraining, in which tasks are co-available at design time and related tasks can borrow representational strength from one another, and (2) continual adaptation, in which new tasks emerge after deployment with previously unseen modality combinations. Neither regime alone suffices: the pretraining task set is never exhaustive, while bypassing joint training forfeits the transfer gains and efficiency available among co-trainable tasks. Sparse Mixture-of-Experts (MoE) is a natural fit for this dual requirement: sparse activation enables modular capacity expansion as new tasks arrive, while routing decouples modality-level computation from task-level composition. In this work, we propose a scalable MoE framework for multi-task pretraining and continual learning across flexible modality combinations. The framework supports training on multimodal tasks with diverse modality configurations by using modality-specific routers that process tokens from each modality across tasks. Furthermore, it enables continual learning over sequential multimodal tasks within a fixed-capacity MoE by compressing accumulated expert knowledge into low-rank memory subspaces while expanding only the lightweight routers. We validate the method on multiple healthcare multimodal benchmarks, where it demonstrates competitive multi-task pretraining performance while alleviating catastrophic forgetting and improving parameter efficiency.
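Abstract (reference translation): Deploying models in the real world across multiple domains requires multimodal models to work under two complementary regimes: (1) multi-task pretraining, where tasks are available together at design time and related tasks can borrow representational strength from each other, and (2) continual adaptation, where new tasks appear after deployment with modality combinations not seen before. Neither regime is sufficient on its own: the set of pretraining tasks is never exhaustive, and skipping joint training gives up the transfer gains and efficiency among tasks that could be trained together. Sparse Mixture-of-Experts (MoE) fits this dual requirement naturally: sparse activation allows modular capacity expansion as new tasks arrive, and routing separates modality-level computation from task-level composition. This work proposes a scalable MoE framework for multi-task pretraining and continual learning across flexible modality combinations. The framework is designed to support training on multimodal tasks with varied modality configurations by using modality-specific routers that process the tokens of each modality across tasks. It also enables continual learning over sequential multimodal tasks within a fixed-capacity MoE by compressing accumulated expert knowledge into low-rank memory subspaces and expanding only the lightweight routers. The method is validated on multiple healthcare multimodal benchmarks, where it shows competitive multi-task pretraining performance while mitigating catastrophic forgetting and improving parameter efficiency.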
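
The abstract describes two mechanisms: modality-specific routers over a shared, fixed-capacity expert pool, and compression of expert knowledge into low-rank subspaces while only the routers grow. The following is a minimal NumPy sketch of those two ideas, not the paper's implementation. The class `ModalityRoutedMoE`, the helper `low_rank_compress`, the linear experts, the top-k softmax gating, and the use of truncated SVD for the low-rank step are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class ModalityRoutedMoE:
    """Hypothetical sketch: a shared pool of linear experts, one router per modality."""

    def __init__(self, dim, num_experts, modalities, top_k=2, seed=0):
        self._rng = np.random.default_rng(seed)
        self.dim = dim
        self.num_experts = num_experts
        self.top_k = top_k
        # Fixed-capacity expert pool shared by all modalities and tasks.
        self.experts = self._rng.normal(0.0, 0.1, size=(num_experts, dim, dim))
        # One lightweight router (dim x num_experts) per modality.
        self.routers = {m: self._new_router() for m in modalities}

    def _new_router(self):
        return self._rng.normal(0.0, 0.1, size=(self.dim, self.num_experts))

    def add_modality(self, name):
        """Continual step (assumed): add a router only; the expert pool is untouched."""
        self.routers[name] = self._new_router()

    def forward(self, tokens, modality):
        """Route each token of one modality to its top-k experts and mix the outputs."""
        logits = tokens @ self.routers[modality]              # (n, num_experts)
        top = np.argsort(logits, axis=-1)[:, -self.top_k:]    # (n, top_k) expert ids
        gates = softmax(np.take_along_axis(logits, top, axis=-1), axis=-1)
        out = np.zeros_like(tokens)
        for slot in range(self.top_k):
            chosen = self.experts[top[:, slot]]               # (n, dim, dim)
            expert_out = np.einsum("nd,nde->ne", tokens, chosen)
            out += gates[:, slot:slot + 1] * expert_out
        return out, top


def low_rank_compress(weight, rank):
    """Truncated SVD (an assumed stand-in for the paper's low-rank memory).

    Returns factors (a, b) such that a @ b is the best rank-`rank`
    approximation of `weight` in the Frobenius norm.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]
```

In this sketch, calling `add_modality("ecg")` on an existing model adds `dim * num_experts` router parameters and leaves `experts` unchanged, which is the sense in which only the routers expand. The modality name `"ecg"` is a hypothetical example and does not come from the abstract.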


Related papers