Fugu-MT Paper Translation (Abstract): Guiding Mixture-of-Experts with Temporal Multimodal Interactions

Paper overview: Guiding Mixture-of-Experts with Temporal Multimodal Interactions

arxiv url: http://arxiv.org/abs/2509.25678v2
Date: Wed, 08 Oct 2025 04:21:03 GMT
Status: Translation complete
Last updated in system: 2025-10-09 14:21:18.174376
Title: Guiding Mixture-of-Experts with Temporal Multimodal Interactions
Title (reference translation): Guiding Mixture-of-Experts with Temporal Multimodal Interactions
Authors: Xing Han, Hsing-Huan Chung, Joydeep Ghosh, Paul Pu Liang, Suchi Saria
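Abstract summary: This paper proposes a novel framework that guides MoE routing using quantified temporal interaction. A multimodal interaction-aware router learns to dispatch tokens to experts based on the nature of their interactions.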
Reference score (independently computed attention metric): 30.728093182390364
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mixture-of-Experts (MoE) architectures have become pivotal for large-scale multimodal models. However, their routing mechanisms typically overlook the informative, time-varying interaction dynamics between modalities. This limitation hinders expert specialization, as the model cannot explicitly leverage intrinsic modality relationships for effective reasoning. To address this, we propose a novel framework that guides MoE routing using quantified temporal interaction. A multimodal interaction-aware router learns to dispatch tokens to experts based on the nature of their interactions. This dynamic routing encourages experts to acquire generalizable interaction-processing skills rather than merely learning task-specific features. Our framework builds on a new formulation of temporal multimodal interaction dynamics, which are used to guide expert routing. We first demonstrate that these temporal multimodal interactions reveal meaningful patterns across applications, and then show how they can be leveraged to improve both the design and performance of MoE-based models. Comprehensive experiments on challenging multimodal benchmarks validate our approach, demonstrating both enhanced performance and improved interpretability.
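Abstract (reference translation): Mixture-of-Experts (MoE) architectures have become important in large-scale multimodal models. However, their routing mechanisms typically overlook the informative, time-varying interaction dynamics between modalities. This limitation hinders expert specialization, because the model cannot explicitly exploit intrinsic modality relationships for effective reasoning. This paper therefore proposes a novel framework that guides MoE routing using quantified temporal interaction. A multimodal interaction-aware router learns how to dispatch tokens to experts based on the nature of their interactions. This dynamic routing encourages experts to acquire general-purpose interaction-processing skills rather than merely learning task-specific features. The framework is built on a new formulation of temporal multimodal interaction dynamics, which is used to guide expert routing. The authors first demonstrate that these temporal multimodal interactions reveal meaningful patterns across applications, and then show how they can be leveraged to improve the design and performance of MoE models. Comprehensive experiments on challenging multimodal benchmarks validate the approach, demonstrating both improved performance and improved interpretability.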
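
The abstract does not specify how the interaction-aware router is built, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a standard top-k softmax gate whose input is the token embedding concatenated with a vector of interaction scores for that token. The function name `route_token`, the `interaction` feature vector, and the linear `weights` matrix are all hypothetical and chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, interaction, weights, top_k=2):
    """Pick top_k experts for one token and return [(expert_index, gate), ...].

    token:       list of floats, the token embedding
    interaction: list of floats, hypothetical interaction scores for this
                 token's time step (an assumption, not taken from the paper)
    weights:     one row per expert, each of length
                 len(token) + len(interaction)
    """
    # The router sees both the token and its interaction features.
    features = list(token) + list(interaction)
    for row in weights:
        if len(row) != len(features):
            raise ValueError("each weight row must match the feature length")
    if not 1 <= top_k <= len(weights):
        raise ValueError("top_k must be between 1 and the number of experts")

    # One linear logit per expert.
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]

    # Keep the top_k experts by logit, then normalize their gates.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in top])
    return list(zip(top, gates))


if __name__ == "__main__":
    # Three hypothetical experts; the last two weights act on interaction scores.
    weights = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 2.0],
        [0.0, 0.0, 0.0, -1.0],
    ]
    print(route_token([1.0, 0.0], [0.0, 1.0], weights, top_k=2))
```

In this toy setup, changing only the `interaction` vector changes which experts are selected, which is the sense in which routing is conditioned on interaction rather than on token content alone.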

Related papers