Fugu-MT Paper Translation (Summary): Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models

Paper summary: Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models

arXiv URL: http://arxiv.org/abs/2510.14853v1
Date: Thu, 16 Oct 2025 16:24:36 GMT
Status: Translation complete
Last updated in system: 2025-10-17 21:15:14.948604
Title: Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models
Authors: Guinan Su, Yanwu Yang, Li Shen, Lu Yin, Shiwei Liu, Jonas Geiping
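Abstract summary: Mixture-of-Experts (MoE) models achieve efficient scaling through sparse expert activation, but often suffer from suboptimal routing decisions due to distribution shifts in deployment. We propose a data-free, online test-time framework that continuously adapts MoE routing decisions during text generation without external supervision or data.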
Reference score (attention score computed in-house): 52.502867924372275
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mixture-of-Experts (MoE) models achieve efficient scaling through sparse expert activation, but often suffer from suboptimal routing decisions due to distribution shifts in deployment. While existing test-time adaptation methods could potentially address these issues, they primarily focus on dense models and require access to external data, limiting their practical applicability to MoE architectures. However, we find that, instead of relying on reference data, we can optimize MoE expert selection on-the-fly based only on input context. As such, we propose a data-free, online test-time framework that continuously adapts MoE routing decisions during text generation without external supervision or data. Our method cycles between two phases: during the prefill stage, and later at regular intervals, we optimize the routing decisions of the model using self-supervision based on the already generated sequence. Then, we generate text as normal, maintaining the modified router until the next adaptation. We implement this through lightweight additive vectors that only update router logits in selected layers, maintaining computational efficiency while preventing over-adaptation. The experimental results show consistent performance gains on challenging reasoning tasks while maintaining robustness to context shifts. For example, our method achieves a 5.5% improvement on HumanEval with OLMoE. Furthermore, owing to its plug-and-play property, our method naturally complements existing test-time scaling techniques, e.g., achieving 6% average gains when incorporated with self-consistency on DeepSeek-V2-Lite.
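The abstract leaves the self-supervised objective, the optimizer, and the layer-selection rule unspecified, so the Python sketch below is only a toy illustration of the general idea under our own assumptions, not the authors' implementation. Assumed: a hypothetical single-layer bigram MoE (`ToyMoE`) with a frozen top-k router; mean next-token negative log-likelihood of the sequence so far as the self-supervision signal; plain gradient descent on one additive router-logit vector `delta`; and warm-started re-adaptation every `interval` generated tokens. All names and hyperparameters (`adapt`, `generate`, `steps=50`, `lr=0.5`) are illustrative.

```python
import math

# Hypothetical toy model, not the paper's code: one MoE "layer" whose experts are
# bigram tables P_e(next | prev) and whose frozen router scores experts from the
# previous token. The only adapted parameters are an additive vector `delta`
# on the router logits, shared across positions.


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMoE:
    def __init__(self, expert_tables, router_logits, k):
        self.expert_tables = expert_tables  # [E][V][V]: P_e(next | prev)
        self.router_logits = router_logits  # [V][E]: frozen router logits per previous token
        self.k = k                          # experts activated per token

    def route(self, prev, delta):
        """Top-k routing on the router logits shifted by the additive vector `delta`."""
        z = [r + d for r, d in zip(self.router_logits[prev], delta)]
        top = sorted(range(len(z)), key=lambda e: z[e], reverse=True)[: self.k]
        return top, softmax([z[e] for e in top])

    def next_probs(self, prev, delta):
        top, weights = self.route(prev, delta)
        vocab = len(self.expert_tables[0][prev])
        return [sum(w * self.expert_tables[e][prev][v] for e, w in zip(top, weights))
                for v in range(vocab)]


def nll_and_grad(model, seq, delta):
    """Assumed self-supervised objective: mean next-token NLL over `seq`, and its gradient in `delta`."""
    grad = [0.0] * len(delta)
    total = 0.0
    n = len(seq) - 1
    for prev, nxt in zip(seq, seq[1:]):
        top, weights = model.route(prev, delta)
        p = sum(w * model.expert_tables[e][prev][nxt] for e, w in zip(top, weights))
        total -= math.log(p)
        for e, w in zip(top, weights):
            # d(-log p)/dz_e = -w_e * (P_e(nxt) / p - 1); unselected experts get no gradient.
            grad[e] -= w * (model.expert_tables[e][prev][nxt] / p - 1.0)
    return total / n, [g / n for g in grad]


def adapt(model, seq, delta, steps=50, lr=0.5):
    """Plain gradient descent on `delta`, returning the lowest-loss iterate.

    Keeping the best iterate is our own safeguard: top-k selection makes the loss
    only piecewise smooth, so a step can cross a routing boundary.
    """
    best_loss, best = math.inf, list(delta)
    for _ in range(steps + 1):
        loss, grad = nll_and_grad(model, seq, delta)
        if loss < best_loss:
            best_loss, best = loss, list(delta)
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return best


def generate(model, prompt, n_new, interval, adapt_steps=50):
    """Adapt on the prompt (prefill), then greedily decode, re-adapting every `interval` tokens."""
    seq = list(prompt)
    delta = [0.0] * len(model.router_logits[0])
    if adapt_steps:
        delta = adapt(model, seq, delta, adapt_steps)
    for i in range(n_new):
        if adapt_steps and i > 0 and i % interval == 0:
            # Warm-start from the current vector on the sequence generated so far (an assumption).
            delta = adapt(model, seq, delta, adapt_steps)
        probs = model.next_probs(seq[-1], delta)
        seq.append(max(range(len(probs)), key=probs.__getitem__))  # greedy decoding
    return seq[len(prompt):], delta


if __name__ == "__main__":
    experts = [
        [[0.9, 0.1], [0.9, 0.1]],  # expert 0 always favors token 0
        [[0.1, 0.9], [0.1, 0.9]],  # expert 1 always favors token 1
        [[0.1, 0.9], [0.9, 0.1]],  # expert 2 alternates tokens
    ]
    router = [[0.8, 0.5, 0.45], [0.8, 0.5, 0.45]]  # frozen router prefers experts 0 and 1
    model = ToyMoE(experts, router, k=2)
    prompt = [0, 1, 0, 1, 0, 1]
    print("no adaptation:", generate(model, prompt, 4, 2, adapt_steps=0)[0])
    print("with rerouting:", generate(model, prompt, 4, 2)[0])
```

In this toy setup the unadapted router keeps selecting experts 0 and 1, so greedy decoding emits token 0 after the alternating prompt. Adapting `delta` on the prompt first shifts weight between those two experts until one of their shifted logits falls below that of the alternating expert 2, which is then routed in and boosted, and the continuation follows the 0, 1 pattern. This also shows why a purely additive logit offset can change *which* experts fire, not only their mixing weights, while leaving the expert parameters untouched.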


Related papers