Fugu-MT Paper Translation (Overview): SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning

Paper Overview: SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning

arxiv url: http://arxiv.org/abs/2605.04712v2
Date: Fri, 08 May 2026 12:53:36 GMT
Status: Translation complete
System update date: 2026-05-11 16:31:22.919773
Title: SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
Authors: Lirui Luo, Guoxi Zhang, Hongming Xu, Cong Fang, Qing Li
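Abstract summary: Mixture-of-Experts (MoE) networks have been reported to enable scaling laws and to facilitate the learning of diverse skills. We formalize plasticity loss in MoE policies as a loss of spectral plasticity. We introduce SPHERE, a practical Parseval penalty tailored for MoE-based policies that mitigates the loss of spectral plasticity.
Reference score (site-computed attention score): 9.96668881329259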
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In deep reinforcement learning (DRL), an agent is trained from a stream of experience. In a continual learning setting, such agents can suffer from plasticity loss: their ability to learn new skills from new experiences diminishes over training. Recently, Mixture-of-Experts (MoE) networks have been reported to enable scaling laws and facilitate the learning of diverse skills. However, in continual reinforcement learning settings, their performance can degenerate as learning proceeds, indicating a loss of plasticity. To address this, building on Neural Tangent Kernel (NTK) theory, we formalize the plasticity loss in MoE policies as a loss of spectral plasticity. We then derive a tractable proxy for spectral plasticity, one expressible in terms of individual expert feature matrices. Leveraging this proxy, we introduce SPHERE, a practical Parseval penalty tailored for MoE-based policies that alleviates the loss of spectral plasticity. On MetaWorld and HumanoidBench, SPHERE improves the average success rate under continual RL by 133% and 50%, respectively, over an unregularized MoE baseline, while maintaining higher spectral plasticity throughout training.
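The abstract does not state SPHERE's exact formula, so the following is only a rough illustration under stated assumptions, not the authors' method. It shows the classical Parseval penalty ||W W^T - I||_F^2 (zero exactly when the rows of W are orthonormal) applied separately to each expert's weight matrix and summed. It also shows a simple eigenvalue diagnostic on one expert's (n, d) feature matrix, where eigenvalues collapsing toward zero signal feature directions that are no longer spanned. The per-expert summation, the `coef` value, the use of Phi^T Phi / n as a stand-in for the paper's spectral-plasticity proxy, and the names `moe_parseval_penalty` and `feature_gram_spectrum` are all hypothetical.

```python
import numpy as np


def parseval_penalty(weight: np.ndarray) -> float:
    """Classical Parseval penalty ||W W^T - I||_F^2 for a weight matrix of shape (out, in).

    The penalty is zero exactly when the rows of W are orthonormal.
    """
    gram = weight @ weight.T
    identity = np.eye(weight.shape[0])
    return float(np.sum((gram - identity) ** 2))


def moe_parseval_penalty(expert_weights, coef: float = 1e-3) -> float:
    """HYPOTHETICAL MoE variant: penalize each expert separately and sum the results.

    `coef` is an illustrative regularization strength, not a value from the paper.
    """
    return coef * sum(parseval_penalty(w) for w in expert_weights)


def feature_gram_spectrum(features: np.ndarray) -> np.ndarray:
    """Eigenvalues (descending) of Phi^T Phi / n for one expert's (n, d) feature matrix.

    ASSUMPTION: this is only a stand-in spectral diagnostic, not the paper's proxy.
    """
    n = features.shape[0]
    gram = features.T @ features / n
    # eigvalsh is meant for symmetric matrices and returns eigenvalues in ascending order.
    return np.linalg.eigvalsh(gram)[::-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # orthogonal matrix: penalty ~0
    experts = [q, 2.0 * np.eye(4)]  # 2I gives (4 - 1)^2 on each of 4 diagonal entries
    print(moe_parseval_penalty(experts, coef=1.0))  # approximately 36

    collapsed = np.ones((8, 3))  # every feature column identical: rank 1
    print(feature_gram_spectrum(collapsed))  # approximately [3, 0, 0]
```

In a training loop, a penalty of this kind would be added to the policy loss and differentiated with an autodiff framework; NumPy is used here only to keep the sketch dependency-light.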
