Fugu-MT Paper Translation (Abstract): Rethinking Plasticity in Deep Reinforcement Learning

Paper overview: Rethinking Plasticity in Deep Reinforcement Learning

arxiv url: http://arxiv.org/abs/2603.21173v1
Date: Sun, 22 Mar 2026 11:27:16 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.274799
Title: Rethinking Plasticity in Deep Reinforcement Learning
Title (reference translation): Rethinking Plasticity in Deep Reinforcement Learning
Authors: Zhiqiang He
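Abstract summary: This paper investigates the fundamental mechanisms of plasticity loss in deep reinforcement learning (RL). It proposes the Optimization-Centric Plasticity (OCP) hypothesis, which posits that plasticity loss arises because optimal points from previous tasks become poor local optima for new tasks. It provides a rigorous optimization-based framework for understanding and restoring network plasticity in complex domains.
Reference score (independently computed attention score): 3.18807491942654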
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: This paper investigates the fundamental mechanisms driving plasticity loss in deep reinforcement learning (RL), a critical challenge where neural networks lose their ability to adapt to non-stationary environments. While existing research often relies on descriptive metrics like dormant neurons or effective rank, these summaries fail to explain the underlying optimization dynamics. We propose the Optimization-Centric Plasticity (OCP) hypothesis, which posits that plasticity loss arises because optimal points from previous tasks become poor local optima for new tasks, trapping parameters during task transitions and hindering subsequent learning. We theoretically establish the equivalence between neuron dormancy and zero-gradient states, demonstrating that the absence of gradient signals is the primary driver of dormancy. Our experiments reveal that plasticity loss is highly task-specific; notably, networks with high dormancy rates in one task can achieve performance parity with randomly initialized networks when switched to a significantly different task, suggesting that the network's capacity remains intact but is inhibited by the specific optimization landscape. Furthermore, our hypothesis elucidates why parameter constraints mitigate plasticity loss by preventing deep entrenchment in local optima. Validated across diverse non-stationary scenarios, our findings provide a rigorous optimization-based framework for understanding and restoring network plasticity in complex RL domains.
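Abstract (reference translation): This paper investigates the fundamental mechanisms of plasticity loss in deep reinforcement learning (RL), a critical challenge in which neural networks lose their ability to adapt to non-stationary environments. Existing research often relies on descriptive metrics such as dormant neurons or effective rank, but these summaries cannot explain the underlying optimization dynamics. We propose the Optimization-Centric Plasticity (OCP) hypothesis, which holds that plasticity loss arises because optimal points from previous tasks become poor local optima for new tasks, trapping parameters at task transitions and hindering subsequent learning. Theoretically, we establish the equivalence between neuron dormancy and zero-gradient states and show that the absence of gradient signals is the primary driver of dormancy. Experiments show that plasticity loss is task-specific: in particular, a network with a high dormancy rate on one task can match the performance of a randomly initialized network when switched to a different task, suggesting that the network's capacity is preserved but inhibited by the specific optimization landscape. The hypothesis also explains why parameter constraints mitigate plasticity loss, namely by preventing deep entrenchment in local optima. Validated across diverse non-stationary scenarios, the findings provide a rigorous optimization-based framework for understanding and restoring network plasticity in complex RL domains.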
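
The abstract's claim that neuron dormancy coincides with a zero-gradient state can be illustrated with a small sketch. This is not the paper's proof or code. It assumes a single dense ReLU layer, defines a neuron as dormant when its output is exactly zero on every input in a batch (a threshold-zero simplification), and uses a toy loss that is a weighted sum of the activations. All names (`dormant_mask`, `weight_grads`, `upstream`) are hypothetical and chosen only for illustration.

```python
import random

def relu(x):
    return x if x > 0.0 else 0.0

def forward(W, b, x):
    """Pre-activations and ReLU activations of one dense layer for input x."""
    pre = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
           for row, b_i in zip(W, b)]
    return pre, [relu(p) for p in pre]

def dormant_mask(W, b, batch):
    """True for neurons whose ReLU output is exactly zero on every input."""
    mask = [True] * len(W)
    for x in batch:
        _, act = forward(W, b, x)
        mask = [m and a == 0.0 for m, a in zip(mask, act)]
    return mask

def weight_grads(W, b, batch, upstream):
    """Gradient w.r.t. W of the toy loss sum_n sum_i upstream[i] * relu(pre_i(x_n))."""
    grads = [[0.0] * len(W[0]) for _ in W]
    for x in batch:
        pre, _ = forward(W, b, x)
        for i, p in enumerate(pre):
            gate = 1.0 if p > 0.0 else 0.0  # derivative of ReLU at p
            for j, x_j in enumerate(x):
                grads[i][j] += upstream[i] * gate * x_j
    return grads

random.seed(0)
# Assumed setup: non-negative inputs, as if produced by an earlier ReLU layer.
batch = [[random.uniform(0.0, 1.0) for _ in range(3)] for _ in range(16)]
W = [[0.5, 0.2, 0.1],      # neuron 0: positive weights, always active here
     [-1.0, -1.0, -1.0]]   # neuron 1: negative weights, never active here
b = [0.1, -0.5]
upstream = [1.0, 1.0]      # assumed gradient of the loss w.r.t. each activation

mask = dormant_mask(W, b, batch)
grads = weight_grads(W, b, batch, upstream)
dormant_ratio = sum(mask) / len(mask)

print("dormant mask:", mask)
print("dormant ratio:", dormant_ratio)
print("gradient rows:", grads)
```

In this setup neuron 1 has a negative pre-activation on every input, so its ReLU gate is closed and its incoming weights receive no gradient signal, while neuron 0 keeps receiving updates. The sketch covers only the ReLU, threshold-zero case; the general equivalence stated in the abstract is the paper's theoretical result and is not reproduced here.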

Related papers