Fugu-MT Paper Translation (Abstract): Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics

Paper overview: Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics

arxiv url: http://arxiv.org/abs/2606.02645v1
Date: Sun, 31 May 2026 15:46:20 GMT
Status: Translation complete
Last updated in system: 2026-06-03 22:00:04.489449
Title: Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics
Title (reference translation): Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics
Authors: Donghwan Lee
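Abstract summary: This paper gives a rigorous and exact analysis of target-update mechanisms for Q-learning with linear function approximation (linear Q-learning). Although linear Q-learning can fail to converge in general, the paper proves that, under explicit spectral and step-size conditions, periodic hard target updates and soft target updates can guarantee convergence to the exact projected Q-Bellman solution.
Reference score (independently computed attention level): 7.8232617281369805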
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Periodic target updates in Q-learning and soft target updates in actor-critic methods are empirically well established stabilization mechanisms, but their precise theoretical explanation is still incomplete. This paper gives a rigorous and exact analysis of these mechanisms for Q-learning with linear function approximation (linear Q-learning) using the exact switched linear system (SLS) dynamics induced by the Bellman maximum and the joint spectral radius (JSR) of the resulting switching matrix families. Although linear Q-learning can fail to converge in general, we prove that, under explicit spectral and step-size conditions, periodic hard target updates and soft target updates can guarantee convergence to the exact projected Q-Bellman solution. The main analysis is carried out for deterministic linear Q-learning, where the target-update mechanism is most transparent. Once the corresponding JSR certificate is established for the mean recursion, the stochastic reinforcement-learning setting can be treated by replacing deterministic modes with sampled stochastic modes and adding the corresponding stochastic-noise analysis.
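Abstract (reference translation): Periodic target updates in Q-learning and soft target updates in actor-critic methods are empirically established stabilization mechanisms, but their precise theoretical explanation remains incomplete. This paper rigorously and exactly analyzes these mechanisms for Q-learning with linear function approximation (linear Q-learning), using the exact switched linear system (SLS) dynamics induced by the Bellman maximum and the joint spectral radius (JSR) of the resulting families of switching matrices. Linear Q-learning can fail to converge in general, but the paper proves that, under explicit spectral and step-size conditions, periodic hard target updates and soft target updates can guarantee convergence to the exact projected Q-Bellman solution. The main analysis treats deterministic linear Q-learning, where the target-update mechanism is most transparent. Once the corresponding JSR certificate is established for the mean recursion, the stochastic reinforcement-learning setting can be handled by replacing the deterministic modes with sampled stochastic modes and adding the corresponding stochastic-noise analysis.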
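
The two target-update mechanisms named in the abstract can be illustrated with a minimal NumPy sketch of a deterministic (mean) linear Q-learning recursion. This is not the paper's algorithm or notation: the function names, the toy two-state MDP, the weights `d_weights`, and the values of `alpha`, `period`, and `tau` are all assumptions made for illustration. The demo uses tabular features (`Phi` is the identity) so that the fixed point is simply the optimal Q-table.

```python
import numpy as np


def greedy_backup(q, P, R, gamma):
    """Bellman optimality backup R + gamma * P max_a q(s', a).

    q is a flat vector over (state, action) pairs, ordered state-major.
    """
    n_states = P.shape[1]
    v = q.reshape(n_states, -1).max(axis=1)  # Bellman maximum per state
    return R + gamma * (P @ v)


def run_linear_q(Phi, d_weights, P, R, gamma, alpha, steps,
                 mode="hard", period=20, tau=0.1):
    """Deterministic linear Q-learning with a separate target parameter.

    mode="hard": copy theta into theta_target every `period` steps.
    mode="soft": move theta_target toward theta by a fraction `tau` each step.
    (Hypothetical helper; all parameter names are assumptions.)
    """
    theta = np.zeros(Phi.shape[1])
    theta_target = theta.copy()
    for k in range(steps):
        # TD error: backup uses the target parameter, prediction uses theta.
        td = greedy_backup(Phi @ theta_target, P, R, gamma) - Phi @ theta
        theta = theta + alpha * Phi.T @ (d_weights * td)
        if mode == "hard":
            if (k + 1) % period == 0:
                theta_target = theta.copy()
        elif mode == "soft":
            theta_target = (1.0 - tau) * theta_target + tau * theta
        else:
            raise ValueError("mode must be 'hard' or 'soft'")
    return theta


# Toy MDP (assumed for illustration): 2 states, 2 actions,
# rows ordered (s0,a0), (s0,a1), (s1,a0), (s1,a1).
P = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]])
R = np.array([0.0, 1.0, 0.5, 0.0])
gamma = 0.5
Phi = np.eye(4)                # tabular features: one weight per (s, a)
d_weights = np.full(4, 0.25)   # uniform state-action weighting

theta_hard = run_linear_q(Phi, d_weights, P, R, gamma, alpha=1.0,
                          steps=2000, mode="hard", period=20)
theta_soft = run_linear_q(Phi, d_weights, P, R, gamma, alpha=1.0,
                          steps=2000, mode="soft", tau=0.1)
print("hard:", theta_hard)
print("soft:", theta_soft)
```

In this tabular setting both variants settle on the same Q-table, which serves only as a sanity check. With a general feature matrix `Phi`, convergence is not automatic; per the abstract, it depends on explicit spectral and step-size conditions established in the paper, which this sketch does not verify.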


Related papers