Fugu-MT Paper Translation (Abstract): Switching-Geometry Analysis of Deflated Q-Value Iteration

Paper abstract: Switching-Geometry Analysis of Deflated Q-Value Iteration

arxiv url: http://arxiv.org/abs/2605.10811v2
Date: Mon, 18 May 2026 17:13:55 GMT
Status: Translation complete
System-internal update date: 2026-05-19 23:51:08.259067
Title: Switching-Geometry Analysis of Deflated Q-Value Iteration
Title (reference translation): Switching-Geometry Analysis of Deflated Q-Value Iteration
Authors: Donghwan Lee
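Abstract summary: This paper presents the first JSR-based convergence analysis of deflated Q-VI for policy optimization problems. The benefit of deflation is not a change in the induced decision-making problem, but a more precise JSR-based description of the convergence geometry.
Reference score (independently computed attention measure): 7.8232617281369805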
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper develops a joint spectral radius (JSR) framework for analyzing rank-one deflated Q-value iteration (Q-VI) in discounted Markov decision process control. Focusing on an all-ones residual correction, we interpret the resulting algorithm through the geometry of switching systems and, to the best of our knowledge, give the first JSR-based convergence analysis of deflated Q-VI for policy optimization problems. Our analysis reveals that the standard Q-VI switching system model has JSR exactly the discount factor $γ\in (0,1)$, since all admissible subsystems share the all-ones vector as an invariant direction. By passing to the quotient space that removes this direction, we obtain a projected switching system model whose JSR governs the relevant error dynamics and may be strictly smaller than $γ$. Therefore, the deflated Q-VI admits a potentially sharper convergence-rate characterization than the ambient-space $γ$-bound. Finally, we prove that the correction is equivalent to a scalar recentering of standard Q-VI. Hence, the projected trajectory, and therefore the greedy-policy sequence, is unchanged relative to standard Q-VI initialized from the same point. The benefit of deflation is not a change in the induced decision-making problem, but a more precise JSR-based description of the convergence geometry after the redundant all-ones component is removed.
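Abstract (reference translation): This paper develops a joint spectral radius (JSR) framework for analyzing rank-one deflated Q-value iteration (Q-VI) in discounted Markov decision process control. Focusing on an all-ones residual correction, it interprets the resulting algorithm through the geometry of switching systems and, to the best of the author's knowledge, gives the first JSR-based convergence analysis of deflated Q-VI for policy optimization problems. The analysis shows that the standard Q-VI switching system model has JSR exactly equal to the discount factor $γ\in (0,1)$, because all admissible subsystems share the all-ones vector as an invariant direction. Passing to the quotient space that removes this direction yields a projected switching system model whose JSR governs the relevant error dynamics and may be strictly smaller than $γ$. Deflated Q-VI therefore admits a potentially sharper convergence-rate characterization than the ambient-space $γ$-bound. Finally, the correction is shown to be equivalent to a scalar recentering of standard Q-VI, so the projected trajectory, and hence the greedy-policy sequence, is unchanged relative to standard Q-VI initialized from the same point. The benefit of deflation is not a change in the induced decision-making problem, but a more precise JSR-based description of the convergence geometry once the redundant all-ones component is removed.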
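
The recentering claim in the abstract can be checked numerically on a toy problem. The sketch below is an illustration only, not the paper's algorithm: the abstract does not state the exact form of the all-ones residual correction, so the scalar subtracted here (the mean of the Q-table) is an assumption, as are the helper names make_random_mdp, bellman_update, recenter, span and compare. It relies on the standard fact that the Bellman optimality update T satisfies T(Q + c·1) = T(Q) + γc·1, so shifting an iterate along the all-ones direction cannot change its greedy policy or its image in the quotient space.

```python
import numpy as np


def make_random_mdp(n_states, n_actions, seed=0):
    """Hypothetical toy MDP: dense random transitions and rewards."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)  # each P[s, a, :] is a probability vector
    R = rng.random((n_states, n_actions))
    return P, R


def bellman_update(Q, P, R, gamma):
    """One standard Q-VI step: (TQ)(s,a) = R(s,a) + gamma * E[max_a' Q(s',a')]."""
    return R + gamma * (P @ Q.max(axis=1))


def recenter(Q):
    """Subtract a scalar multiple of the all-ones vector (assumed here: the mean)."""
    return Q - Q.mean()


def span(X):
    """Span seminorm max - min, which ignores shifts along the all-ones direction."""
    return X.max() - X.min()


def compare(P, R, gamma, n_iters=12):
    Q_std = np.zeros_like(R)  # standard Q-VI iterate
    Q_rec = np.zeros_like(R)  # recentered iterate, same initial point
    same_policy, same_projection, ratios = True, True, []
    prev_span = None
    for _ in range(n_iters):
        Q_next = bellman_update(Q_std, P, R, gamma)
        Q_rec = recenter(bellman_update(Q_rec, P, R, gamma))
        # Contraction of successive differences, measured modulo the all-ones direction.
        cur_span = span(Q_next - Q_std)
        if prev_span is not None and prev_span > 1e-8:
            ratios.append(cur_span / prev_span)
        prev_span = cur_span
        Q_std = Q_next
        # The recentered iterate should equal the standard iterate minus a scalar shift.
        same_projection &= bool(np.allclose(recenter(Q_std), Q_rec))
        same_policy &= bool(
            np.array_equal(Q_std.argmax(axis=1), Q_rec.argmax(axis=1))
        )
    return same_policy, same_projection, ratios


if __name__ == "__main__":
    gamma = 0.9
    P, R = make_random_mdp(n_states=5, n_actions=3)
    same_policy, same_projection, ratios = compare(P, R, gamma)
    print("greedy policies identical:", same_policy)
    print("projected iterates identical:", same_projection)
    print("largest observed span ratio:", max(ratios), "vs gamma =", gamma)
```

The span ratios are only an empirical proxy for the projected contraction rate on one random instance. They are bounded by γ, but they are not the joint spectral radius of the projected switching system, and how far below γ they fall depends on the MDP.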

Related papers