Fugu-MT Paper Translation (Abstract): Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process

Paper abstract: Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process

arxiv url: http://arxiv.org/abs/2110.10351v1
Date: Wed, 20 Oct 2021 02:57:21 GMT
Status: translation complete
Last updated in system: 2021-10-24 02:19:13.899596
Title: Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process
Title (reference translation): Fast Algorithm and Sharp Analysis for Constrained Markov Decision Processes
Authors: Tianjiao Li, Ziwei Guan, Shaofeng Zou, Tengyu Xu, Yingbin Liang and Guanghui Lan
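Abstract summary: The constrained Markov decision process (CMDP) problem is studied, in which an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints. A new primal-dual approach is proposed through a novel integration of three ingredients: an entropy-regularized policy optimizer, a dual variable regularizer, and Nesterov's accelerated gradient descent dual optimizer. This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal O(1/\epsilon)$ for convex optimization subject to convex constraints.
Reference score (independently computed attention score): 56.55075925645864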
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its utilities/costs. A new primal-dual approach is proposed with a novel integration of three ingredients: entropy regularized policy optimizer, dual variable regularizer, and Nesterov's accelerated gradient descent dual optimizer, all of which are critical to achieve a faster convergence. The finite-time error bound of the proposed approach is characterized. Despite the challenge of the nonconcave objective subject to nonconcave constraints, the proposed approach is shown to converge to the global optimum with a complexity of $\tilde{\mathcal O}(1/\epsilon)$ in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approach by a factor of $\mathcal O(1/\epsilon)$ \citep{ding2020natural,paternain2019constrained}. This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal O(1/\epsilon)$ for convex optimization subject to convex constraints. Our primal-dual approach and non-asymptotic analysis are agnostic to the RL optimizer used, and thus are more flexible for practical applications. More generally, our approach also serves as the first algorithm that provably accelerates constrained nonconvex optimization with zero duality gap by exploiting the geometries such as the gradient dominance condition, for which the existing acceleration methods for constrained convex optimization are not applicable.
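Abstract (reference translation): We consider the problem of the constrained Markov decision process (CMDP), in which an agent maximizes the expected accumulated discounted reward subject to multiple constraints on its utilities and costs. We propose a new primal-dual approach that integrates three ingredients, an entropy-regularized policy optimizer, a dual variable regularizer, and Nesterov's accelerated gradient descent dual optimizer, all of which are critical to achieving faster convergence. We characterize the finite-time error bound of the proposed approach. Despite the challenge of a nonconcave objective subject to nonconcave constraints, the proposed approach is shown to converge to the global optimum with a complexity of $\tilde{\mathcal O}(1/\epsilon)$ in terms of the optimality gap and the constraint violation, improving the complexity of the existing primal-dual approach by a factor of $\mathcal O(1/\epsilon)$ \citep{ding2020natural,paternain2019constrained}. This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal O(1/\epsilon)$ for convex optimization subject to convex constraints. Our primal-dual approach and non-asymptotic analysis are agnostic to the RL optimizer used and are therefore more flexible in practical applications. More generally, the approach also serves as the first algorithm that provably accelerates constrained nonconvex optimization with zero duality gap by exploiting geometries such as the gradient dominance condition, to which existing acceleration methods for constrained convex optimization do not apply.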
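
Illustrative sketch (not the authors' implementation): the abstract names three ingredients but gives no pseudocode, so the Python below shows only one plausible way they could fit together on a tiny tabular CMDP with a single constraint. Everything concrete in it is an assumption made for illustration: the function names, the toy MDP, the use of soft value iteration as the entropy-regularized policy optimizer (the abstract says the approach is agnostic to the RL optimizer), the momentum schedule k/(k+3), and the hand-picked values of `tau`, `mu`, and `eta`. The dual gradient is taken as the constraint slack of the current policy plus the gradient of a quadratic dual regularizer.

```python
import numpy as np

def soft_value_iteration(P, r, gamma, tau, iters=200):
    """Entropy-regularized policy optimizer (assumed stand-in): softmax-optimal policy for reward r."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r + gamma * P @ V                      # Q[s, a] = r[s, a] + gamma * E[V(s')]
        m = Q.max(axis=1)
        V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))  # stable log-sum-exp
    Q = r + gamma * P @ V
    pi = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
    return pi / pi.sum(axis=1, keepdims=True)

def policy_value(P, r, gamma, rho, pi):
    """Exact unregularized discounted value of pi under reward r, from start distribution rho."""
    S = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)          # state-to-state transitions under pi
    r_pi = (pi * r).sum(axis=1)                    # expected one-step reward under pi
    return rho @ np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def accelerated_primal_dual(P, r0, r1, c, gamma, rho, tau=0.5, mu=0.01, eta=0.5, iters=200):
    """Maximize the value of r0 subject to (value of r1) >= c, with one dual variable lam >= 0."""
    lam = lam_prev = 0.0
    for k in range(iters):
        y = lam + k / (k + 3) * (lam - lam_prev)   # Nesterov extrapolation point
        pi = soft_value_iteration(P, r0 + y * r1, gamma, tau)   # primal step on the Lagrangian reward
        grad = policy_value(P, r1, gamma, rho, pi) - c + mu * y  # gradient of the regularized dual at y
        lam_prev, lam = lam, max(0.0, y - eta * grad)            # projected gradient step on the dual
    return soft_value_iteration(P, r0 + lam * r1, gamma, tau), lam

# Hypothetical toy problem: action 1 earns reward, action 0 earns utility,
# and transitions are uniform so the result can be checked by hand.
S, A = 2, 2
P = np.full((S, A, S), 0.5)
r0 = np.tile([0.0, 1.0], (S, 1))
r1 = np.tile([1.0, 0.0], (S, 1))
rho = np.array([0.5, 0.5])
gamma, c = 0.5, 1.0

pi, lam = accelerated_primal_dual(P, r0, r1, c, gamma, rho)
print("lambda:", lam)
print("reward value:", policy_value(P, r0, gamma, rho, pi))
print("utility value:", policy_value(P, r1, gamma, rho, pi), "threshold:", c)
```

In this toy problem the unconstrained policy mostly picks the rewarding action and violates the threshold, so the dual variable grows until the utility value sits close to `c`. The quadratic dual regularizer leaves a small residual violation on the order of `mu`, which is the usual price of regularizing the dual. The step size `eta` is tuned by hand for this example and would need to be chosen differently for other problems.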
