Algorithm for Constrained Markov Decision Process with Linear
Convergence
- URL: http://arxiv.org/abs/2206.01666v1
- Date: Fri, 3 Jun 2022 16:26:38 GMT
- Title: Algorithm for Constrained Markov Decision Process with Linear
Convergence
- Authors: Egor Gladin, Maksim Lavrik-Karmazin, Karina Zainullina, Varvara
Rudenko, Alexander Gasnikov, Martin Takáč
- Abstract summary: An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs.
A new dual approach is proposed with the integration of two ingredients: entropy regularized policy optimizer and Vaidya's dual optimizer.
The proposed approach is shown to converge (with linear rate) to the global optimum.
- Score: 55.41644538483948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of constrained Markov decision process is considered. An agent
aims to maximize the expected accumulated discounted reward subject to multiple
constraints on its costs (the number of constraints is relatively small). A new
dual approach is proposed with the integration of two ingredients: entropy
regularized policy optimizer and Vaidya's dual optimizer, both of which are
critical to achieve faster convergence. The finite-time error bound of the
proposed approach is provided. Despite the challenge of the nonconcave
objective subject to nonconcave constraints, the proposed approach is shown to
converge (with linear rate) to the global optimum. The complexity expressed in
terms of the optimality gap and the constraint violation significantly improves
upon the existing primal-dual approaches.
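To make the structure concrete, here is a minimal sketch of the dual oracle such an approach builds on: the inner policy optimization is entropy-regularized (which makes the dual smooth), and the dual variable lives in $\mathbb{R}^m$ with $m$ the small number of cost constraints, exactly the regime where a cutting-plane method such as Vaidya's is efficient. The helper `solve_soft_mdp` and the interface below are our illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def dual_oracle(lam, solve_soft_mdp, b, tau):
    """First-order oracle for the Lagrangian dual of a CMDP,
        d(lam) = max_pi [ V_r(pi) + tau * H(pi) + lam^T (b - V_c(pi)) ],
    where H is the policy entropy that makes d smooth in lam.
    `solve_soft_mdp(lam)` is a hypothetical inner solver (e.g., soft value
    iteration on the combined reward r - lam^T c) returning the maximizing
    policy's reward value v_r, its cost values v_c, and its entropy h."""
    v_r, v_c, h = solve_soft_mdp(lam)
    slack = b - v_c                  # by Danskin's theorem, a gradient of d
    return v_r + tau * h + lam @ slack, slack

# d is convex and smooth over lam >= 0 in only m dimensions (m = number of
# constraints), so a cutting-plane scheme such as Vaidya's method needs a
# number of oracle calls growing only logarithmically in 1/eps -- the
# source of the linear convergence rate claimed above.
```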
Related papers
- The inexact power augmented Lagrangian method for constrained nonconvex optimization [44.516958213972885]
This work introduces an unconventional augmented Lagrangian method, where the augmenting term is a Euclidean norm raised to a power.
We show that using lower powers for the augmenting term leads to a faster rate, albeit with a slower decrease in the residual; a sketch of the resulting update follows this entry.
arXiv Detail & Related papers (2024-10-26T11:31:56Z)
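A minimal sketch of one outer iteration of such a power augmented Lagrangian method for equality constraints c(x) = 0: the penalty is (rho/nu) * ||c(x)||^nu with nu in (1, 2], the primal subproblem is solved inexactly by gradient descent, and the multiplier step follows the penalty gradient. All names and update details are our illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def power_alm_step(f_grad, c, c_jac, x, y, rho=1.0, nu=1.5,
                   inner_steps=100, lr=1e-2):
    """One outer iteration with augmented Lagrangian
       L(x, y) = f(x) + <y, c(x)> + (rho / nu) * ||c(x)||^nu;
    nu = 2 recovers the classical quadratic augmenting term."""
    for _ in range(inner_steps):           # inexact primal minimization
        cx = c(x)
        norm = np.linalg.norm(cx)
        # gradient of the power penalty: rho * ||c||^(nu-2) * J_c(x)^T c(x)
        pen = rho * norm ** (nu - 2) * c_jac(x).T @ cx if norm > 0 else 0.0
        x = x - lr * (f_grad(x) + c_jac(x).T @ y + pen)
    cx = c(x)
    norm = np.linalg.norm(cx)
    if norm > 0:                            # multiplier (dual) update
        y = y + rho * norm ** (nu - 2) * cx
    return x, y
```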
- A Double Tracking Method for Optimization with Decentralized Generalized Orthogonality Constraints [4.6796315389639815]
Decentralized optimization problems cannot be solved by existing algorithms when the constraints themselves are distributed.
We introduce a novel algorithm that tracks the gradient of the objective function and the Jacobian of the constraint mapping simultaneously; a schematic tracking round is sketched below.
We present numerical results on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-09-08T06:57:35Z)
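A schematic of one such round in the spirit of standard gradient tracking, applied twice: once to the objective gradient and once to the constraint Jacobian. The mixing matrix W, the per-agent oracles, and the plain consensus update are our illustrative assumptions:

```python
import numpy as np

def double_tracking_round(W, xs, gs, Js, grad_f, jac_c, step=1e-2):
    """One consensus round over n agents. gs[i] tracks the network-average
    objective gradient and Js[i] the network-average constraint Jacobian
    (the 'double' tracking); both use the usual mix-then-correct pattern."""
    n = len(xs)
    new_xs = [sum(W[i][j] * xs[j] for j in range(n)) - step * gs[i]
              for i in range(n)]
    new_gs = [sum(W[i][j] * gs[j] for j in range(n))
              + grad_f(i, new_xs[i]) - grad_f(i, xs[i]) for i in range(n)]
    new_Js = [sum(W[i][j] * Js[j] for j in range(n))
              + jac_c(i, new_xs[i]) - jac_c(i, xs[i]) for i in range(n)]
    return new_xs, new_gs, new_Js
```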
- Two-Step QAOA: Enhancing Quantum Optimization by Decomposing One-Hot Constraints in QUBO Formulations [0.0]
We propose a simple approach, the Two-Step QAOA, which aims to improve the effectiveness of QAOA.
By identifying and separating the problem into two stages, we transform soft constraints into hard constraints; the underlying one-hot penalty encoding is sketched below.
arXiv Detail & Related papers (2024-08-09T23:38:28Z)
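For context, the standard quadratic penalty that encodes a one-hot ("exactly one of n") constraint in a QUBO is the kind of soft constraint the two-step decomposition starts from. The penalty weight A and helper names are ours; this is the textbook encoding, not the paper's method.

```python
import itertools

def one_hot_penalty_qubo(n, A=10.0):
    """QUBO coefficients for the soft one-hot penalty A * (sum_i x_i - 1)^2
    over binary x_1..x_n. Expanding (with x_i^2 = x_i) gives -A on the
    diagonal and +2A on off-diagonal pairs, plus a constant offset +A."""
    Q = {}
    for i in range(n):
        Q[(i, i)] = -A          # from A * (x_i - 2*x_i) terms
    for i, j in itertools.combinations(range(n), 2):
        Q[(i, j)] = 2 * A       # cross terms 2*A*x_i*x_j
    return Q                    # constant offset +A omitted

# brute-force check: the minima are exactly the one-hot bitstrings
n = 3
Q = one_hot_penalty_qubo(n)
def energy(bits):
    return sum(Q.get((i, j), 0) * bits[i] * bits[j]
               for i in range(n) for j in range(n))
assert min(itertools.product([0, 1], repeat=n), key=energy) in {
    (1, 0, 0), (0, 1, 0), (0, 0, 1)}
```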
- Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov decision process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP poses several challenges, including handling the large state space; the visitation-measure formulation is written out below.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
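In visitation-measure form the problem reads (standard formulation; the notation is ours):

```latex
\min_{\pi} \; f\big(\mu^{\pi}\big)
\quad \text{s.t.} \quad
g_i\big(\mu^{\pi}\big) \le 0, \qquad i = 1, \dots, m,
```

where $\mu^{\pi}$ is the discounted state-action visitation measure of policy $\pi$ and $f, g_1, \dots, g_m$ are convex; ordinary reward maximization is the linear special case $f(\mu) = -\langle \mu, r \rangle$.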
- Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast the inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses the apparent intractability; the dominance constraint involved is written out below.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z)
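For reference, the second-order stochastic dominance constraint such problems impose on a decision-dependent quantity $X$ relative to a benchmark $Y$ (standard definition, our notation):

```latex
X \succeq_{(2)} Y
\;\Longleftrightarrow\;
\mathbb{E}\big[(\eta - X)_{+}\big] \le \mathbb{E}\big[(\eta - Y)_{+}\big]
\quad \text{for all } \eta \in \mathbb{R},
```

where $(u)_{+} = \max(u, 0)$; the continuum of thresholds $\eta$ is what makes the inner optimization in the Lagrangian appear intractable.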
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual approach is proposed with novel integration of three ingredients: an entropy regularized policy optimizer, a dual regularizer, and Nesterov's accelerated gradient descent on the dual (an accelerated dual-ascent step is sketched below).
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z)
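A minimal sketch of Nesterov-accelerated projected gradient ascent on a smooth concave dual, the kind of dual step the summary refers to. The momentum schedule, step-size, and oracle interface are illustrative choices, not the paper's exact algorithm:

```python
import numpy as np

def accelerated_dual_ascent(dual_grad, lam0, steps=100, lr=0.1):
    """Maximize a smooth concave dual d(lam) over lam >= 0.
    `dual_grad(lam)` returns the gradient of d (in the CMDP setting, the
    constraint residuals of the entropy-regularized inner problem)."""
    lam, lam_prev = lam0.copy(), lam0.copy()
    for k in range(1, steps + 1):
        mom = (k - 1) / (k + 2)                       # standard momentum weight
        z = lam + mom * (lam - lam_prev)              # extrapolation point
        lam_prev = lam
        lam = np.maximum(z + lr * dual_grad(z), 0.0)  # ascent + projection
    return lam
```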
- A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization [7.483040617090451]
We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization.
Our theoretical analysis shows that its Lagrangian dual function is smooth and that the Lagrangian duality gap can be decomposed into the primal optimality gap and the constraint violation; one such decomposition identity is written out below.
arXiv Detail & Related papers (2021-10-17T21:26:40Z)
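With cost constraints $V_c(\pi) \le b$, Lagrangian $L(\pi, \lambda) = V_r(\pi) + \lambda^{\top}(b - V_c(\pi))$, dual $d(\lambda) = \max_{\pi} L(\pi, \lambda)$ attained at $\pi_{\lambda}$, and an optimal feasible policy $\pi^{\ast}$, one exact identity behind such a decomposition is (entropy regularization omitted and sign conventions ours; the paper's precise statement differs in details):

```latex
\underbrace{d(\lambda) - V_r(\pi^{\ast})}_{\text{duality gap}}
\;+\;
\underbrace{V_r(\pi^{\ast}) - V_r(\pi_{\lambda})}_{\text{primal optimality gap}}
\;=\;
\lambda^{\top} \underbrace{\big(b - V_c(\pi_{\lambda})\big)}_{\text{constraint slack / violation}}.
```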
- Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use a constant step-size; a plain SGDA loop is sketched below.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
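A plain same-iterate SGDA loop of the kind analyzed, for a smooth game $\min_x \max_y f(x, y)$; the stochastic gradient oracles and the constant step-size are the only ingredients (names are illustrative):

```python
def sgda(grad_x, grad_y, x, y, steps=1000, lr=1e-2):
    """Stochastic gradient descent-ascent with constant step-size.
    `grad_x`/`grad_y` return stochastic gradients of f at (x, y); with a
    constant step-size the iterates reach a neighborhood of the solution."""
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)  # sample both at the same iterate
        x, y = x - lr * gx, y + lr * gy      # descent for x, ascent for y
    return x, y
```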
- Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization [98.0595480384208]
We propose a generalization of the extragradient method which provably converges to a stationary point.
The algorithm applies not only to Euclidean spaces, but also to general $\ell_p$-normed finite-dimensional real vector spaces (the classical Euclidean step it generalizes is sketched below).
arXiv Detail & Related papers (2020-10-31T21:35:42Z)
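For orientation, the classical Euclidean extragradient step that this method generalizes to $\ell_p$-normed spaces; $F$ is the gradient operator of the game, e.g. $F(z) = (\nabla_x f, -\nabla_y f)$ (a minimal sketch, not the paper's new algorithm):

```python
def extragradient(F, z, steps=1000, lr=1e-2):
    """Classical extragradient: probe with a half-step, then update using
    the operator evaluated at the probe point."""
    for _ in range(steps):
        z_half = z - lr * F(z)     # exploratory half-step
        z = z - lr * F(z_half)     # corrected update from the midpoint
    return z
```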