Double Duality: Variational Primal-Dual Policy Optimization for
Constrained Reinforcement Learning
- URL: http://arxiv.org/abs/2402.10810v1
- Date: Fri, 16 Feb 2024 16:35:18 GMT
- Title: Double Duality: Variational Primal-Dual Policy Optimization for
Constrained Reinforcement Learning
- Authors: Zihao Li, Boyi Liu, Zhuoran Yang, Zhaoran Wang, Mengdi Wang
- Abstract summary: We study the Constrained Convex Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Design algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
- Score: 132.7040981721302
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We study the Constrained Convex Markov Decision Process (MDP), where the goal
is to minimize a convex functional of the visitation measure, subject to a
convex constraint. Designing algorithms for a constrained convex MDP faces
several challenges, including (1) handling the large state space, (2) managing
the exploration/exploitation tradeoff, and (3) solving the constrained
optimization where the objective and the constraint are both nonlinear
functions of the visitation measure. In this work, we present a model-based
algorithm, Variational Primal-Dual Policy Optimization (VPDPO), in which
Lagrangian and Fenchel duality are implemented to reformulate the original
constrained problem into an unconstrained primal-dual optimization. Moreover,
the primal variables are updated by model-based value iteration following the
principle of Optimism in the Face of Uncertainty (OFU), while the dual
variables are updated by gradient ascent. Moreover, by embedding the visitation
measure into a finite-dimensional space, we can handle large state spaces by
incorporating function approximation. Two notable examples are (1) Kernelized
Nonlinear Regulators and (2) Low-rank MDPs. We prove that with an optimistic
planning oracle, our algorithm achieves sublinear regret and constraint
violation in both cases and can attain the globally optimal policy of the
original constrained problem.
Related papers
- One-Shot Safety Alignment for Large Language Models via Optimal Dualization [64.52223677468861]
This paper presents a dualization perspective that reduces constrained alignment to an equivalent unconstrained alignment problem.
We do so by pre-optimizing a smooth and convex dual function that has a closed form.
Our strategy leads to two practical algorithms in model-based and preference-based scenarios.
arXiv Detail & Related papers (2024-05-29T22:12:52Z) - Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR)
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Learning Constrained Optimization with Deep Augmented Lagrangian Methods [54.22290715244502]
A machine learning (ML) model is trained to emulate a constrained optimization solver.
This paper proposes an alternative approach, in which the ML model is trained to predict dual solution estimates directly.
It enables an end-to-end training scheme is which the dual objective is as a loss function, and solution estimates toward primal feasibility, emulating a Dual Ascent method.
arXiv Detail & Related papers (2024-03-06T04:43:22Z) - Primal-dual regression approach for Markov decision processes with
general state and action space [0.30458514384586394]
We develop a regression based primal-dual martingale approach for solving finite time MDPs with general state and action space.
As a result, our method allows for the construction of tight upper and lower biased approximations of the value functions, and, provides tight approximations to the optimal policy.
arXiv Detail & Related papers (2022-10-01T11:48:22Z) - Convergence and sample complexity of natural policy gradient primal-dual
methods for constrained MDPs [24.582720609592464]
We employ the natural policy gradient method to solve the discounted optimal optimal rate problem.
We also provide convergence and finite-sample guarantees for two sample-based NPG-PD algorithms.
arXiv Detail & Related papers (2022-06-06T04:28:04Z) - Algorithm for Constrained Markov Decision Process with Linear
Convergence [55.41644538483948]
An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs.
A new dual approach is proposed with the integration of two ingredients: entropy regularized policy and Vaidya's dual.
The proposed approach is shown to converge (with linear rate) to the global optimum.
arXiv Detail & Related papers (2022-06-03T16:26:38Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of constrained decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the lower bound of $mathcal O (1/epsilon)$ for all complexity optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - A Dual Approach to Constrained Markov Decision Processes with Entropy
Regularization [7.483040617090451]
We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization.
Our theoretical analysis shows that its Lagrangian dual function is smooth and the Lagrangian duality gap can be decomposed into the primality gap and the constraint violation.
arXiv Detail & Related papers (2021-10-17T21:26:40Z) - A Stochastic Composite Augmented Lagrangian Method For Reinforcement
Learning [9.204659134755795]
We consider the linear programming (LP) formulation for deep reinforcement learning.
The augmented Lagrangian method suffers the double-sampling obstacle in solving the LP.
A deep parameterized augment Lagrangian method is proposed.
arXiv Detail & Related papers (2021-05-20T13:08:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.