Density Constrained Reinforcement Learning
- URL: http://arxiv.org/abs/2106.12764v1
- Date: Thu, 24 Jun 2021 04:22:03 GMT
- Title: Density Constrained Reinforcement Learning
- Authors: Zengyi Qin, Yuxiao Chen, Chuchu Fan
- Abstract summary: We study constrained reinforcement learning from a novel perspective by setting constraints directly on state density functions.
We leverage the duality between density functions and Q functions to develop an effective algorithm to solve the density constrained RL problem optimally.
We prove that the proposed algorithm converges to a near-optimal solution with a bounded error even when the policy update is imperfect.
- Score: 9.23225507471139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study constrained reinforcement learning (CRL) from a novel perspective by
setting constraints directly on state density functions, rather than the value
functions considered by previous works. State density has a clear physical and
mathematical interpretation, and is able to express a wide variety of
constraints such as resource limits and safety requirements. Density
constraints can also avoid the time-consuming process of designing and tuning
cost functions required by value function-based constraints to encode system
specifications. We leverage the duality between density functions and Q
functions to develop an effective algorithm to solve the density constrained RL
problem optimally, with the constraints guaranteed to be satisfied. We prove
that the proposed algorithm converges to a near-optimal solution with a bounded
error even when the policy update is imperfect. We use a set of comprehensive
experiments to demonstrate the advantages of our approach over state-of-the-art
CRL methods, with a wide range of density constrained tasks as well as standard
CRL benchmarks such as Safety-Gym.
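The duality-based algorithm itself is given in the paper; as a rough illustration of the primal-dual structure such a method can take, here is a minimal sketch in which Lagrange multipliers on per-state density caps penalize the reward. All names (rho_max, solve_rl, estimate_density) and the placeholder routines are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

n_states = 50
rho_max = np.full(n_states, 0.05)   # assumed per-state density caps
lam = np.zeros(n_states)            # one Lagrange multiplier per state
eta = 0.1                           # dual step size
reward = np.random.rand(n_states)   # toy state-based reward

def solve_rl(penalized_reward):
    """Placeholder for any RL solver maximizing the penalized reward;
    here it just returns the reward table so the sketch runs."""
    return penalized_reward

def estimate_density(policy):
    """Placeholder: roll out the policy and estimate its stationary
    state density; here a fake uniform estimate."""
    return np.full(n_states, 1.0 / n_states)

for _ in range(100):
    # Primal step: solve unconstrained RL with the reward penalized by
    # the current multipliers (this is where the density/Q duality acts).
    policy = solve_rl(reward - lam)
    # Dual step: raise lam wherever the density cap is violated.
    rho = estimate_density(policy)
    lam = np.maximum(0.0, lam + eta * (rho - rho_max))
```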
Related papers
- OTClean: Data Cleaning for Conditional Independence Violations using
Optimal Transport [51.6416022358349]
OTClean is a framework that harnesses optimal transport theory for data repair under Conditional Independence (CI) constraints.
We develop an iterative algorithm inspired by Sinkhorn's matrix scaling algorithm, which efficiently addresses high-dimensional and large-scale data (a toy Sinkhorn iteration is sketched after this list).
arXiv Detail & Related papers (2024-03-04T18:23:55Z) - Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward maximization objective and constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z) - Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL).
We show an analogous result for vanilla Q-functions under a soft margin condition.
Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
arXiv Detail & Related papers (2023-02-05T14:22:41Z) - Optimal Conservative Offline RL with General Function Approximation via
Augmented Lagrangian [18.2080757218886]
Offline reinforcement learning (RL) refers to decision-making from a previously collected dataset of interactions.
We present the first set of offline RL algorithms that are statistically optimal and practical under general function approximation and single-policy concentrability.
arXiv Detail & Related papers (2022-11-01T19:28:48Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size (see the log-barrier sketch after this list).
We demonstrate the effectiveness of our approach on minimizing constraint violations in safe reinforcement learning policy tasks.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Reachability Constrained Reinforcement Learning [6.5158195776494]
This paper proposes a reachability CRL (RCRL) method by using reachability analysis to characterize the largest feasible sets.
We also use multi-time scale approximation theory to prove that the proposed algorithm converges to a local optimum.
Empirical results on different benchmarks such as safe-control-gym and Safety-Gym validate the learned feasible set, the performance in optimal criteria, and constraint satisfaction of RCRL.
arXiv Detail & Related papers (2022-05-16T09:32:45Z) - Constrained Model-Free Reinforcement Learning for Process Optimization [0.0]
Reinforcement learning (RL) is a control approach that can handle nonlinear optimal control problems.
Despite the promise exhibited, RL has yet to see marked translation to industrial practice.
We propose an 'oracle'-assisted constrained Q-learning algorithm that guarantees the satisfaction of joint chance constraints with high probability (an illustrative oracle-masked Q-learning sketch follows this list).
arXiv Detail & Related papers (2020-11-16T13:16:22Z) - Robust Reinforcement Learning with Wasserstein Constraint [49.86490922809473]
We show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm.
The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
arXiv Detail & Related papers (2020-06-01T13:48:59Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
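For the OTClean entry above, the summary points to Sinkhorn's matrix scaling algorithm. Below is a minimal textbook Sinkhorn iteration for entropic optimal transport, for reference; this toy version is not OTClean's code, and the marginals r, c and regularization reg are made-up inputs.

```python
import numpy as np

def sinkhorn(cost, r, c, reg=0.1, n_iters=200):
    """Scale K = exp(-cost/reg) so the plan's rows sum to r, columns to c."""
    K = np.exp(-cost / reg)
    u = np.ones_like(r)
    for _ in range(n_iters):
        v = c / (K.T @ u)    # fit column marginals
        u = r / (K @ v)      # fit row marginals
    return u[:, None] * K * v[None, :]   # the transport plan

cost = np.random.rand(4, 4)
r = np.full(4, 0.25)         # uniform source marginal
c = np.full(4, 0.25)         # uniform target marginal
plan = sinkhorn(cost, r, c)
print(plan.sum(axis=1), plan.sum(axis=0))  # both approximately 0.25 each
```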
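For the Log Barriers (LBSGD) entry, a hedged sketch of the log-barrier idea: the surrogate f(x) - mu*log(-g(x)) diverges at the constraint boundary g(x) = 0, so descending it keeps iterates feasible. The backtracking step-size cap below is a crude stand-in for the paper's carefully chosen step size, and f, g, mu are toy choices.

```python
import numpy as np

mu = 0.1  # barrier weight

def f(x):   # toy objective: pull x toward (2, 2)
    return np.sum((x - 2.0) ** 2)

def g(x):   # toy constraint g(x) <= 0: stay inside the unit ball
    return np.sum(x ** 2) - 1.0

def barrier_grad(x):
    # Gradient of the barrier surrogate f(x) - mu * log(-g(x)).
    return 2.0 * (x - 2.0) - mu * 2.0 * x / g(x)

x = np.zeros(2)              # start strictly feasible: g(0) = -1 < 0
for _ in range(200):
    d = -barrier_grad(x)
    step = 0.1
    while g(x + step * d) >= 0.0:   # shrink until strictly feasible
        step *= 0.5
    x = x + step * d
print(x, g(x))               # x settles just inside the boundary
```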
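For the constrained model-free RL entry, one illustrative reading of 'oracle'-assisted constrained Q-learning: a safety oracle masks actions it deems inadmissible, and the epsilon-greedy update runs over the remaining actions. The oracle, environment, and masking rule here are hypothetical placeholders; the paper's joint-chance-constraint guarantee comes from its own construction, not this toy.

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def oracle_safe(s):
    """Hypothetical oracle: boolean mask of actions judged safe in s."""
    mask = np.ones(n_actions, dtype=bool)
    mask[s % n_actions] = False      # toy rule standing in for the oracle
    return mask

def step(s, a):
    """Toy deterministic environment standing in for the real process."""
    return (s + a) % n_states, float(-abs(s - 5))

s = 0
for _ in range(1000):
    safe = np.flatnonzero(oracle_safe(s))
    # Epsilon-greedy restricted to oracle-approved actions.
    a = rng.choice(safe) if rng.random() < eps else safe[np.argmax(Q[s, safe])]
    s2, r = step(s, a)
    safe2 = np.flatnonzero(oracle_safe(s2))
    # Bellman backup also maximizes only over actions safe in s2.
    Q[s, a] += alpha * (r + gamma * Q[s2, safe2].max() - Q[s, a])
    s = s2
```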