AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement
Learning
- URL: http://arxiv.org/abs/2301.10339v1
- Date: Tue, 24 Jan 2023 22:51:29 GMT
- Title: AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement
Learning
- Authors: Tairan He, Weiye Zhao, Changliu Liu
- Abstract summary: We propose AutoCost, a framework that automatically searches for cost functions that help constrained RL to achieve zero-violation performance.
We compare the performance of augmented agents that use our cost function to provide additive intrinsic costs with baseline agents that use the same policy learners but with only extrinsic costs.
- Score: 3.4806267677524896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safety is a critical hurdle that limits the application of deep reinforcement
learning (RL) to real-world control tasks. To this end, constrained
reinforcement learning leverages cost functions to improve safety in
constrained Markov decision processes. However, such constrained RL methods
fail to achieve zero violation even when the cost limit is zero. This paper
analyzes the reason for such failure, which suggests that a proper cost
function plays an important role in constrained RL. Inspired by the analysis,
we propose AutoCost, a simple yet effective framework that automatically
searches for cost functions that help constrained RL to achieve zero-violation
performance. We validate the proposed method and the searched cost function on
the safe RL benchmark Safety Gym. We compare the performance of augmented
agents that use our cost function to provide additive intrinsic costs with
baseline agents that use the same policy learners but with only extrinsic
costs. Results show that the converged policies with intrinsic costs in all
environments achieve zero constraint violation and comparable performance with
baselines.
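The abstract specifies the augmentation (a learned intrinsic cost added to the environment's extrinsic cost before the sum is handed to the constrained RL learner) but not the form of the searched cost function. A minimal sketch of that augmentation follows; the wrapper name, the linear-in-features intrinsic cost, and the feature map are illustrative assumptions, not the authors' parameterization.

```python
import numpy as np

class IntrinsicCostWrapper:
    """Illustrative sketch only: add a learned intrinsic cost to the
    environment's extrinsic cost, as the abstract describes, so that the
    constrained RL learner trains against their sum. The linear-in-features
    intrinsic cost and the feature map are assumptions, not the authors'
    parameterization."""

    def __init__(self, env, theta, features):
        self.env = env                  # Safety Gym-style env exposing info["cost"]
        self.theta = np.asarray(theta, dtype=float)
        self.features = features        # callable: (obs, action) -> feature vector

    def reset(self):
        self.last_obs = self.env.reset()
        return self.last_obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        extrinsic = float(info.get("cost", 0.0))
        # hypothetical intrinsic cost: non-negative and linear in hand-made features
        intrinsic = max(0.0, float(self.theta @ self.features(self.last_obs, action)))
        info["cost"] = extrinsic + intrinsic   # the learner only ever sees the summed cost
        self.last_obs = obs
        return obs, reward, done, info
```

An outer search loop, evolutionary per the title, would then score candidate theta vectors by the constraint violations and returns of the policies trained against the augmented cost.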
Related papers
- Switching the Loss Reduces the Cost in Batch (Offline) Reinforcement Learning [57.154674117714265]
We show that the number of samples needed to learn a near-optimal policy with FQI-log scales with the accumulated cost of the optimal policy.
We empirically verify that FQI-log uses fewer samples than FQI trained with squared loss on problems where the optimal policy reliably achieves the goal.
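The summary states the sample-complexity result, while only the title names the mechanism: switching the regression loss inside fitted Q-iteration. A hedged sketch of that switch is below; the assumption that values are normalized to [0, 1] and the exact cross-entropy form are inferred, not taken from the paper.

```python
import numpy as np

def fqi_regression_loss(q_pred, target, use_log_loss=True):
    """Sketch of the loss switch named in the title: fit Bellman targets with a
    log (cross-entropy style) loss instead of the usual squared loss. Values are
    assumed to be normalized to [0, 1]; both that normalization and the exact
    loss form are assumptions inferred from the title and summary alone."""
    q_pred, target = np.asarray(q_pred, dtype=float), np.asarray(target, dtype=float)
    if use_log_loss:
        p = np.clip(q_pred, 1e-6, 1 - 1e-6)
        return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))
    return float(np.mean((q_pred - target) ** 2))
```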
arXiv Detail & Related papers (2024-03-08T15:30:58Z)
- Imitate the Good and Avoid the Bad: An Incremental Approach to Safe Reinforcement Learning [11.666700714916065]
Constrained RL is a framework for enforcing safe actions in Reinforcement Learning.
Most recent approaches for solving Constrained RL convert the trajectory-based cost constraint into a surrogate problem.
We present an approach that does not modify the trajectory-based cost constraint and instead imitates "good" trajectories.
arXiv Detail & Related papers (2023-12-16T08:48:46Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
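The summary fully describes the multiplication but not how the critics are trained or queried. The sketch below shows one way the two critics could be combined at action-selection time; `safety_critic` and `reward_critic` are stand-in callables, and greedy selection over a finite action set is an assumption for illustration.

```python
import numpy as np

def multiplicative_value(safety_critic, reward_critic, obs, actions):
    """Sketch of the multiplicative value described above: the safety critic's
    predicted probability of constraint violation discounts the reward critic's
    constraint-free return estimate. Both critics are stand-in callables."""
    p_violation = np.array([safety_critic(obs, a) for a in actions])  # each in [0, 1]
    q_reward = np.array([reward_critic(obs, a) for a in actions])
    return (1.0 - p_violation) * q_reward  # value shrinks as violation risk grows

# Hypothetical usage: greedy selection over a finite candidate action set.
# best = actions[int(np.argmax(multiplicative_value(sc, rc, obs, actions)))]
```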
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Efficient Exploration Using Extra Safety Budget in Constrained Policy
Optimization [15.483557012655927]
We propose an algorithm named Constrained Policy Optimization with Extra Safety Budget (ESB-CPO) to strike a balance between the exploration efficiency and the constraints satisfaction.
Our method achieves a remarkable performance improvement under the same cost limit compared with baselines.
arXiv Detail & Related papers (2023-02-28T06:16:34Z)
- Solving Richly Constrained Reinforcement Learning through State
Augmentation and Reward Penalties [8.86470998648085]
A key challenge is handling the expected cost accumulated by the policy.
Existing methods have developed innovative ways of converting this cost constraint over the entire policy into constraints over local decisions.
We provide an equivalent unconstrained formulation to constrained RL that has an augmented state space and reward penalties.
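The summary gives the construction only at a high level. One common way to realize an augmented-state, reward-penalty reduction is sketched below: the observation is extended with the cost accumulated so far, and the reward is penalized for any cost incurred beyond the budget. The wrapper, the penalty weight, and the normalization are assumptions; the paper's exact formulation may differ.

```python
import numpy as np

class CostAugmentedEnv:
    """Illustrative reduction of a cost-constrained task to an unconstrained one:
    extend the observation with the cost accumulated so far and penalize the
    reward for any cost incurred beyond the budget. The penalty weight and the
    budget normalization are assumptions; the paper's construction may differ."""

    def __init__(self, env, cost_budget, penalty=10.0):
        self.env = env
        self.budget = float(cost_budget)
        self.penalty = penalty

    def _augment(self, obs):
        return np.append(obs, self.acc_cost / max(self.budget, 1e-8))

    def reset(self):
        self.acc_cost = 0.0
        return self._augment(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        prev = self.acc_cost
        self.acc_cost += float(info.get("cost", 0.0))
        # penalize only the portion of cost incurred beyond the budget
        over = max(0.0, self.acc_cost - self.budget) - max(0.0, prev - self.budget)
        reward -= self.penalty * over
        return self._augment(obs), reward, done, info
```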
arXiv Detail & Related papers (2023-01-27T08:33:08Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Robust Reinforcement Learning in Continuous Control Tasks with
Uncertainty Set Regularization [17.322284328945194]
Reinforcement learning (RL) is recognized as lacking generalization and robustness under environmental perturbations.
We propose a new regularizer named Uncertainty Set Regularizer (USR).
arXiv Detail & Related papers (2022-07-05T12:56:08Z)
- Timing is Everything: Learning to Act Selectively with Costly Actions
and Budgetary Constraints [9.132215354916784]
We introduce a reinforcement learning framework named Learnable Impulse Control Reinforcement Algorithm (LICRA).
At the core of LICRA is a nested structure that combines RL and a form of policy known as impulse control, which learns to maximise objectives when actions incur costs.
We show LICRA learns the optimal value function and ensures budget constraints are satisfied almost surely.
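The nested structure is only named in the summary. The sketch below illustrates the impulse-control idea it describes: an outer decision about whether to act at all when every action is costly and a budget must be respected. The gate/action-policy split and the fixed per-intervention cost are assumptions for illustration, not LICRA's exact design.

```python
def impulse_control_step(gate_policy, action_policy, obs, budget, action_cost=1.0):
    """Sketch of acting selectively when actions are costly: an outer gate first
    decides whether to intervene at all; only then is the inner RL policy queried
    and the per-action cost charged against the remaining budget. The two-policy
    split and the fixed cost are assumptions, not LICRA's exact design."""
    if budget >= action_cost and gate_policy(obs):  # intervene only if affordable and worthwhile
        return action_policy(obs), budget - action_cost
    return None, budget  # "do nothing" default; no cost is incurred

# Hypothetical usage with stand-in policies:
# action, budget = impulse_control_step(lambda o: o[0] > 0.5, lambda o: -o, obs, budget=5.0)
```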
arXiv Detail & Related papers (2022-05-31T16:50:46Z)
- COptiDICE: Offline Constrained Reinforcement Learning via Stationary
Distribution Correction Estimation [73.17078343706909]
We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
We present an offline constrained RL algorithm that optimizes the policy in the space of the stationary distribution.
Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction.
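"Optimizing the policy in the space of the stationary distribution" refers to the standard occupancy-measure formulation of constrained RL; a generic version of that program (not COptiDICE's specific estimator) reads:

```latex
\begin{align*}
\max_{d \ge 0}\quad & \sum_{s,a} d(s,a)\, r(s,a) \\
\text{s.t.}\quad & \sum_{s,a} d(s,a)\, c(s,a) \le \hat{c}
  && \text{(cost upper bound)} \\
 & \sum_{a'} d(s',a') = (1-\gamma)\, p_0(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, d(s,a)
  \quad \forall s' && \text{(Bellman flow)}
\end{align*}
```

Here d(s,a) is the discounted occupancy (stationary) distribution and the policy is recovered as pi(a|s) proportional to d(s,a); per the summary, COptiDICE estimates the correction between this optimal distribution and the dataset's distribution rather than solving the program with a known model.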
arXiv Detail & Related papers (2022-04-19T15:55:47Z)
- Cost-Sensitive Portfolio Selection via Deep Reinforcement Learning [100.73223416589596]
We propose a cost-sensitive portfolio selection method with deep reinforcement learning.
Specifically, a novel two-stream portfolio policy network is devised to extract both price series patterns and asset correlations.
A new cost-sensitive reward function is developed to maximize the accumulated return and constrain both costs via reinforcement learning.
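The summary names a cost-sensitive reward without giving its form. A common shape for such a reward, log portfolio return minus a penalty for the transaction costs incurred by rebalancing (with an optional risk term), is sketched below; the cost rate, the variance-based risk penalty, and the weights are assumptions, since the summary only says that both costs are constrained.

```python
import numpy as np

def cost_sensitive_reward(prev_weights, new_weights, price_relatives,
                          tc_rate=0.0025, risk_weight=0.0):
    """Sketch of a cost-sensitive portfolio reward: log return of the rebalanced
    portfolio minus a proportional transaction-cost term and an optional risk
    penalty. The cost rate, the variance-based risk term, and the weights are
    assumptions; only the 'return minus cost penalties' shape comes from the summary."""
    turnover = float(np.abs(np.asarray(new_weights) - np.asarray(prev_weights)).sum())
    gross = float(np.asarray(new_weights) @ np.asarray(price_relatives))  # sum_i w_i * p_t,i / p_{t-1,i}
    net = max(gross - tc_rate * turnover, 1e-8)        # deduct proportional trading cost
    risk_penalty = risk_weight * float(np.var(price_relatives))
    return float(np.log(net)) - risk_penalty
```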
arXiv Detail & Related papers (2020-03-06T06:28:17Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.