A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning
- URL: http://arxiv.org/abs/2303.04118v1
- Date: Tue, 7 Mar 2023 18:29:15 GMT
- Title: A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning
- Authors: Nick Bührer, Zhejun Zhang, Alexander Liniger, Fisher Yu, Luc Van Gool
- Abstract summary: We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
- Score: 131.96501469927733
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: An emerging field of sequential decision problems is safe Reinforcement
Learning (RL), where the objective is to maximize the reward while obeying
safety constraints. Being able to handle constraints is essential for deploying
RL agents in real-world environments, where constraint violations can harm the
agent and the environment. To this end, we propose a safe model-free RL
algorithm with a novel multiplicative value function consisting of a safety
critic and a reward critic. The safety critic predicts the probability of
constraint violation and discounts the reward critic that only estimates
constraint-free returns. By splitting responsibilities, we facilitate the
learning task, leading to increased sample efficiency. We integrate our approach
into two popular RL algorithms, Proximal Policy Optimization and Soft
Actor-Critic, and evaluate our method in four safety-focused environments,
including classical RL benchmarks augmented with safety constraints and robot
navigation tasks with images and raw Lidar scans as observations. Finally, we
perform zero-shot sim-to-real transfer, where a differential drive robot has to
navigate through a cluttered room. Our code can be found at
https://github.com/nikeke19/Safe-Mult-RL.
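For intuition, below is a minimal sketch of how such a multiplicative critic could be wired up, assuming the combined value is simply the reward critic discounted by the safety critic's predicted violation probability, i.e. Q_mult(s, a) = (1 - p_violation(s, a)) * Q_reward(s, a). The network sizes, class name, and usage are illustrative assumptions, not the paper's exact implementation; the actual training targets and PPO/SAC integration are in the linked repository.

```python
# Minimal sketch (assumption): a reward critic estimating constraint-free
# returns, multiplied by (1 - p_violation) from a safety critic.
import torch
import torch.nn as nn


class MultiplicativeCritic(nn.Module):
    """Reward critic discounted by a safety critic's violation probability."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        # Reward critic: estimates the constraint-free return Q_reward(s, a).
        self.reward_critic = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        # Safety critic: predicts the probability of constraint violation in [0, 1].
        self.safety_critic = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor):
        x = torch.cat([obs, act], dim=-1)
        q_reward = self.reward_critic(x)           # constraint-free return estimate
        p_violation = self.safety_critic(x)        # estimated P(constraint violation)
        q_mult = (1.0 - p_violation) * q_reward    # reward critic discounted by safety
        return q_mult, q_reward, p_violation


# Hypothetical usage: score a batch of state-action pairs.
critic = MultiplicativeCritic(obs_dim=8, act_dim=2)
obs, act = torch.randn(32, 8), torch.randn(32, 2)
q_mult, q_reward, p_violation = critic(obs, act)
```

In the full method, each critic would be trained against its own target (constraint-free returns for the reward critic, observed constraint violations for the safety critic), and the policy is updated against the combined value; those details are specified in the paper and repository rather than in this sketch.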
Related papers
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - TRC: Trust Region Conditional Value at Risk for Safe Reinforcement
Learning [16.176812250762666]
We propose a trust region-based safe RL method with CVaR constraints, called TRC.
We first derive the upper bound on CVaR and then approximate the upper bound in a differentiable form in a trust region.
Compared to other safe RL methods, TRC improves performance by a factor of 1.93 while satisfying the constraints in all experiments.
arXiv Detail & Related papers (2023-12-01T04:40:47Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in
Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z) - Safe Reinforcement Learning using Data-Driven Predictive Control [0.5459797813771499]
We propose a data-driven safety layer that acts as a filter for unsafe actions.
The safety layer penalizes the RL agent if the proposed action is unsafe and replaces it with the closest safe one.
In simulation, we show that our method outperforms state-of-the-art safe RL methods on a robot navigation problem.
arXiv Detail & Related papers (2022-11-20T17:10:40Z) - Learn Zero-Constraint-Violation Policy in Model-Free Constrained
Reinforcement Learning [7.138691584246846]
We propose the safe set actor-critic (SSAC) algorithm, which confines the policy update using safety-oriented energy functions.
The safety index is designed to increase rapidly for potentially dangerous actions.
We claim that we can learn the energy function in a model-free manner similar to learning a value function.
arXiv Detail & Related papers (2021-11-25T07:24:30Z) - Constraint-Guided Reinforcement Learning: Augmenting the
Agent-Environment-Interaction [10.203602318836445]
Reinforcement Learning (RL) agents have achieved great success in solving tasks with large observation and action spaces from limited feedback.
This paper discusses the engineering of reliable agents via the integration of deep RL with constraint-based augmentation models.
Our results show that constraint guidance provides both improved reliability and safer behavior, as well as accelerated training.
arXiv Detail & Related papers (2021-04-24T10:04:14Z) - Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones [81.49106778460238]
Recovery RL uses offline data to learn about constraint violating zones before policy learning.
We evaluate Recovery RL on 6 simulation domains, including two contact-rich manipulation tasks and an image-based navigation task.
Results suggest that Recovery RL trades off constraint violations and task successes 2-20 times more efficiently in simulation domains and 3 times more efficiently in physical experiments.
arXiv Detail & Related papers (2020-10-29T20:10:02Z) - Cautious Adaptation For Reinforcement Learning in Safety-Critical
Settings [129.80279257258098]
Reinforcement learning (RL) in real-world safety-critical target settings like urban driving is hazardous.
We propose a "safety-critical adaptation" task setting: an agent first trains in non-safety-critical "source" environments.
We propose a solution approach, CARL, that builds on the intuition that prior experience in diverse environments equips an agent to estimate risk.
arXiv Detail & Related papers (2020-08-15T01:40:59Z)