Safety-aware Policy Optimisation for Autonomous Racing
- URL: http://arxiv.org/abs/2110.07699v1
- Date: Thu, 14 Oct 2021 20:15:45 GMT
- Title: Safety-aware Policy Optimisation for Autonomous Racing
- Authors: Bingqing Chen, Jonathan Francis, James Herman, Jean Oh, Eric Nyberg,
Sylvia L. Herbert
- Abstract summary: We introduce Hamilton-Jacobi (HJ) reachability theory into the constrained Markov decision process (CMDP) framework.
We demonstrate that the HJ safety value can be learned directly on vision context.
We evaluate our method on several benchmark tasks, including Safety Gym and Learn-to-Race (L2R), a recently-released high-fidelity autonomous racing environment.
- Score: 17.10371721305536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To be viable for safety-critical applications, such as autonomous driving and
assistive robotics, autonomous agents should adhere to safety constraints
throughout the interactions with their environments. Instead of learning about
safety by collecting samples, including unsafe ones, methods such as
Hamilton-Jacobi (HJ) reachability compute safe sets with theoretical guarantees
using models of the system dynamics. However, HJ reachability is not scalable
to high-dimensional systems, and the guarantees hinge on the quality of the
model. In this work, we inject HJ reachability theory into the constrained
Markov decision process (CMDP) framework, as a control-theoretical approach for
safety analysis via model-free updates on state-action pairs. Furthermore, we
demonstrate that the HJ safety value can be learned directly on vision context,
the highest-dimensional problem studied via the method to date. We evaluate our
method on several benchmark tasks, including Safety Gym and Learn-to-Race
(L2R), a recently-released high-fidelity autonomous racing environment. Our
approach has significantly fewer constraint violations in comparison to other
constrained RL baselines, and achieves new state-of-the-art results on the
L2R benchmark task.
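The model-free safety update the abstract refers to is typically realized with the discounted safety Bellman equation (Fisac et al., 2019), which lets the HJ safety value be learned from (s, a, s') samples alone. The snippet below is a minimal tabular sketch of that backup, assuming a discrete state-action space; the `safety_margin` function, the array layout, and all names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def safety_q_update(Q, s, a, s_next, safety_margin, gamma=0.99, alpha=0.1):
    """One model-free, tabular update of an HJ safety Q-value on a (s, a, s') sample.

    safety_margin(s) returns a signed distance to the constraint set:
    positive when state s satisfies the constraints, negative otherwise.
    Follows the discounted safety Bellman backup (Fisac et al., 2019);
    the tabular setting and all names here are illustrative assumptions,
    not the paper's implementation.
    """
    l_s = safety_margin(s)
    # Target: worst case between the margin at the current state and the best
    # safety value reachable from the next state (gamma -> 1 recovers the
    # undiscounted HJ safety value).
    target = (1.0 - gamma) * l_s + gamma * min(l_s, np.max(Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

At deployment, a common least-restrictive scheme gates the task policy with the learned value: whenever max_a Q[s, a] falls below a small threshold, the safety-maximizing action overrides the task action. This is one standard way a learned HJ safety value is combined with a constrained RL learner, not necessarily the exact mechanism used in the paper.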
Related papers
- A novel agent with formal goal-reaching guarantees: an experimental study with a mobile robot [0.0]
Reinforcement Learning (RL) has been shown to be effective and convenient for a number of tasks in robotics.
This work presents a novel safe model-free RL agent called Critic As Lyapunov Function (CALF)
arXiv Detail & Related papers (2024-09-23T10:04:28Z)
- Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving [3.5293763645151404]
We propose a safe MARL method grounded in a Stackelberg model with bi-level optimization.
We develop two practical algorithms, namely Constrained Stackelberg Q-learning (CSQ) and Constrained Stackelberg Multi-Agent Deep Deterministic Policy Gradient (CS-MADDPG)
Our algorithms, CSQ and CS-MADDPG, outperform several strong MARL baselines, such as Bi-AC, MACPO, and MAPPO-L, regarding reward and safety performance.
arXiv Detail & Related papers (2024-05-28T14:15:18Z)
- Searching for Optimal Runtime Assurance via Reachability and Reinforcement Learning [2.422636931175853]
A runtime assurance system (RTA) for a given plant enables the exercise of an untrusted or experimental controller while assuring safety with a backup controller.
Existing RTA design strategies are well-known to be overly conservative and, in principle, can lead to safety violations.
In this paper, we formulate the optimal RTA design problem and present a new approach for solving it.
arXiv Detail & Related papers (2023-10-06T14:45:57Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate [6.581362609037603]
We build a safe reinforcement learning framework to resolve constraints required by the DRC and its corresponding shield policy.
We also devise a line search method to maintain safety and reach higher returns simultaneously while leveraging the shield policy.
arXiv Detail & Related papers (2022-10-14T06:16:53Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Evaluating the Safety of Deep Reinforcement Learning Models using Semi-Formal Verification [81.32981236437395]
We present a semi-formal verification approach for decision-making tasks based on interval analysis.
Our method obtains comparable results over standard benchmarks with respect to formal verifiers.
Our approach enables efficient evaluation of safety properties for decision-making models in practical applications.
arXiv Detail & Related papers (2020-10-19T11:18:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.