Minimizing Safety Interference for Safe and Comfortable Automated
Driving with Distributional Reinforcement Learning
- URL: http://arxiv.org/abs/2107.07316v1
- Date: Thu, 15 Jul 2021 13:36:55 GMT
- Title: Minimizing Safety Interference for Safe and Comfortable Automated
Driving with Distributional Reinforcement Learning
- Authors: Danial Kamran, Tizian Engelgeh, Marvin Busch, Johannes Fischer and
Christoph Stiller
- Abstract summary: We propose a distributional reinforcement learning framework to learn adaptive policies that can tune their level of conservativity at run-time based on the desired comfort and utility.
We show that our algorithm learns policies that can still drive reliably when the perception noise is twice as high as in the training configuration, for automated merging and crossing at occluded intersections.
- Score: 3.923354711049903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite recent advances in reinforcement learning (RL), its application in
safety-critical domains like autonomous vehicles is still challenging. Although
punishing RL agents for risky situations can help to learn safe policies, it
may also lead to highly conservative behavior. In this paper, we propose a
distributional RL framework in order to learn adaptive policies that can tune
their level of conservativity at run-time based on the desired comfort and
utility. Using a proactive safety verification approach, the proposed framework
can guarantee that actions generated from RL are fail-safe according to the
worst-case assumptions. Concurrently, the policy is encouraged to minimize
safety interference and generate more comfortable behavior. We trained and
evaluated the proposed approach and baseline policies using a high level
simulator with a variety of randomized scenarios including several corner cases
which rarely happen in reality but are very crucial. In light of our
experiments, the behavior of policies learned using distributional RL can be
adaptive at run-time and robust to the environment uncertainty. Quantitatively,
the learned distributional RL agent drives in average 8 seconds faster than the
normal DQN policy and requires 83\% less safety interference compared to the
rule-based policy with slightly increasing the average crossing time. We also
study sensitivity of the learned policy in environments with higher perception
noise and show that our algorithm learns policies that can still drive reliable
when the perception noise is two times higher than the training configuration
for automated merging and crossing at occluded intersections.
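The abstract names two ingredients: a distributional critic whose risk level can be chosen at run-time, and a safety verification layer that overrides actions it cannot verify as fail-safe. The sketch below illustrates one way these pieces could fit together. It is not the authors' implementation; the quantile critic, the CVaR-style risk parameter `alpha`, the discrete action set, and the `is_fail_safe` check are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): a quantile-based distributional critic
# whose conservativity is tuned at run-time via a CVaR risk level, wrapped by a
# worst-case safety verifier that can override unverified actions.
import torch
import torch.nn as nn

N_QUANTILES = 32   # assumed number of return quantiles per action
N_ACTIONS = 3      # assumed discrete actions, e.g. {wait, creep, go}
STATE_DIM = 16     # assumed feature size


class QuantileQNet(nn.Module):
    """Predicts N_QUANTILES return quantiles for every discrete action."""

    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, N_ACTIONS * N_QUANTILES),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).view(-1, N_ACTIONS, N_QUANTILES)


def cvar_values(quantiles: torch.Tensor, alpha: float) -> torch.Tensor:
    """Average the lowest alpha-fraction of quantiles per action (CVaR_alpha).

    alpha = 1.0 recovers the mean (risk-neutral, utility-oriented);
    small alpha focuses on the worst outcomes (conservative behavior).
    """
    k = max(1, int(alpha * quantiles.shape[-1]))
    sorted_q, _ = torch.sort(quantiles, dim=-1)
    return sorted_q[..., :k].mean(dim=-1)


def is_fail_safe(state: torch.Tensor, action: int) -> bool:
    """Placeholder for the proactive safety verification step: it should check
    that `action` keeps a feasible fail-safe fallback (e.g. stopping before the
    conflict zone) under worst-case behavior of occluded traffic."""
    return action == 0  # assumption: action 0 ("wait") is always verifiable


def select_action(qnet: QuantileQNet, state: torch.Tensor, alpha: float) -> int:
    """Pick the best action under risk level alpha, then verify it; fall back to
    the next-best verified action so the executed action stays fail-safe."""
    with torch.no_grad():
        values = cvar_values(qnet(state), alpha).squeeze(0)
    for action in torch.argsort(values, descending=True).tolist():
        if is_fail_safe(state, action):
            return action
    return 0  # verified fallback


if __name__ == "__main__":
    qnet = QuantileQNet()
    s = torch.randn(1, STATE_DIM)
    print(select_action(qnet, s, alpha=0.25))  # conservative evaluation
    print(select_action(qnet, s, alpha=1.0))   # utility-oriented evaluation
```

Under this reading, lowering `alpha` makes the agent act on the worst-case tail of its learned return distribution, while the verifier guarantees a fail-safe fallback for whatever action is finally executed; per the abstract, the learned policy is additionally rewarded for keeping such safety interference rare.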
Related papers
- RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes [57.319845580050924]
We propose a reinforcement learning framework that combines risk-sensitive control with an adaptive action space curriculum.
We show that our algorithm is capable of learning high-speed policies for a real-world off-road driving task.
arXiv Detail & Related papers (2024-05-07T23:32:36Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
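Taking the two-critic description above literally, the combination could be sketched as follows; the action names and numbers are made up for illustration, and the actual method trains both critics from interaction data.

```python
# Illustrative sketch of the multiplicative combination described above: the
# safety critic's probability of remaining constraint-free scales down the
# reward critic's constraint-free return estimate for each action.
import numpy as np

def multiplicative_value(p_safe: np.ndarray, q_reward: np.ndarray) -> np.ndarray:
    """p_safe:   per-action probability of no constraint violation, in [0, 1].
    q_reward: per-action return estimate assuming no violation occurs."""
    return p_safe * q_reward

# Hypothetical numbers for three actions, e.g. brake / keep speed / accelerate.
p_safe = np.array([0.99, 0.60, 0.20])
q_reward = np.array([2.0, 5.0, 8.0])
print(multiplicative_value(p_safe, q_reward))                 # [1.98 3.   1.6 ]
print(int(multiplicative_value(p_safe, q_reward).argmax()))   # 1: safety-discounted best action
```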
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Safe Reinforcement Learning Using Advantage-Based Intervention [45.79740561754542]
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints.
We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training.
Our method comes with strong guarantees on safety during both training and deployment.
arXiv Detail & Related papers (2021-06-16T20:28:56Z)
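The SAILR summary only states that intervention is driven by advantage functions; a much-simplified gate in that spirit (the threshold, function names, and backup action are assumptions, not the paper's exact criterion) could look like:

```python
# Much-simplified, assumed sketch of an advantage-based intervention gate: the
# learner's proposed action is executed only if its advantage under a
# safety-aware baseline clears a threshold; otherwise a backup action is used.
from typing import Callable

def gated_action(
    advantage: Callable[[object, int], float],  # assumed: A(s, a) w.r.t. a safe baseline
    state: object,
    proposed: int,
    backup: int,
    threshold: float = 0.0,
) -> int:
    if advantage(state, proposed) >= threshold:
        return proposed   # considered acceptable, no intervention
    return backup         # intervene with the safe fallback action

# Toy usage with a dummy advantage estimate.
print(gated_action(lambda s, a: -1.0 if a == 2 else 0.5, state=None, proposed=2, backup=0))  # -> 0
```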
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
- Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings [129.80279257258098]
Reinforcement learning (RL) in real-world safety-critical target settings like urban driving is hazardous.
We propose a "safety-critical adaptation" task setting: an agent first trains in non-safety-critical "source" environments.
We propose a solution approach, CARL, that builds on the intuition that prior experience in diverse environments equips an agent to estimate risk.
arXiv Detail & Related papers (2020-08-15T01:40:59Z)
- Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving [41.54021613421446]
In near-accident scenarios, even a minor change in the vehicle's actions may result in drastically different consequences.
We propose a hierarchical reinforcement and imitation learning (H-ReIL) approach that consists of low-level policies learned by IL for discrete driving modes, and a high-level policy learned by RL that switches between different driving modes.
arXiv Detail & Related papers (2020-07-01T01:41:45Z)
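The H-ReIL summary describes a hierarchy in which imitation-learned low-level policies, one per discrete driving mode, are switched by an RL-trained high-level policy. A schematic of that switch with assumed interfaces:

```python
# Schematic sketch (assumed interfaces) of the hierarchical switch described
# above: the high-level RL policy picks a discrete driving mode and the
# corresponding imitation-learned low-level policy outputs the control command.
from typing import Callable, Sequence

def hierarchical_step(
    high_level: Callable[[object], int],             # RL policy: state -> mode index
    low_level: Sequence[Callable[[object], float]],  # IL policies: state -> control
    state: object,
) -> float:
    mode = high_level(state)        # e.g. 0 = cautious, 1 = aggressive (assumed modes)
    return low_level[mode](state)   # execute the selected mode's controller

# Toy usage with dummy policies.
print(hierarchical_step(lambda s: 0, [lambda s: -1.0, lambda s: 1.0], state=None))  # -> -1.0
```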
- Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization [20.913475536020247]
This paper presents a safe reinforcement learning algorithm, called Parallel Constrained Policy Optimization (PCPO), for two autonomous driving tasks.
PCPO extends today's common actor-critic architecture to a three-component learning framework, in which three neural networks are used to approximate the policy function, value function and a newly added risk function.
To ensure the feasibility of safety-constrained problems, synchronized parallel learners are employed to explore different state spaces, which accelerates learning and policy updates.
arXiv Detail & Related papers (2020-03-03T02:53:30Z)
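Finally, the PCPO summary names three approximators: policy, value, and risk. A hypothetical PyTorch skeleton of just those three heads (the constrained update rule and the synchronized parallel learners are deliberately not reproduced here):

```python
# Hypothetical skeleton of the three approximators named in the PCPO summary:
# a policy network, a value network, and the newly added risk network.
import torch
import torch.nn as nn

def mlp(in_dim: int, out_dim: int) -> nn.Sequential:
    return nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(), nn.Linear(64, out_dim))

class PCPONetworks(nn.Module):
    def __init__(self, state_dim: int, action_dim: int) -> None:
        super().__init__()
        self.policy = mlp(state_dim, action_dim)  # approximates the policy function
        self.value = mlp(state_dim, 1)            # approximates the expected return
        self.risk = mlp(state_dim, 1)             # approximates the expected safety cost

nets = PCPONetworks(state_dim=8, action_dim=2)
print(nets.risk(torch.zeros(1, 8)).shape)  # torch.Size([1, 1])
```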
This list is automatically generated from the titles and abstracts of the papers on this site.