Safe Driving via Expert Guided Policy Optimization
- URL: http://arxiv.org/abs/2110.06831v1
- Date: Wed, 13 Oct 2021 16:19:03 GMT
- Title: Safe Driving via Expert Guided Policy Optimization
- Authors: Zhenghao Peng, Quanyi Li, Chunxiao Liu, Bolei Zhou
- Abstract summary: Expert-in-the-loop Reinforcement Learning is used to safeguard the exploration of the learning agent.
We develop a novel Expert Guided Policy Optimization (EGPO) method which integrates the guardian in the loop of reinforcement learning.
Our method achieves superior training- and test-time safety, outperforms baselines by a substantial margin in sample efficiency, and preserves generalizability to unseen environments at test time.
- Score: 38.68691065718655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When learning common skills like driving, beginners usually have domain
experts standing by to ensure the safety of the learning process. We formulate
such a learning scheme as Expert-in-the-loop Reinforcement Learning (ERL), where
a guardian is introduced to safeguard the exploration of the learning agent.
While allowing sufficient exploration in the uncertain environment, the
guardian intervenes in dangerous situations and demonstrates the correct
actions to avoid potential accidents. ERL thus provides two training sources:
exploration and the expert's partial demonstrations. Under this
setting, we develop a novel Expert Guided Policy Optimization (EGPO) method
which integrates the guardian in the loop of reinforcement learning. The
guardian is composed of an expert policy to generate demonstration and a switch
function to decide when to intervene. In particular, a constrained optimization
technique is used to eliminate the trivial solution in which the agent deliberately
behaves dangerously to deceive the expert into taking over. An offline RL
technique is further used to learn from the partial demonstrations generated by
the expert. Safe-driving experiments show that our method achieves superior
training- and test-time safety, outperforms baselines by a substantial margin
in sample efficiency, and preserves generalizability to unseen environments
at test time. A demo video and the source code are available at:
https://decisionforce.github.io/EGPO/
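To make the mechanism concrete, here is a minimal Python sketch of the guardian loop described above, under stated assumptions: the interfaces (env, agent, buffer, expert_policy, is_dangerous) and the penalty weight lam are hypothetical stand-ins for the expert policy, the switch function, and the constrained-optimization term of EGPO, not the authors' implementation.

```python
# Hypothetical sketch of a guardian-in-the-loop rollout in the style of EGPO.
# All interfaces below (env, agent, buffer, expert_policy, is_dangerous) are
# illustrative assumptions, not the authors' actual code.

class Guardian:
    """An expert policy plus a switch function deciding when to intervene."""

    def __init__(self, expert_policy, is_dangerous):
        self.expert_policy = expert_policy  # state -> safe action
        self.is_dangerous = is_dangerous    # (state, action) -> bool

    def filter(self, state, agent_action):
        # Take over only when the agent's proposed action is judged unsafe;
        # otherwise let the agent explore freely.
        if self.is_dangerous(state, agent_action):
            return self.expert_policy(state), True
        return agent_action, False


def collect_episode(env, agent, guardian, buffer, lam=0.1):
    """Roll out one episode under the guardian's protection.

    Free-exploration steps and expert takeovers both land in the same
    replay buffer, giving the two training sources the paper describes.
    The penalty term lam * intervened stands in for the constrained
    optimization that stops the agent from deliberately acting
    dangerously just to make the expert drive for it.
    """
    state, done = env.reset(), False
    while not done:
        proposed = agent.act(state)
        action, intervened = guardian.filter(state, proposed)
        # Assumes a classic gym-style 4-tuple step interface.
        next_state, reward, done, _ = env.step(action)
        shaped_reward = reward - lam * float(intervened)
        buffer.add(state, action, shaped_reward, next_state, done, intervened)
        state = next_state
```

The intervened flag stored with each transition is what would let an offline RL update treat the expert's takeover steps as partial demonstrations, the second training source the abstract names.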
Related papers
- Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding [5.5929450570003185]
Training RL agents in unknown, black-box environments poses an even greater safety risk when prior knowledge of the domain/task is unavailable.
We introduce ADVICE (Adaptive Shielding with a Contrastive Autoencoder), a novel post-shielding technique that distinguishes safe and unsafe features of state-action pairs during training.
arXiv Detail & Related papers (2024-05-28T13:47:21Z)
- Provable Safe Reinforcement Learning with Binary Feedback [62.257383728544006]
We consider the problem of provable safe RL when given access to an offline oracle providing binary feedback on the safety of state, action pairs.
We provide a novel meta algorithm, SABRE, which can be applied to any MDP setting given access to a blackbox PAC RL algorithm for that setting.
arXiv Detail & Related papers (2022-10-26T05:37:51Z)
- Constrained Reinforcement Learning for Robotics via Scenario-Based Programming [64.07167316957533]
It is crucial to optimize the performance of DRL-based agents while providing guarantees about their behavior.
This paper presents a novel technique for incorporating domain-expert knowledge into a constrained DRL training loop.
Our experiments demonstrate that using our approach to leverage expert knowledge dramatically improves the safety and the performance of the agent.
arXiv Detail & Related papers (2022-06-20T07:19:38Z)
- Learning to Drive Using Sparse Imitation Reinforcement Learning [0.5076419064097732]
We propose a hybrid end-to-end control policy that combines sparse expert driving knowledge with a reinforcement learning (RL) policy.
We experimentally validate the efficacy of the proposed SIRL approach in a complex urban scenario within the CARLA simulator.
arXiv Detail & Related papers (2022-05-24T15:03:11Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
- Learn to Exceed: Stereo Inverse Reinforcement Learning with Concurrent Policy Optimization [1.0965065178451106]
We study the problem of obtaining a control policy that can mimic and then outperform expert demonstrations in Markov decision processes.
One closely related approach is inverse reinforcement learning (IRL), which focuses on inferring a reward function from expert demonstrations.
We propose a novel method that enables the learning agent to outperform the demonstrator via a new concurrent reward and action policy learning approach.
arXiv Detail & Related papers (2020-09-21T02:16:21Z)
- Safe Reinforcement Learning via Curriculum Induction [94.67835258431202]
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly.
Existing safe reinforcement learning methods make an agent rely on priors that let it avoid dangerous situations.
This paper presents an alternative approach inspired by human teaching, where an agent learns under the supervision of an automatic instructor.
arXiv Detail & Related papers (2020-06-22T10:48:17Z)