Safe Driving via Expert Guided Policy Optimization
- URL: http://arxiv.org/abs/2110.06831v1
- Date: Wed, 13 Oct 2021 16:19:03 GMT
- Title: Safe Driving via Expert Guided Policy Optimization
- Authors: Zhenghao Peng, Quanyi Li, Chunxiao Liu, Bolei Zhou
- Abstract summary: Expert-in-the-loop Reinforcement Learning is used to safeguard the exploration of the learning agent.
We develop a novel Expert Guided Policy Optimization (EGPO) method which integrates the guardian in the loop of reinforcement learning.
Our method achieves superior training- and test-time safety, outperforms baselines by a substantial margin in sample efficiency, and preserves generalizability to unseen environments at test time.
- Score: 38.68691065718655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When learning common skills like driving, beginners usually have domain
experts standing by to ensure the safety of the learning process. We formulate
such a learning scheme as Expert-in-the-loop Reinforcement Learning (ERL), where
a guardian is introduced to safeguard the exploration of the learning agent.
While allowing sufficient exploration in the uncertain environment, the
guardian intervenes in dangerous situations and demonstrates the correct
actions to avoid potential accidents. ERL thus provides two training sources:
exploration and the expert's partial demonstrations. Under this
setting, we develop a novel Expert Guided Policy Optimization (EGPO) method
which integrates the guardian in the loop of reinforcement learning. The
guardian is composed of an expert policy to generate demonstration and a switch
function to decide when to intervene. In particular, a constrained optimization
technique is used to eliminate the trivial solution in which the agent deliberately
behaves dangerously to deceive the expert into taking over. An offline RL
technique is further used to learn from the partial demonstrations generated by
the expert. Safe-driving experiments show that our method achieves superior
training- and test-time safety, outperforms baselines by a substantial margin
in sample efficiency, and preserves generalizability to unseen environments
at test time. A demo video and the source code are available at:
https://decisionforce.github.io/EGPO/
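To make the mechanism concrete, here is a minimal Python sketch of the guardian loop described above, under stated assumptions: the interfaces (env, agent, buffer, expert_policy, is_dangerous) and the penalty weight lam are hypothetical stand-ins for the expert policy, the switch function, and the constrained-optimization term of EGPO, not the authors' implementation.

```python
# Hypothetical sketch of a guardian-in-the-loop rollout in the style of EGPO.
# All interfaces below (env, agent, buffer, expert_policy, is_dangerous) are
# illustrative assumptions, not the authors' actual code.

class Guardian:
    """An expert policy plus a switch function deciding when to intervene."""

    def __init__(self, expert_policy, is_dangerous):
        self.expert_policy = expert_policy  # state -> safe action
        self.is_dangerous = is_dangerous    # (state, action) -> bool

    def filter(self, state, agent_action):
        # Take over only when the agent's proposed action is judged unsafe;
        # otherwise let the agent explore freely.
        if self.is_dangerous(state, agent_action):
            return self.expert_policy(state), True
        return agent_action, False


def collect_episode(env, agent, guardian, buffer, lam=0.1):
    """Roll out one episode under the guardian's protection.

    Free-exploration steps and expert takeovers both land in the same
    replay buffer, giving the two training sources the paper describes.
    The penalty term lam * intervened stands in for the constrained
    optimization that stops the agent from deliberately acting
    dangerously just to make the expert drive for it.
    """
    state, done = env.reset(), False
    while not done:
        proposed = agent.act(state)
        action, intervened = guardian.filter(state, proposed)
        # Assumes a classic gym-style 4-tuple step interface.
        next_state, reward, done, _ = env.step(action)
        shaped_reward = reward - lam * float(intervened)
        buffer.add(state, action, shaped_reward, next_state, done, intervened)
        state = next_state
```

The intervened flag stored with each transition is what would let an offline RL update treat the expert's takeover steps as partial demonstrations, the second training source the abstract names.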
Related papers
- Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding [5.5929450570003185]
Training RL agents in unknown, black-box environments poses an even greater safety risk when prior knowledge of the domain/task is unavailable.
We introduce ADVICE (Adaptive Shielding with a Contrastive Autoencoder), a novel post-shielding technique that distinguishes safe and unsafe features of state-action pairs during training.
arXiv Detail & Related papers (2024-05-28T13:47:21Z)
- Provable Safe Reinforcement Learning with Binary Feedback [62.257383728544006]
We consider the problem of provable safe RL when given access to an offline oracle providing binary feedback on the safety of state, action pairs.
We provide a novel meta algorithm, SABRE, which can be applied to any MDP setting given access to a blackbox PAC RL algorithm for that setting.
arXiv Detail & Related papers (2022-10-26T05:37:51Z)
- Constrained Reinforcement Learning for Robotics via Scenario-Based Programming [64.07167316957533]
It is crucial to optimize the performance of DRL-based agents while providing guarantees about their behavior.
This paper presents a novel technique for incorporating domain-expert knowledge into a constrained DRL training loop.
Our experiments demonstrate that using our approach to leverage expert knowledge dramatically improves the safety and the performance of the agent.
arXiv Detail & Related papers (2022-06-20T07:19:38Z)
- Learning to Drive Using Sparse Imitation Reinforcement Learning [0.5076419064097732]
We propose a hybrid end-to-end control policy that combines sparse expert driving knowledge with a reinforcement learning (RL) policy.
We experimentally validate the efficacy of the proposed SIRL approach in a complex urban scenario within the CARLA simulator.
arXiv Detail & Related papers (2022-05-24T15:03:11Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
- Learn to Exceed: Stereo Inverse Reinforcement Learning with Concurrent Policy Optimization [1.0965065178451106]
We study the problem of obtaining a control policy that can mimic and then outperform expert demonstrations in Markov decision processes.
One closely related approach is inverse reinforcement learning (IRL), which focuses on inferring a reward function from expert demonstrations.
We propose a novel method that enables the learning agent to outperform the demonstrator via a new concurrent reward and action policy learning approach.
arXiv Detail & Related papers (2020-09-21T02:16:21Z)
- Safe Reinforcement Learning via Curriculum Induction [94.67835258431202]
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly.
Existing safe reinforcement learning methods make an agent rely on priors that let it avoid dangerous situations.
This paper presents an alternative approach inspired by human teaching, where an agent learns under the supervision of an automatic instructor.
arXiv Detail & Related papers (2020-06-22T10:48:17Z)