Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization
- URL: http://arxiv.org/abs/2202.10341v1
- Date: Thu, 17 Feb 2022 06:29:46 GMT
- Title: Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization
- Authors: Quanyi Li, Zhenghao Peng, Bolei Zhou
- Abstract summary: We develop a novel human-in-the-loop learning method called Human-AI Copilot Optimization (HACO).
HACO effectively utilizes data from both trial-and-error exploration and the human's partial demonstrations to train a high-performing agent.
Experiments show that HACO achieves substantially higher sample efficiency on the safe-driving benchmark.
- Score: 38.21629972247463
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human intervention is an effective way to inject human knowledge into the training loop of reinforcement learning, bringing faster learning and ensuring training safety. Given the very limited budget of human intervention, it remains challenging to design when and how the human expert should interact with the learning agent during training. In this work, we develop a novel human-in-the-loop learning method called Human-AI Copilot Optimization (HACO). To allow the agent sufficient exploration in risky environments while ensuring training safety, the human expert can take over control and demonstrate how to avoid potentially dangerous situations or trivial behaviors. HACO then effectively utilizes the data from both trial-and-error exploration and the human's partial demonstrations to train a high-performing agent. HACO extracts proxy state-action values from the partial human demonstrations and optimizes the agent to improve these proxy values while reducing the number of human interventions. Experiments show that HACO achieves substantially higher sample efficiency on the safe-driving benchmark. HACO can train agents to drive in unseen traffic scenarios with only a small budget of human interventions, achieving high safety and generalizability and outperforming both reinforcement learning and imitation learning baselines by a large margin. Code and demo videos are available at:
https://decisionforce.github.io/HACO/.
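The abstract describes two coupled objectives: learn a proxy state-action value from the human's partial takeovers, and optimize the policy to raise that proxy value while keeping interventions rare. The snippet below is a minimal PyTorch-style sketch of that idea, not the authors' implementation: the toy dimensions, the margin-based ranking loss on intervened steps, and the fixed intervention cost are all illustrative assumptions.

```python
"""Illustrative HACO-style update (sketch only, not the authors' code).

Assumed toy setup: continuous states/actions, a proxy Q-network trained so
that the human's takeover action outranks the agent action it overrode, and
a policy trained to maximize that proxy value while paying a fixed cost
whenever its proposal triggered an intervention.
"""
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 4, 2  # hypothetical sizes for illustration

proxy_q = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM), nn.Tanh())
q_opt = torch.optim.Adam(proxy_q.parameters(), lr=1e-3)
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)


def q_value(state, action):
    """Proxy state-action value Q(s, a)."""
    return proxy_q(torch.cat([state, action], dim=-1))


def haco_style_update(batch, intervention_cost=1.0):
    """One gradient step on a batch of takeover-annotated transitions."""
    s = batch["state"]                 # [B, STATE_DIM]
    a_agent = batch["agent_action"]    # action proposed by the learning agent
    a_human = batch["human_action"]    # action actually executed (equals the agent's if no takeover)
    took_over = batch["intervened"]    # [B, 1], 1.0 when the human took over

    # Proxy-value step: on intervened steps, push the human action above the
    # agent action it replaced (a ranking surrogate for "proxy values
    # extracted from partial human demonstration").
    margin = q_value(s, a_agent) - q_value(s, a_human)
    q_loss = (took_over * torch.relu(margin + 1.0)).mean()
    q_opt.zero_grad()
    q_loss.backward()
    q_opt.step()

    # Policy step: maximize the proxy value of the policy's own action while
    # paying a cost on steps where its behavior provoked a takeover.
    a_pi = policy(s)
    pi_loss = (-q_value(s, a_pi) + intervention_cost * took_over).mean()
    pi_opt.zero_grad()
    pi_loss.backward()
    pi_opt.step()
    return q_loss.item(), pi_loss.item()


if __name__ == "__main__":
    # Random batch standing in for human-in-the-loop driving data.
    B = 32
    batch = {
        "state": torch.randn(B, STATE_DIM),
        "agent_action": torch.rand(B, ACTION_DIM) * 2 - 1,
        "human_action": torch.rand(B, ACTION_DIM) * 2 - 1,
        "intervened": (torch.rand(B, 1) < 0.2).float(),
    }
    print(haco_style_update(batch))
```

In practice the takeover data would come from a driving simulator with a human in the loop; the random batch above only exercises the update.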
Related papers
- Trustworthy Human-AI Collaboration: Reinforcement Learning with Human Feedback and Physics Knowledge for Safe Autonomous Driving [1.5361702135159845]
Reinforcement Learning with Human Feedback (RLHF) has attracted substantial attention due to its potential to enhance training safety and sampling efficiency.
Inspired by the human learning process, we propose Physics-enhanced Reinforcement Learning with Human Feedback (PE-RLHF).
PE-RLHF guarantees the learned policy will perform at least as well as the given physics-based policy, even when human feedback quality deteriorates.
arXiv Detail & Related papers (2024-09-01T22:20:32Z)
- Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models [94.39278422567955]
Fine-tuning large language models (LLMs) on human preferences has proven successful in enhancing their capabilities.
However, ensuring the safety of LLMs during the fine-tuning remains a critical concern.
We propose a supervised learning framework called Bi-Factorial Preference Optimization (BFPO) to address this issue.
arXiv Detail & Related papers (2024-08-27T17:31:21Z)
- HAIM-DRL: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving [2.807187711407621]
We propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework.
We first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM).
In this paradigm, the human expert serves as a mentor to the AI agent, while the agent could be guided to minimize traffic flow disturbance.
arXiv Detail & Related papers (2024-01-06T08:30:14Z)
- Primitive Skill-based Robot Learning from Human Evaluative Feedback [28.046559859978597]
Reinforcement learning algorithms face challenges when dealing with long-horizon robot manipulation tasks in real-world environments.
We propose a novel framework, SEED, which leverages two approaches: reinforcement learning from human feedback (RLHF) and primitive skill-based reinforcement learning.
Our results show that SEED significantly outperforms state-of-the-art RL algorithms in sample efficiency and safety.
arXiv Detail & Related papers (2023-07-28T20:48:30Z)
- Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios [147.16925581385576]
We show how imitation learning combined with reinforcement learning can substantially improve the safety and reliability of driving policies.
We train a policy on over 100k miles of urban driving data, and measure its effectiveness in test scenarios grouped by different levels of collision likelihood.
arXiv Detail & Related papers (2022-12-21T23:59:33Z)
- Minimizing Human Assistance: Augmenting a Single Demonstration for Deep Reinforcement Learning [0.0]
We use a single human example collected through a simple-to-use virtual reality simulation to assist with RL training.
Our method augments a single demonstration to generate numerous human-like demonstrations.
Despite learning from a human example, the agent is not constrained to human-level performance.
arXiv Detail & Related papers (2022-09-22T19:04:43Z)
- Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment [52.07473934146584]
We guide the curriculum reinforcement learning results towards a preferred performance level that is neither too hard nor too easy via learning from the human decision process.
Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications.
It shows that reinforcement learning performance can successfully adjust in sync with the human's desired difficulty level.
arXiv Detail & Related papers (2022-08-04T23:53:51Z)
- Provably Safe Deep Reinforcement Learning for Robotic Manipulation in Human Environments [8.751383865142772]
We propose a shielding mechanism that ensures ISO-verified human safety while training and deploying RL algorithms on manipulators.
We utilize a fast reachability analysis of humans and manipulators to guarantee that the manipulator comes to a complete stop before a human is within its range.
arXiv Detail & Related papers (2022-05-12T18:51:07Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Adversarial Training is Not Ready for Robot Learning [55.493354071227174]
Adversarial training is an effective method to train deep learning models that are resilient to norm-bounded perturbations.
We show theoretically and experimentally that neural controllers obtained via adversarial training are subject to three types of defects.
Our results suggest that adversarial training is not yet ready for robot learning.
arXiv Detail & Related papers (2021-03-15T07:51:31Z)