Learning from Active Human Involvement through Proxy Value Propagation
- URL: http://arxiv.org/abs/2502.03369v1
- Date: Wed, 05 Feb 2025 17:07:37 GMT
- Title: Learning from Active Human Involvement through Proxy Value Propagation
- Authors: Zhenghao Peng, Wenjie Mo, Chenda Duan, Quanyi Li, Bolei Zhou,
- Abstract summary: Learning from active human involvement enables the human subject to actively intervene and demonstrate to the AI agent during training.
We propose a new reward-free active human involvement method called Proxy Value Propagation for policy optimization.
Our method can learn to solve continuous and discrete control tasks with various human control devices, including the challenging task of driving in Grand Theft Auto V.
- Score: 44.144964115275
- Abstract: Learning from active human involvement enables the human subject to actively intervene and demonstrate to the AI agent during training. The interaction and corrective feedback from humans bring safety and AI alignment to the learning process. In this work, we propose a new reward-free active human involvement method called Proxy Value Propagation for policy optimization. Our key insight is that a proxy value function can be designed to express human intents: state-action pairs in the human demonstration are labeled with high values, while the agent actions that the human intervened on receive low values. Through the TD-learning framework, the labeled values of demonstrated state-action pairs are further propagated to other unlabeled data generated from the agent's exploration. The proxy value function thus induces a policy that faithfully emulates human behaviors. Human-in-the-loop experiments show the generality and efficiency of our method. With minimal modification to existing reinforcement learning algorithms, our method can learn to solve continuous and discrete control tasks with various human control devices, including the challenging task of driving in Grand Theft Auto V. Demo video and code are available at: https://metadriverse.github.io/pvp
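The abstract describes a two-part mechanism. First, human-demonstrated and intervened actions receive high and low labels. Second, reward-free TD learning spreads those labels to unlabeled exploration data. The tabular sketch below illustrates that idea on a toy problem.

This is not the authors' implementation, which builds on existing deep RL algorithms (see the linked code). Several details are assumptions made only for illustration:

- the +1/-1 label values;
- the step-toward-target update rule;
- the learning rate `alpha` and discount `gamma`;
- the buffer tuple layouts;
- the helper names `proxy_value_sweep` and `greedy_action`.

```python
from typing import List, Tuple

# Hypothetical buffer layouts for this sketch (not the authors' data structures).
HumanStep = Tuple[int, int, int]        # (state, agent_action, human_action)
AgentStep = Tuple[int, int, int, bool]  # (state, action, next_state, done)


def proxy_value_sweep(q: List[List[float]],
                      human_buffer: List[HumanStep],
                      agent_buffer: List[AgentStep],
                      alpha: float = 0.5,
                      gamma: float = 0.9) -> None:
    """One in-place sweep of a tabular proxy-value update (illustrative sketch)."""
    # 1) Proxy labels (assumed +1 / -1): pull the human's action toward a high
    #    value and the intervened agent action toward a low value.
    for s, a_agent, a_human in human_buffer:
        q[s][a_human] += alpha * (1.0 - q[s][a_human])
        if a_agent != a_human:
            q[s][a_agent] += alpha * (-1.0 - q[s][a_agent])
    # 2) Reward-free TD(0) on exploration data: the reward term is always zero,
    #    so nonzero values can only spread outward from the labeled pairs.
    for s, a, s_next, done in agent_buffer:
        target = 0.0 if done else gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])


def greedy_action(q: List[List[float]], s: int) -> int:
    """Induced policy: act greedily with respect to the proxy value."""
    return max(range(len(q[s])), key=lambda a: q[s][a])


if __name__ == "__main__":
    # Toy 3-state chain 0 -> 1 -> 2 with two actions per state; values start at 0.
    q = [[0.0, 0.0] for _ in range(3)]
    # In state 2 the agent chose action 0; the human took over and chose action 1.
    human_buffer = [(2, 0, 1)]
    # The agent's own exploration moved it 0 -> 1 -> 2 using action 1 (no reward).
    agent_buffer = [(0, 1, 1, False), (1, 1, 2, False)]
    for _ in range(100):
        proxy_value_sweep(q, human_buffer, agent_buffer)
    print([[round(v, 3) for v in row] for row in q])
    print([greedy_action(q, s) for s in range(3)])  # prints [1, 1, 1]
```

The TD target here carries no reward. The only sources of nonzero value are therefore the labeled human and intervened pairs. The exploration transitions inherit discounted copies of the +1 label (about 0.9 and 0.81 in this toy chain), which loosely mirrors the "propagation" step the abstract describes.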
Related papers
- HAIM-DRL: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving [2.807187711407621]
We propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework.
We first introduce an innovative learning paradigm, termed Human as AI mentor (HAIM), that effectively injects human intelligence into AI.
In this paradigm, the human expert serves as a mentor to the AI agent, while the agent is guided to minimize traffic flow disturbance.
Date: 2024-01-06T08:30:14Z
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
Date: 2023-03-02T18:51:38Z
- Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback [16.268581985382433]
An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback.
Here we demonstrate how to use reinforcement learning from human feedback to improve upon simulated, embodied agents.
Date: 2022-11-21T16:00:31Z
- Human-to-Robot Imitation in the Wild [50.49660984318492]
We propose an efficient one-shot robot learning algorithm, centered around learning from a third-person perspective.
We show one-shot generalization and success in real-world settings, including 20 different manipulation tasks in the wild.
Date: 2022-07-19T17:59:59Z
- Human-AI Shared Control via Frequency-based Policy Dissection [34.0399894373716]
Human-AI shared control allows humans to interact and collaborate with AI to accomplish control tasks in complex environments.
Previous Reinforcement Learning (RL) methods attempt goal-conditioned designs to achieve human-controllable policies.
We develop a simple yet effective frequency-based approach called Policy Dissection to align the intermediate representation of the learned neural controller with the kinematic attributes of the agent behavior.
Date: 2022-05-31T23:57:55Z
- Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization [38.21629972247463]
We develop a novel human-in-the-loop learning method called Human-AI Copilot Optimization (HACO).
The proposed HACO effectively utilizes data from both trial-and-error exploration and the human's partial demonstrations to train a high-performing agent.
Experiments show that HACO achieves substantially high sample efficiency on the safe driving benchmark.
Date: 2022-02-17T06:29:46Z
- Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration [51.268988527778276]
We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations.
Our method co-optimizes a human policy and a robot policy in an interactive learning process.
Date: 2021-08-13T03:14:43Z
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
Date: 2021-06-09T14:10:50Z
- Learning Human Rewards by Inferring Their Latent Intelligence Levels in Multi-Agent Games: A Theory-of-Mind Approach with Application to Driving Data [18.750834997334664]
We argue that humans are boundedly rational and have different intelligence levels when reasoning about others' decision-making processes.
We propose a new multi-agent Inverse Reinforcement Learning framework that reasons about humans' latent intelligence levels during learning.
Date: 2021-03-07T07:48:31Z
- Human-in-the-Loop Imitation Learning using Remote Teleoperation [72.2847988686463]
We build a data collection system tailored to 6-DoF manipulation settings.
We develop an algorithm to train the policy iteratively on new data collected by the system.
We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators.
Date: 2020-12-12T05:30:35Z
This list is automatically generated from the titles and abstracts of the papers on this site.