Human-AI Shared Control via Frequency-based Policy Dissection
- URL: http://arxiv.org/abs/2206.00152v1
- Date: Tue, 31 May 2022 23:57:55 GMT
- Title: Human-AI Shared Control via Frequency-based Policy Dissection
- Authors: Quanyi Li, Zhenghao Peng, Haibin Wu, Lan Feng, Bolei Zhou
- Abstract summary: Human-AI shared control allows humans to interact and collaborate with AI to accomplish control tasks in complex environments.
Previous Reinforcement Learning (RL) methods adopt a goal-conditioned design to achieve human-controllable policies.
We develop a simple yet effective frequency-based approach called Policy Dissection to align the intermediate representation of the learned neural controller with the kinematic attributes of the agent's behavior.
- Score: 34.0399894373716
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human-AI shared control allows humans to interact and collaborate with AI to accomplish control tasks in complex environments. Previous Reinforcement Learning (RL) methods adopt a goal-conditioned design to achieve human-controllable policies, at the cost of redesigning the reward function and training paradigm. Inspired by the neuroscience approach to investigating the motor cortex in primates, we develop a simple yet effective frequency-based approach called Policy Dissection to align the intermediate representation of the learned neural controller with the kinematic attributes of the agent's behavior. Without modifying the neural controller or retraining the model, the proposed approach can convert a given RL-trained policy into a human-interactive policy. We evaluate the proposed approach on the RL tasks of autonomous driving and locomotion. The experiments show that human-AI shared control achieved by Policy Dissection in the driving task can substantially improve performance and safety in unseen traffic scenes. With a human in the loop, the locomotion robots also exhibit versatile, controllable motion skills even though they are trained only to move forward. Our results suggest a promising direction for implementing human-AI shared autonomy through interpreting the learned representation of autonomous agents. Demo video and code will be made available at https://metadriverse.github.io/policydissect.
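The frequency-matching idea lends itself to a compact illustration. The sketch below is a rough reconstruction from the abstract alone, not the released code: it pairs hidden units with kinematic attributes by comparing dominant FFT frequencies, then steers behavior by boosting a matched unit. The function names, the nearest-frequency matching rule, and the additive steering offset are all assumptions.

```python
import numpy as np

def dominant_frequency(signal, dt=0.02):
    """Dominant nonzero frequency of a 1-D signal via the FFT magnitude."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=dt)
    return freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin

def match_units_to_attributes(unit_activations, kinematic_attrs, dt=0.02):
    """Associate each kinematic attribute (e.g. yaw rate) with the hidden
    unit whose activation oscillates at the closest dominant frequency."""
    unit_freqs = [dominant_frequency(u, dt) for u in unit_activations]
    return {
        name: int(np.argmin([abs(f - dominant_frequency(attr, dt))
                             for f in unit_freqs]))
        for name, attr in kinematic_attrs.items()
    }

def steer(hidden, mapping, command, strength=5.0):
    """Evoke the behavior tied to `command` by boosting its matched unit."""
    hidden = hidden.copy()
    hidden[mapping[command]] += strength
    return hidden

# Toy check: a unit oscillating at 1 Hz should pair with a 1 Hz yaw-rate.
t = np.arange(0, 10, 0.02)
units = [np.sin(2 * np.pi * f * t) for f in (0.5, 1.0, 2.0)]
print(match_units_to_attributes(units, {"yaw_rate": np.sin(2 * np.pi * t)}))
```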
Related papers
- Hand-Object Interaction Pretraining from Videos [77.92637809322231]
We learn general robot manipulation priors from 3D hand-object interaction trajectories.
We do so by sharing both the human hand and the manipulated object in a unified 3D space and retargeting human motions to robot actions.
We empirically demonstrate that finetuning this policy, with both reinforcement learning (RL) and behavior cloning (BC), enables sample-efficient adaptation to downstream tasks and simultaneously improves robustness and generalizability compared to prior approaches.
arXiv Detail & Related papers (2024-09-12T17:59:07Z)
- Robotic Control via Embodied Chain-of-Thought Reasoning [86.6680905262442]
A key limitation of learned robot control policies is their inability to generalize outside their training data.
Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models can substantially improve their robustness and generalization ability.
We introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features before predicting the robot action.
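As a loose illustration of the "reason before acting" target that ECoT describes, the snippet below serializes reasoning fields ahead of the action tokens in a single training string. The field names and formatting are assumptions for the sketch, not the paper's actual schema.

```python
# Assemble a chain-of-thought training target: reasoning steps first,
# action tokens last, so the model learns to reason before predicting.
def build_ecot_target(plan, subtask, motion, visible_objects, action):
    return (
        f"PLAN: {plan}\n"
        f"SUBTASK: {subtask}\n"
        f"MOTION: {motion}\n"
        f"VISIBLE OBJECTS: {visible_objects}\n"
        f"ACTION: {' '.join(f'{a:.3f}' for a in action)}"
    )

target = build_ecot_target(
    plan="pick up the mug and place it on the shelf",
    subtask="move the gripper above the mug",
    motion="translate forward and down",
    visible_objects="mug [112, 84, 156, 140]",
    action=[0.02, -0.01, -0.05, 0.0, 0.0, 0.0, 1.0],
)
print(target)
```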
arXiv Detail & Related papers (2024-07-11T17:31:01Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic policies.
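Distillation of this kind can be sketched with standard tooling: roll out the expert, then fit a supervised student on its state-action pairs. The example below uses scikit-learn's GradientBoostingRegressor, with synthetic data standing in for real rollouts; it is a minimal sketch of the general recipe, not the paper's pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

def collect_dataset(expert_policy, env, episodes=50):
    """Roll out an expert RL policy (gymnasium-style env assumed) and
    record its (state, action) pairs as a supervised dataset."""
    states, actions = [], []
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            act = expert_policy(obs)
            states.append(obs)
            actions.append(act)
            obs, _, terminated, truncated, _ = env.step(act)
            done = terminated or truncated
    return np.array(states), np.array(actions)

# Synthetic stand-in data so the sketch runs without an expert or simulator.
states = np.random.randn(1000, 8)   # e.g. 8-D proprioceptive observations
actions = np.random.randn(1000, 2)  # e.g. 2-D continuous actions
student = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=100))
student.fit(states, actions)        # one boosted ensemble per action dim
```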
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control [106.32794844077534]
This paper presents a study on using deep reinforcement learning to create dynamic locomotion controllers for bipedal robots.
We develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing.
This work pushes the limits of agility for bipedal robots through extensive real-world experiments.
arXiv Detail & Related papers (2024-01-30T10:48:43Z)
- HAIM-DRL: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving [2.807187711407621]
We propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework.
We first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM).
In this paradigm, the human expert serves as a mentor to the AI agent, while the agent could be guided to minimize traffic flow disturbance.
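A minimal sketch of the takeover mechanic that such a mentor paradigm implies is shown below; `human_interface`, its methods, and the stored takeover flag are hypothetical stand-ins, not the HAIM-DRL implementation.

```python
def shared_control_step(env, agent, human_interface, obs):
    """One environment step under mentor arbitration: the human expert's
    action overrides the agent's whenever the expert intervenes."""
    agent_action = agent.act(obs)
    if human_interface.is_intervening():       # hypothetical interface
        action = human_interface.get_action()  # mentor takes over
        takeover = True
    else:
        action = agent_action
        takeover = False
    next_obs, reward, terminated, truncated, info = env.step(action)
    # Keeping both actions and the takeover flag lets training imitate the
    # mentor on takeover steps and discourage states that needed rescue.
    transition = dict(obs=obs, agent_action=agent_action, action=action,
                      reward=reward, takeover=takeover, next_obs=next_obs)
    return transition, next_obs, terminated or truncated
```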
arXiv Detail & Related papers (2024-01-06T08:30:14Z)
- Decentralized Motor Skill Learning for Complex Robotic Systems [5.669790037378093]
We propose a Decentralized motor skill (DEMOS) learning algorithm to automatically discover motor groups that can be decoupled from each other.
Our method improves the robustness and generalization of the policy without sacrificing performance.
Experiments on quadruped and humanoid robots demonstrate that the learned policy is robust against local motor malfunctions and can be transferred to new tasks.
arXiv Detail & Related papers (2023-06-30T05:55:34Z)
- Learning a Universal Human Prior for Dexterous Manipulation from Human Preference [35.54663426598218]
We propose a framework that learns a universal human prior using direct human preference feedback over videos.
A task-agnostic reward model is trained by iteratively generating diverse policies and collecting human preferences over the trajectories.
Our method empirically demonstrates more human-like behaviors on robot hands across diverse tasks, including unseen ones.
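Reward learning from pairwise preferences is typically formulated with a Bradley-Terry style loss. The PyTorch sketch below shows that generic formulation, assuming 32-dimensional per-step trajectory features; it is not the paper's exact model.

```python
import torch
import torch.nn as nn

# A small reward network scoring per-step trajectory features (assumed 32-D).
reward_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))

def preference_loss(traj_a, traj_b, prefer_a):
    """Bradley-Terry preference loss. traj_*: (T, 32) feature tensors;
    prefer_a: True if the human preferred trajectory A."""
    ret_a = reward_net(traj_a).sum()   # predicted return of trajectory A
    ret_b = reward_net(traj_b).sum()   # predicted return of trajectory B
    logits = torch.stack([ret_a, ret_b]).unsqueeze(0)
    target = torch.tensor([0 if prefer_a else 1])
    return nn.functional.cross_entropy(logits, target)

# Toy usage with random features in place of real trajectories.
traj_a, traj_b = torch.randn(50, 32), torch.randn(60, 32)
loss = preference_loss(traj_a, traj_b, prefer_a=True)
loss.backward()
```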
arXiv Detail & Related papers (2023-04-10T14:17:33Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
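The do/undo alternation can be sketched as two policies trading control of a single ongoing episode, as below; `forward_policy`, `backward_policy`, and `reward_fn` are hypothetical stand-ins for the learned components described in the summary.

```python
def autonomous_practice(env, forward_policy, backward_policy, reward_fn,
                        total_steps=10_000, switch_every=200):
    """Alternate between doing the task (forward) and undoing it (backward)
    so the robot keeps practicing without manual environment resets."""
    obs, _ = env.reset()
    policy, steps_left = forward_policy, switch_every
    for _ in range(total_steps):
        action = policy.act(obs)
        next_obs, _, terminated, truncated, _ = env.step(action)
        # The environment reward is ignored; the reward is inferred from
        # demonstrations (abstracted here as `reward_fn`).
        policy.observe(obs, action, reward_fn(next_obs), next_obs)
        obs = next_obs
        steps_left -= 1
        if steps_left == 0 or terminated or truncated:
            # Swap roles so practice continues: after trying the task,
            # learn to undo it, and vice versa.
            policy = (backward_policy if policy is forward_policy
                      else forward_policy)
            steps_left = switch_every
            if terminated or truncated:
                obs, _ = env.reset()
    return forward_policy
```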
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
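Predicting in trajectory space is commonly realized with dynamic movement primitives. The sketch below integrates a 1-D DMP whose parameters a policy network would output; it is a generic DMP rollout under assumed gains, not the NDP architecture itself.

```python
import numpy as np

def dmp_rollout(y0, goal, weights, T=100, dt=0.01, alpha=25.0, beta=6.25):
    """Integrate a 1-D DMP: y'' = alpha*(beta*(goal - y) - y') + forcing,
    where the forcing term is a weighted sum of radial basis functions."""
    y, yd = y0, 0.0
    x = 1.0                                    # phase variable
    centers = np.linspace(0, 1, len(weights))  # basis function centers
    traj = []
    for _ in range(T):
        psi = np.exp(-50.0 * (x - centers) ** 2)
        forcing = (psi @ weights) / (psi.sum() + 1e-8) * x * (goal - y0)
        ydd = alpha * (beta * (goal - y) - yd) + forcing
        yd += ydd * dt
        y += yd * dt
        x += -2.0 * x * dt                     # canonical system decay
        traj.append(y)
    return np.array(traj)

# With zero weights the rollout smoothly converges from y0 to the goal;
# learned weights shape the path in between.
trajectory = dmp_rollout(y0=0.0, goal=1.0, weights=np.zeros(10))
```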
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
- AirCapRL: Autonomous Aerial Human Motion Capture using Deep Reinforcement Learning [38.429105809093116]
We introduce a deep reinforcement learning (RL) based multi-robot formation controller for the task of autonomous aerial human motion capture (MoCap).
We focus on vision-based MoCap, where the objective is to estimate the trajectory of the body pose and shape of a single moving person using multiple aerial vehicles.
arXiv Detail & Related papers (2020-07-13T12:30:31Z)
- Deep Reinforcement Learning for Human-Like Driving Policies in Collision Avoidance Tasks of Self-Driving Cars [1.160208922584163]
We introduce a model-free, deep reinforcement learning approach to generate automated human-like driving policies.
We study a static obstacle avoidance task on a two-lane highway road in simulation.
We demonstrate that our approach leads to human-like driving policies.
arXiv Detail & Related papers (2020-06-07T18:20:33Z)