Learning from humans: combining imitation and deep reinforcement
learning to accomplish human-level performance on a virtual foraging task
- URL: http://arxiv.org/abs/2203.06250v1
- Date: Fri, 11 Mar 2022 20:52:30 GMT
- Title: Learning from humans: combining imitation and deep reinforcement
learning to accomplish human-level performance on a virtual foraging task
- Authors: Vittorio Giammarino, Matthew F Dunne, Kylie N Moore, Michael E
Hasselmo, Chantal E Stern, Ioannis Ch. Paschalidis
- Abstract summary: We develop a method to learn bio-inspired foraging policies using human data.
We conduct an experiment in which humans are virtually immersed in an open-field foraging environment and are trained to collect as many rewards as possible.
- Score: 6.263481844384228
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop a method to learn bio-inspired foraging policies using human data.
We conduct an experiment in which humans are virtually immersed in an open-field
foraging environment and are trained to collect as many rewards as possible.
A Markov Decision Process (MDP) framework is introduced to model the human
decision dynamics. Imitation Learning (IL) based on maximum likelihood
estimation is then used to train Neural Networks (NNs) that map observed states
to human decisions. The results show that passive imitation substantially
underperforms humans. We further refine the human-inspired policies via
Reinforcement Learning (RL), using on-policy algorithms that are better suited
to learning from pre-trained networks. We show that combining IL and RL can
match human performance, and that good performance depends strongly on an
egocentric representation of the environment. The developed methodology can be
used to efficiently learn policies for unmanned vehicles that must carry out
missions in an open-field environment.
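The IL step described above is, in essence, maximum-likelihood behavioral cloning: a network is fit to the recorded human (state, decision) pairs by minimizing the negative log-likelihood of the human choices. The sketch below illustrates that step in PyTorch; the observation size, the discretized action set, and the network architecture are illustrative assumptions, not details taken from the paper.

```python
# Minimal behavioral-cloning sketch (maximum-likelihood IL).
# OBS_DIM and the discretized action set are illustrative assumptions,
# not values from the paper.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 16, 3  # e.g., egocentric features; turn-left / forward / turn-right

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),  # logits over discrete decisions
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
nll = nn.CrossEntropyLoss()  # cross-entropy = negative log-likelihood of human choices

def bc_step(states, actions):
    """One MLE update on a batch of (state, human-decision) pairs."""
    logits = policy(states)      # (B, N_ACTIONS)
    loss = nll(logits, actions)  # maximizing likelihood = minimizing NLL
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with dummy data standing in for the recorded human trajectories:
states = torch.randn(32, OBS_DIM)
actions = torch.randint(0, N_ACTIONS, (32,))
print(bc_step(states, actions))
```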
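The RL refinement then continues training the same network on-policy, estimating the gradient from trajectories generated by the current (IL-initialized) policy. The abstract names only "on-policy algorithms", so the minimal sketch below uses vanilla policy gradient (REINFORCE) as a stand-in and assumes a Gymnasium-style interface for the foraging environment.

```python
# On-policy fine-tuning sketch: REINFORCE on top of a behavior-cloned policy.
# The specific algorithm here is an illustrative stand-in, not the paper's.
import torch
from torch.distributions import Categorical

def finetune_episode(policy, optimizer, env, gamma=0.99):
    """Collect one on-policy episode and apply a REINFORCE update.

    `env` is assumed to follow the Gymnasium API:
    reset() -> (obs, info); step(a) -> (obs, reward, terminated, truncated, info).
    """
    log_probs, rewards = [], []
    obs, _ = env.reset()
    done = False
    while not done:
        dist = Categorical(logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns, computed backwards over the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    # Normalize returns as a simple variance reducer (assumes episode length > 1).
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return sum(rewards)
```

Initializing from the behavior-cloned weights, rather than from scratch, is what lets the on-policy updates start from near-human behavior instead of random exploration.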
Related papers
- Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning [47.785786984974855]
We present a human-in-the-loop vision-based RL system that demonstrates impressive performance on a diverse set of dexterous manipulation tasks.
Our approach integrates demonstrations and human corrections, efficient RL algorithms, and other system-level design choices to learn policies.
We show that our method significantly outperforms imitation learning baselines and prior RL approaches, with an average 2x improvement in success rate and 1.8x faster execution.
arXiv Detail & Related papers (2024-10-29T08:12:20Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can improve performance under assumptions that are similar to, and potentially more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that the intervening experts in interactive imitation learning must be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to use offline reinforcement learning techniques to enable efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism [91.52263068880484]
We study offline Reinforcement Learning with Human Feedback (RLHF).
We aim to learn the human's underlying reward and the MDP's optimal policy from a set of trajectories induced by human choices.
RLHF is challenging for multiple reasons: large state space but limited human feedback, the bounded rationality of human decisions, and the off-policy distribution shift.
arXiv Detail & Related papers (2023-05-29T01:18:39Z)
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
- End-to-end grasping policies for human-in-the-loop robots via deep reinforcement learning [24.407804468007228]
State-of-the-art human-in-the-loop robot grasping suffers greatly from electromyography (EMG) inference robustness issues.
We present a method for end-to-end training of a policy for human-in-the-loop robot grasping on real reaching trajectories.
arXiv Detail & Related papers (2021-04-26T19:39:23Z)
- Learning Human Rewards by Inferring Their Latent Intelligence Levels in Multi-Agent Games: A Theory-of-Mind Approach with Application to Driving Data [18.750834997334664]
We argue that humans are boundedly rational and exhibit different intelligence levels when reasoning about others' decision-making processes.
We propose a new multi-agent Inverse Reinforcement Learning framework that reasons about humans' latent intelligence levels during learning.
arXiv Detail & Related papers (2021-03-07T07:48:31Z)
- Human-guided Robot Behavior Learning: A GAN-assisted Preference-based Reinforcement Learning Approach [2.9764834057085716]
We propose a new GAN-assisted human preference-based reinforcement learning approach.
It uses a generative adversarial network (GAN) to actively learn human preferences and then replaces the human in assigning preferences.
Our method reduces the required human time by about 99.8% without sacrificing performance.
arXiv Detail & Related papers (2020-10-15T01:44:06Z)
- Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems [0.8223798883838329]
This research investigates how to integrate human interaction modalities into the reinforcement learning loop.
Results show that a reward signal learned from human interaction accelerates the learning of reinforcement learning algorithms.
arXiv Detail & Related papers (2020-08-30T17:28:18Z)
- Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop TrajNet++, a large-scale interaction-centric benchmark and a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)