Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning
Systems
- URL: http://arxiv.org/abs/2008.13221v1
- Date: Sun, 30 Aug 2020 17:28:18 GMT
- Title: Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning
Systems
- Authors: Vinicius G. Goecks
- Abstract summary: This research investigates how to integrate human interaction modalities into the reinforcement learning loop.
Results show that a reward signal learned from human interaction accelerates the rate of learning of reinforcement learning algorithms.
- Score: 0.8223798883838329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent successes have combined reinforcement learning algorithms with deep
neural networks, yet reinforcement learning is still not widely applied to robotics
and real world scenarios. This can be attributed to the fact that current
state-of-the-art, end-to-end reinforcement learning approaches still require
thousands or millions of data samples to converge to a satisfactory policy and
are subject to catastrophic failures during training. Conversely, in real world
scenarios and after just a few data samples, humans are able to either provide
demonstrations of the task, intervene to prevent catastrophic actions, or
simply evaluate if the policy is performing correctly. This research
investigates how to integrate these human interaction modalities into the
reinforcement learning loop, increasing sample efficiency and enabling
real-time reinforcement learning in robotics and real world scenarios. This
novel theoretical foundation is called Cycle-of-Learning, a reference to how
different human interaction modalities, namely, task demonstration,
intervention, and evaluation, are cycled and combined with reinforcement learning
algorithms. Results presented in this work show that the reward signal that is
learned based upon human interaction accelerates the rate of learning of
reinforcement learning algorithms and that learning from a combination of human
demonstrations and interventions is faster and more sample efficient when
compared to traditional supervised learning algorithms. Finally,
Cycle-of-Learning develops an effective transition between policies learned
using human demonstrations and interventions to reinforcement learning. The
theoretical foundation developed by this research opens new research paths to
human-agent teaming scenarios where autonomous agents are able to learn from
human teammates and adapt to mission performance metrics in real-time and in
real world scenarios.
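A minimal sketch of how these three interaction modalities could be wired together, assuming a continuous-control task with flat state and action tensors: behavioral cloning from demonstrations and interventions, a reward model regressed from human evaluations, and a hand-off to a standard RL algorithm. The function names and training details below are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of a Cycle-of-Learning-style pipeline; names and
# hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=64):
    # Small two-layer network used for both the policy and the reward model.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


def behavior_cloning(policy, demos, epochs=10, lr=1e-3):
    """Stage 1: imitate human demonstrations and interventions via regression."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    states, actions = demos  # tensors gathered from the human
    for _ in range(epochs):
        loss = ((policy(states) - actions) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()


def fit_reward_model(reward_net, evaluations, epochs=10, lr=1e-3):
    """Stage 2: regress a reward signal from human evaluations of (s, a) pairs."""
    opt = torch.optim.Adam(reward_net.parameters(), lr=lr)
    states, actions, human_scores = evaluations
    sa = torch.cat([states, actions], dim=-1)
    for _ in range(epochs):
        loss = ((reward_net(sa).squeeze(-1) - human_scores) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()


# Stage 3 (not shown): hand the pre-trained policy and the learned reward
# model to an off-the-shelf RL algorithm, which continues improving the
# policy using reward_net(s, a) in place of a hand-engineered reward.

if __name__ == "__main__":
    obs_dim, act_dim, n = 8, 2, 256
    policy = mlp(obs_dim, act_dim)
    reward_net = mlp(obs_dim + act_dim, 1)
    # Synthetic stand-ins for human-provided data.
    demos = (torch.randn(n, obs_dim), torch.randn(n, act_dim))
    evals = (torch.randn(n, obs_dim), torch.randn(n, act_dim), torch.randn(n))
    behavior_cloning(policy, demos)
    fit_reward_model(reward_net, evals)
```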
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert (a minimal sketch of this intervention-as-reward convention is shown after this list).
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z) - Exploiting Symmetry and Heuristic Demonstrations in Off-policy
Reinforcement Learning for Robotic Manipulation [1.7901837062462316]
This paper aims to define and incorporate the natural symmetry present in physical robotic environments.
The proposed method is validated via two point-to-point reaching tasks of an industrial arm, with and without an obstacle.
A comparison study between the proposed method and a traditional off-policy reinforcement learning algorithm indicates its advantage in learning performance and potential value for applications.
arXiv Detail & Related papers (2023-04-12T11:38:01Z) - What Matters in Learning from Offline Human Demonstrations for Robot
Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z) - Active Hierarchical Imitation and Reinforcement Learning [0.0]
In this project, we explored different imitation learning algorithms and designed active learning algorithms upon the hierarchical imitation and reinforcement learning framework we have developed.
Our experimental results showed that using DAgger and a reward-based active learning method achieves better performance while reducing the physical and mental effort required of humans during training.
arXiv Detail & Related papers (2020-12-14T08:27:27Z) - Human-in-the-Loop Imitation Learning using Remote Teleoperation [72.2847988686463]
We build a data collection system tailored to 6-DoF manipulation settings.
We develop an algorithm to train the policy iteratively on new data collected by the system.
We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators.
arXiv Detail & Related papers (2020-12-12T05:30:35Z) - Human-guided Robot Behavior Learning: A GAN-assisted Preference-based
Reinforcement Learning Approach [2.9764834057085716]
We propose a new GAN-assisted human preference-based reinforcement learning approach.
It uses a generative adversarial network (GAN) to actively learn human preferences and then replace the role of the human in assigning preferences.
Our method achieves a reduction of about 99.8% in human time without sacrificing performance.
arXiv Detail & Related papers (2020-10-15T01:44:06Z) - Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z) - Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic
Reinforcement Learning [109.77163932886413]
We show how to adapt vision-based robotic manipulation policies to new variations by fine-tuning via off-policy reinforcement learning.
This adaptation uses less than 0.2% of the data necessary to learn the task from scratch.
We find that our approach of adapting pre-trained policies leads to substantial performance gains over the course of fine-tuning.
arXiv Detail & Related papers (2020-04-21T17:57:04Z)