Minimizing Human Assistance: Augmenting a Single Demonstration for Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2209.11275v2
- Date: Sun, 19 Mar 2023 03:14:42 GMT
- Title: Minimizing Human Assistance: Augmenting a Single Demonstration for Deep
Reinforcement Learning
- Authors: Abraham George, Alison Bartsch, and Amir Barati Farimani
- Abstract summary: We use a single human example collected through a simple-to-use virtual reality simulation to assist with RL training.
Our method augments a single demonstration to generate numerous human-like demonstrations.
Despite learning from a human example, the agent is not constrained to human-level performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The use of human demonstrations in reinforcement learning has proven to
significantly improve agent performance. However, any requirement for a human
to manually 'teach' the model is somewhat antithetical to the goals of
reinforcement learning. This paper attempts to minimize human involvement in
the learning process while retaining the performance advantages by using a
single human example collected through a simple-to-use virtual reality
simulation to assist with RL training. Our method augments a single
demonstration to generate numerous human-like demonstrations that, when
combined with Deep Deterministic Policy Gradients and Hindsight Experience
Replay (DDPG + HER), significantly improve training time on simple tasks and
allow the agent to solve a complex task (block stacking) that DDPG + HER alone
cannot solve. The model achieves this significant training advantage using a
single human example, requiring less than a minute of human input. Moreover,
despite learning from a human example, the agent is not constrained to
human-level performance, often learning a policy that is significantly
different from the human demonstration.
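To make the augmentation step concrete, the sketch below shows one plausible way a single recorded demonstration could be perturbed into many human-like trajectories and used to seed a DDPG + HER replay buffer. This is a minimal, hypothetical illustration assuming a generic transition format, Gaussian noise scales, and buffer/reward interfaces; it is not the authors' released implementation.

```python
# Hypothetical sketch: augmenting one demonstration for DDPG + HER.
# The transition format, noise scales, and buffer/reward interfaces are
# illustrative assumptions, not the paper's released code.
import numpy as np

def augment_demo(demo, n_copies=100, action_noise=0.05, state_noise=0.01, seed=0):
    """Expand a single demonstration into many noisy, human-like copies.

    demo: list of (state, action, next_state, goal) tuples of np.ndarray.
    Returns a flat list of perturbed transitions.
    """
    rng = np.random.default_rng(seed)
    augmented = []
    for _ in range(n_copies):
        for state, action, next_state, goal in demo:
            augmented.append((
                state + rng.normal(0.0, state_noise, state.shape),
                np.clip(action + rng.normal(0.0, action_noise, action.shape), -1.0, 1.0),
                next_state + rng.normal(0.0, state_noise, next_state.shape),
                goal,
            ))
    return augmented

def seed_replay_buffer(buffer, demo, reward_fn, **kwargs):
    """Pre-fill a DDPG + HER replay buffer with augmented demonstration transitions.

    `buffer.add` and `reward_fn` (a goal-conditioned sparse reward) are assumed
    interfaces provided by the surrounding RL code.
    """
    for state, action, next_state, goal in augment_demo(demo, **kwargs):
        reward = reward_fn(next_state, goal)
        buffer.add(state, action, reward, next_state, goal)
```

In this sketch, HER goal relabeling would still operate on the buffer as usual; the augmented demonstrations simply bias early learning toward the demonstrated behavior.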
Related papers
- MILES: Making Imitation Learning Easy with Self-Supervision [12.314942459360605]
MILES is a fully autonomous, self-supervised data collection paradigm.
We show that MILES enables efficient policy learning from just a single demonstration and a single environment reset.
arXiv Detail & Related papers (2024-10-25T17:06:50Z)
- GUIDE: Real-Time Human-Shaped Agents [4.676987516944155]
We introduce GUIDE, a framework for real-time human-guided reinforcement learning.
With only 10 minutes of human feedback, our algorithm achieves up to 30% increase in success rate compared to its RL baseline.
arXiv Detail & Related papers (2024-10-19T18:59:39Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently.
Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to use offline reinforcement learning techniques to enable efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment [52.07473934146584]
By learning from the human decision process, we guide curriculum reinforcement learning towards a preferred performance level that is neither too hard nor too easy.
Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications.
It shows that reinforcement learning performance can successfully adjust in sync with the human-desired difficulty level.
arXiv Detail & Related papers (2022-08-04T23:53:51Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Human-guided Robot Behavior Learning: A GAN-assisted Preference-based Reinforcement Learning Approach [2.9764834057085716]
We propose a new GAN-assisted human preference-based reinforcement learning approach.
It uses a generative adversarial network (GAN) to actively learn human preferences and then replace the role of the human in assigning preferences.
Our method can achieve a reduction of about 99.8% human time without performance sacrifice.
arXiv Detail & Related papers (2020-10-15T01:44:06Z)
- Towards Learning to Imitate from a Single Video Demonstration [11.15358253586118]
We develop a reinforcement learning agent that learns to imitate from a given video observation.
We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips; a minimal sketch of this reward idea appears after this list.
We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D.
arXiv Detail & Related papers (2019-01-22T06:46:19Z)
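As a companion to the video-imitation entry above, the following is a minimal, hypothetical sketch of a Siamese recurrent encoder that embeds an agent clip and a reference clip with shared weights and uses negative embedding distance as an imitation reward. The layer sizes, feature format, and reward definition are assumptions for illustration, not the cited paper's implementation.

```python
# Hypothetical sketch of a Siamese recurrent reward model for motion imitation.
# Architecture sizes and the reward definition are illustrative assumptions.
import torch
import torch.nn as nn

class SiameseClipEncoder(nn.Module):
    def __init__(self, feature_dim, hidden_dim=128, embed_dim=64):
        super().__init__()
        self.rnn = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, embed_dim)

    def embed(self, clip):
        # clip: (batch, time, feature_dim) sequence of per-frame features.
        _, h_n = self.rnn(clip)           # h_n: (1, batch, hidden_dim)
        return self.head(h_n.squeeze(0))  # (batch, embed_dim)

    def forward(self, agent_clip, reference_clip):
        # Shared weights embed both clips; the reward is the negative embedding
        # distance, so the agent is rewarded for matching the reference motion.
        za = self.embed(agent_clip)
        zr = self.embed(reference_clip)
        return -torch.norm(za - zr, dim=-1)
```

The shared encoder is the key design choice: because both clips pass through the same weights, distances in the learned embedding space are comparable, so the reward grows as the agent's motion approaches the reference clip.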