GAN-Based Interactive Reinforcement Learning from Demonstration and
Human Evaluative Feedback
- URL: http://arxiv.org/abs/2104.06600v1
- Date: Wed, 14 Apr 2021 02:58:51 GMT
- Title: GAN-Based Interactive Reinforcement Learning from Demonstration and
Human Evaluative Feedback
- Authors: Jie Huang, Rongshun Juan, Randy Gomez, Keisuke Nakamura, Qixin Sha, Bo
He, Guangliang Li
- Abstract summary: We propose GAN-Based Interactive Reinforcement Learning (GAIRL) from demonstration and human evaluative feedback.
We tested our proposed method in six physics-based control tasks.
- Score: 6.367592686247906
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep reinforcement learning (DRL) has achieved great success in many
simulated tasks. The sample inefficiency problem makes applying traditional DRL
methods to real-world robots a great challenge. Generative Adversarial
Imitation Learning (GAIL), a general model-free imitation learning method,
allows robots to directly learn policies from expert trajectories in large
environments. However, GAIL shares a limitation with other imitation learning
methods: it can seldom surpass the performance of the demonstrations. In this
paper, to address this limitation of GAIL, we propose GAN-Based Interactive
Reinforcement Learning (GAIRL) from demonstration and human evaluative feedback,
combining the advantages of GAIL and interactive reinforcement learning. We
tested our proposed method on six physics-based control tasks, ranging from
simple low-dimensional control tasks (Cart Pole and Mountain Car) to
difficult high-dimensional tasks (Inverted Double Pendulum, Lunar Lander,
Hopper, and HalfCheetah). Our results suggest that with both optimal and
suboptimal demonstrations, a GAIRL agent can always learn a more stable policy
with optimal or close-to-optimal performance, while the performance of the GAIL
agent is upper bounded by the performance of the demonstrations, or is even worse.
In addition, our results indicate that the reason GAIRL is superior to
GAIL is the complementary effect of demonstrations and human evaluative
feedback.
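The following is a minimal sketch of the idea described in the abstract, not the authors' implementation: one plausible way to combine a GAIL-style discriminator reward with human evaluative feedback is a per-step weighted blend. The trade-off weight `alpha`, the additive combination, and the feedback range [-1, 1] are illustrative assumptions only; the paper may use a different scheme (e.g. staged training).

```python
# Illustrative sketch of combining a GAIL-style discriminator reward with
# human evaluative feedback, as in interactive RL. This is a reconstruction
# under stated assumptions, NOT the GAIRL authors' implementation.

import math
import random


def discriminator_reward(d_prob_expert: float) -> float:
    """GAIL-style surrogate reward -log(1 - D(s, a)), where D(s, a) is the
    discriminator's probability that the state-action pair came from the
    expert demonstrations."""
    eps = 1e-8
    return -math.log(max(1.0 - d_prob_expert, eps))


def combined_reward(d_prob_expert: float,
                    human_feedback: float,
                    alpha: float = 0.5) -> float:
    """Blend the imitation reward with an evaluative signal in [-1, 1]
    given by a human trainer. `alpha` is a hypothetical trade-off weight."""
    return alpha * discriminator_reward(d_prob_expert) + (1.0 - alpha) * human_feedback


if __name__ == "__main__":
    # Toy illustration: discriminator outputs along a short trajectory plus
    # occasional human feedback yield shaped per-step rewards for the RL update.
    random.seed(0)
    d_outputs = [random.uniform(0.3, 0.9) for _ in range(5)]
    feedback = [0.0, 1.0, 0.0, -1.0, 1.0]  # trainer presses +/- occasionally
    print([round(combined_reward(d, h), 3) for d, h in zip(d_outputs, feedback)])
```

Under this reading, the human feedback supplies a reward signal wherever the demonstrations are suboptimal or absent, which is consistent with the abstract's claim that the two sources are complementary.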
Related papers
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations [3.637365301757111]
Methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process.
Selecting the set of human demonstrations that is most beneficial for learning thus becomes a major concern.
This paper presents EARLY, an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space.
arXiv Detail & Related papers (2024-06-05T08:52:21Z) - SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z) - Basis for Intentions: Efficient Inverse Reinforcement Learning using
Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z) - Learning from Ambiguous Demonstrations with Self-Explanation Guided
Reinforcement Learning [20.263419567168388]
Our work aims at efficiently leveraging ambiguous demonstrations for the training of a reinforcement learning (RL) agent.
Inspired by how humans handle such situations, we propose to use self-explanation to recognize valuable high-level relational features.
Our main contribution is to propose the Self-Explanation for RL from Demonstrations (SERLfD) framework, which can overcome the limitations of traditional RLfD works.
arXiv Detail & Related papers (2021-10-11T13:59:48Z) - Demonstration-Guided Reinforcement Learning with Learned Skills [23.376115889936628]
Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors.
In this work, we aim to exploit this shared subtask structure to increase the efficiency of demonstration-guided RL.
We propose Skill-based Learning with Demonstrations (SkiLD), an algorithm for demonstration-guided RL that efficiently leverages the provided demonstrations.
arXiv Detail & Related papers (2021-07-21T17:59:34Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Demonstration-efficient Inverse Reinforcement Learning in Procedurally
Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z) - Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning can learn effectively in sparse-reward tasks by leveraging existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near-)optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.