Interactive Imitation Learning in State-Space
- URL: http://arxiv.org/abs/2008.00524v2
- Date: Tue, 17 Nov 2020 11:37:48 GMT
- Title: Interactive Imitation Learning in State-Space
- Authors: Snehal Jauhri, Carlos Celemin, Jens Kober
- Abstract summary: We propose a novel Interactive Learning technique that uses human feedback in state-space to train and improve agent behavior.
Our method, titled Teaching Imitative Policies in State-space (TIPS), enables providing guidance to the agent in terms of 'changing its state'.
- Score: 5.672132510411464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation Learning techniques enable programming the behavior of agents
through demonstrations rather than manual engineering. However, they are
limited by the quality of available demonstration data. Interactive Imitation
Learning techniques can improve the efficacy of learning since they involve
teachers providing feedback while the agent executes its task. In this work, we
propose a novel Interactive Learning technique that uses human feedback in
state-space to train and improve agent behavior (as opposed to alternative
methods that use feedback in action-space). Our method, titled Teaching
Imitative Policies in State-space (TIPS), enables providing guidance to the
agent in terms of 'changing its state', which is often more intuitive for a
human demonstrator. Through continuous improvement via corrective feedback,
agents trained by non-expert demonstrators using TIPS outperformed the
demonstrator and conventional Imitation Learning agents.
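To make the state-space feedback idea concrete, below is a minimal sketch (hypothetical names, not the authors' code) of how a human correction expressed as a desired change in state could be turned into an executable action via a learned inverse dynamics model, the kind of mapping a state-space method like TIPS relies on.
```python
import numpy as np

def desired_next_state(state, human_correction, step_size=0.1):
    """Human feedback is a direction in state-space, e.g. 'move left'."""
    return state + step_size * human_correction

def inverse_dynamics(state, next_state, model_weights):
    """Hypothetical learned model: predicts the action that moves the agent
    from `state` to `next_state` (a simple linear stand-in for a network)."""
    features = np.concatenate([state, next_state])
    return model_weights @ features

# Toy usage: a 2-D state, a human correction pointing 'up', and random
# weights standing in for a trained inverse dynamics model.
rng = np.random.default_rng(0)
state = np.array([0.5, -0.2])
correction = np.array([0.0, 1.0])     # human: "increase the second state dim"
weights = rng.normal(size=(1, 4))     # 1-D action, 4 input features

target = desired_next_state(state, correction)
action = inverse_dynamics(state, target, weights)
# The resulting (state, action) pair would be added to a dataset used to
# update the imitation policy, so corrections accumulate into better behavior.
```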
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
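A minimal sketch of the reward idea described in the entry above, assuming the simplest possible encoding (a penalty whenever the human intervenes); names are illustrative, not the RLIF reference implementation.
```python
def intervention_reward(intervened: bool) -> float:
    # Penalize steps where the expert felt compelled to take over;
    # all other steps are reward-neutral.
    return -1.0 if intervened else 0.0

def make_transition(state, agent_action, expert_action, intervened, next_state):
    """Build a transition for an off-policy RL buffer: the executed action is
    the expert's when an intervention occurred, the agent's otherwise."""
    action = expert_action if intervened else agent_action
    return {
        "state": state,
        "action": action,
        "reward": intervention_reward(intervened),
        "next_state": next_state,
    }
```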
- Continual Learning for Instruction Following from Realtime Feedback [23.078048024461264]
We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions.
During interaction, human users instruct an agent using natural language, and provide real-time binary feedback as they observe the agent following their instructions.
We design a contextual bandit learning approach, converting user feedback to immediate reward.
We evaluate through thousands of human-agent interactions, demonstrating 15.4% absolute improvement in instruction execution accuracy over time.
arXiv Detail & Related papers (2022-12-19T18:39:43Z)
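The conversion of binary user feedback into an immediate bandit reward could look roughly like the following sketch (the linear softmax policy and REINFORCE-style update are assumptions for illustration, not the paper's code).
```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def bandit_update(theta, context, action, feedback, lr=0.1):
    """One REINFORCE-style step: raise the log-probability of actions that
    got positive feedback, lower it for negative feedback."""
    probs = softmax(theta @ context)
    grad_log = -probs
    grad_log[action] += 1.0            # d log pi(a|x) / d logits
    reward = 1.0 if feedback else -1.0
    theta += lr * reward * np.outer(grad_log, context)
    return theta

# Toy usage: 3 actions, 4-dim context, positive feedback on the chosen action.
rng = np.random.default_rng(0)
theta = np.zeros((3, 4))
context = rng.normal(size=4)
action = int(rng.choice(3, p=softmax(theta @ context)))
theta = bandit_update(theta, context, action, feedback=True)
```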
- Imitation Learning from Observations under Transition Model Disparity [22.456737935789103]
Imitation Learning from Observations (ILO), learning to perform tasks by leveraging a dataset of expert observations, is an important paradigm for learning skills without access to the expert's reward function or actions.
Recent methods for scalable ILO utilize adversarial learning to match the state-transition distributions of the expert and the learner.
We propose an algorithm that trains an intermediary policy in the learner environment and uses it as a surrogate expert for the learner.
arXiv Detail & Related papers (2022-04-25T05:36:54Z)
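As a sketch of the state-transition matching idea, a discriminator can score (state, next_state) pairs and supply a GAIL-style reward; the linear discriminator below is a simplified stand-in for the networks such methods actually use.
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_score(w, s, s_next):
    """Hypothetical linear discriminator over concatenated transitions."""
    return sigmoid(w @ np.concatenate([s, s_next]))

def learner_reward(w, s, s_next, eps=1e-8):
    # GAIL-style reward: high when a transition looks 'expert-like'.
    return -np.log(1.0 - discriminator_score(w, s, s_next) + eps)

def discriminator_update(w, expert_pair, learner_pair, lr=0.01):
    """One logistic-regression step: push expert transitions toward 1,
    learner transitions toward 0."""
    for (s, s_next), label in [(expert_pair, 1.0), (learner_pair, 0.0)]:
        x = np.concatenate([s, s_next])
        w += lr * (label - sigmoid(w @ x)) * x
    return w

# Toy usage with 2-D states (so transition pairs are 4-dim).
w = np.zeros(4)
expert = (np.array([0.0, 0.0]), np.array([0.1, 0.0]))
learner = (np.array([0.0, 0.0]), np.array([-0.3, 0.2]))
w = discriminator_update(w, expert, learner)
r = learner_reward(w, *learner)
```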
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
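Two mechanisms named in the title could be sketched as follows (a Bradley-Terry preference model for the learned reward, and relabeling of stored experience; a simplification, not the released PEBBLE code).
```python
import numpy as np

def segment_return(reward_fn, segment):
    """Sum of predicted rewards over a list of (state, action) pairs."""
    return sum(reward_fn(s, a) for s, a in segment)

def preference_prob(reward_fn, seg_a, seg_b):
    """P(human prefers A over B) under a Bradley-Terry model (stable form)."""
    diff = segment_return(reward_fn, seg_b) - segment_return(reward_fn, seg_a)
    return 1.0 / (1.0 + np.exp(diff))

def relabel(replay_buffer, reward_fn):
    """Overwrite stored rewards with the current learned reward model, so
    off-policy data stays usable as the reward estimate improves."""
    for transition in replay_buffer:
        transition["reward"] = reward_fn(transition["state"], transition["action"])

# Toy usage with a hand-written stand-in for a learned reward network.
reward_fn = lambda s, a: float(s) + float(a)
seg_a = [(1.0, 0.5), (1.0, 0.5)]
seg_b = [(0.0, 0.0), (0.0, 0.0)]
assert preference_prob(reward_fn, seg_a, seg_b) > 0.5
```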
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called ΨΦ-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
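A rough sketch of the successor-feature decomposition this family of methods builds on, where rewards factor as r = φ(s,a)·w and Q(s,a) = ψ(s,a)·w; the update below is a simplification, not the paper's algorithm.
```python
import numpy as np

def td_update_psi(psi, phi, psi_next, gamma=0.99, lr=0.1):
    """TD step on successor features: psi <- psi + lr*(phi + gamma*psi_next - psi)."""
    return psi + lr * (phi + gamma * psi_next - psi)

def q_value(psi, w):
    """With reward weights w, Q-values come directly from successor features."""
    return psi @ w

# Toy usage with 3-dim cumulant features.
phi = np.array([1.0, 0.0, 0.5])       # features observed at this step
psi = np.zeros(3)                     # successor features for (s, a)
psi_next = np.array([0.2, 0.1, 0.0])  # bootstrap target from the next step
w = np.array([0.0, 1.0, 2.0])         # reward weights (ITD would infer these)
psi = td_update_psi(psi, phi, psi_next)
print(q_value(psi, w))
```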
- SAFARI: Safe and Active Robot Imitation Learning with Imagination [16.967930721746676]
SAFARI is a novel active learning and control algorithm.
It allows an agent to request further human demonstrations when out-of-distribution situations are encountered.
We show how this method enables the agent to autonomously predict failure rapidly and safely.
arXiv Detail & Related papers (2020-11-18T23:43:59Z)
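The ask-for-help-when-out-of-distribution behavior could be sketched like this; the nearest-neighbor novelty score is a simple stand-in for the learned uncertainty estimates a real system would use.
```python
import numpy as np

def novelty(state, demo_states):
    """Distance to the nearest demonstrated state; a stand-in for learned
    density or uncertainty estimates."""
    return np.min(np.linalg.norm(demo_states - state, axis=1))

def act_or_ask(policy, state, demo_states, threshold=1.0):
    """Predict failure via novelty and defer to the human when uncertain."""
    if novelty(state, demo_states) > threshold:
        return "request_demonstration"
    return policy(state)

# Toy usage: a 2-D task with two demonstrated states.
demos = np.array([[0.0, 0.0], [1.0, 1.0]])
policy = lambda s: "act"
print(act_or_ask(policy, np.array([0.1, 0.1]), demos))  # close to data -> act
print(act_or_ask(policy, np.array([5.0, 5.0]), demos))  # far away -> ask
```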
- Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach [55.83558520598304]
We propose a brand new solution to reuse experiences and transfer value functions among multiple students via model distillation.
We also describe how to design an efficient communication protocol to exploit heterogeneous knowledge.
Our proposed framework, namely Learning and Teaching Categorical Reinforcement, shows promising performance in stabilizing and accelerating learning progress.
arXiv Detail & Related papers (2020-02-06T11:31:04Z)
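Value-function distillation between teammates could be sketched as a cross-entropy between categorical value distributions (simplified; the framework's actual communication protocol and losses are richer than this).
```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_probs, eps=1e-8):
    """Cross-entropy pushing the student's categorical value distribution
    toward a teammate's; raw experience itself never needs to be shared."""
    return -np.sum(teacher_probs * np.log(softmax(student_logits) + eps))

# Toy usage: 5-atom categorical value distributions.
teacher = softmax(np.array([0.0, 1.0, 2.0, 1.0, 0.0]))
student_logits = np.zeros(5)
print(distillation_loss(student_logits, teacher))
```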
- On the interaction between supervision and self-play in emergent communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
arXiv Detail & Related papers (2020-02-04T02:35:19Z)
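The ordering finding above amounts to a two-phase training schedule; a sketch with placeholder method names (supervised_update, rl_update, and selfplay_rollout are assumptions, not the paper's API).
```python
# Sketch of the schedule the finding supports: supervised learning on human
# data first, self-play fine-tuning second (the converse performed worse).

def train(agent, human_data, env, supervised_steps, selfplay_steps):
    for _ in range(supervised_steps):
        agent.supervised_update(human_data.sample())  # imitate human language use
    for _ in range(selfplay_steps):
        episode = env.selfplay_rollout(agent)
        agent.rl_update(episode)                      # then optimize via self-play
    return agent
```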
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.