Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable
Environments
- URL: http://arxiv.org/abs/2310.05712v1
- Date: Mon, 9 Oct 2023 13:35:28 GMT
- Title: Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable
Environments
- Authors: Xiong-Hui Chen, Junyin Ye, Hang Zhao, Yi-Chen Li, Haoran Shi, Yu-Yan
Xu, Zhihao Ye, Si-Hang Yang, Anqi Huang, Kai Xu, Zongzhang Zhang, Yang Yu
- Abstract summary: We propose a new topic called imitator learning (ItorL).
It aims to derive an imitator module that can reconstruct imitation policies based on very limited expert demonstrations.
For autonomous imitation policy building, we design a demonstration-based attention architecture for the imitator policy.
- Score: 45.213059639254475
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Imitation learning (IL) enables agents to mimic expert behaviors. Most
previous IL techniques focus on precisely imitating a single policy through a
large number of demonstrations. However, in many applications, what humans
require is the ability to perform various tasks directly from a few
demonstrations of the corresponding tasks, where the agent will meet many
unexpected changes when deployed. In this scenario, the agent is expected not
only to imitate the demonstration but also to adapt to unforeseen environmental changes.
This motivates us to propose a new topic called imitator learning (ItorL),
which aims to derive an imitator module that can reconstruct imitation
policies on the fly from very limited expert demonstrations for different
unseen tasks, without any extra adjustment. In this work, we focus on imitator
learning based on only one expert demonstration. To solve ItorL, we propose
Demo-Attention Actor-Critic (DAAC), which integrates IL into a
reinforcement-learning paradigm that can regularize policies' behaviors in
unexpected situations. In addition, for autonomous imitation-policy building, we
design a demonstration-based attention architecture for the imitator policy
that effectively outputs imitated actions by adaptively tracing the suitable
states in demonstrations. We develop a new navigation benchmark and a robot
environment for ItorL and show that DAAC outperforms previous imitation
methods by large margins on both seen and unseen tasks.
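The demonstration-based attention architecture is only described at a high level above. Below is a minimal sketch of one way an imitator policy could attend over a single demonstration's (state, action) pairs to produce an action; the module layout, single-head attention, and all dimensions are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class DemoAttentionPolicy(nn.Module):
    """Sketch: the current state queries the demonstration's (state, action)
    pairs via attention, and the attended summary is decoded into an action.
    All hyperparameters are illustrative, not the paper's configuration."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.query = nn.Linear(state_dim, hidden)
        self.key = nn.Linear(state_dim, hidden)
        self.value = nn.Linear(state_dim + action_dim, hidden)
        self.decoder = nn.Sequential(
            nn.Linear(hidden + state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state, demo_states, demo_actions):
        # state: (B, state_dim); demo_states: (B, T, state_dim);
        # demo_actions: (B, T, action_dim)
        q = self.query(state).unsqueeze(1)                        # (B, 1, H)
        k = self.key(demo_states)                                 # (B, T, H)
        v = self.value(torch.cat([demo_states, demo_actions], dim=-1))
        # Attention weights over demonstration steps: the policy "traces"
        # the demonstration states most relevant to the current state.
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        context = (attn @ v).squeeze(1)                           # (B, H)
        return self.decoder(torch.cat([context, state], dim=-1))

# Usage: one expert demonstration of length T conditions the policy.
policy = DemoAttentionPolicy(state_dim=10, action_dim=4)
s = torch.randn(2, 10)
demo_s, demo_a = torch.randn(2, 50, 10), torch.randn(2, 50, 4)
action = policy(s, demo_s, demo_a)   # (2, 4)
```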
Related papers
- Skill Disentanglement for Imitation Learning from Suboptimal
Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
- GAN-MPC: Training Model Predictive Controllers with Parameterized Cost Functions using Demonstrations from Non-identical Experts [14.291720751625585]
We propose a generative adversarial network (GAN) to minimize the Jensen-Shannon divergence between the state-trajectory distributions of the demonstrator and the imitator.
We evaluate our approach on a variety of simulated robotics tasks from the DeepMind Control Suite.
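A minimal sketch of the adversarial objective described above: with the standard GAN loss, the optimal discriminator makes the generator objective correspond (up to constants) to the Jensen-Shannon divergence between the two trajectory distributions. The flat trajectory encoding and network sizes are assumptions; the paper's actual architecture is not given in this summary.

```python
import torch
import torch.nn as nn

# Illustrative sizes: 20-step trajectories of 10-dim states, flattened
# into one vector per trajectory (an assumed encoding).
horizon, state_dim = 20, 10
traj_dim = horizon * state_dim

disc = nn.Sequential(nn.Linear(traj_dim, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

def discriminator_loss(expert_trajs, imitator_trajs):
    # Expert trajectories labeled 1, imitator trajectories labeled 0;
    # at the optimal discriminator this objective measures the JS
    # divergence between the two state-trajectory distributions.
    d_exp = disc(expert_trajs)
    d_imi = disc(imitator_trajs)
    return bce(d_exp, torch.ones_like(d_exp)) + bce(d_imi, torch.zeros_like(d_imi))

def imitator_loss(imitator_trajs):
    # The imitator (here, the MPC-driven policy) is updated to make its
    # trajectories indistinguishable from the demonstrator's.
    d_imi = disc(imitator_trajs)
    return bce(d_imi, torch.ones_like(d_imi))
```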
arXiv Detail & Related papers (2023-05-30T15:15:30Z)
- Out-of-Dynamics Imitation Learning from Multimodal Demonstrations [68.46458026983409]
We study out-of-dynamics imitation learning (OOD-IL), which relaxes the usual assumption of identical dynamics, requiring only that the demonstrator and the imitator share the same state space.
OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators but introduces a new challenge.
We develop a better transferability measurement to tackle this newly emerged challenge.
arXiv Detail & Related papers (2022-11-13T07:45:06Z)
- Inferring Versatile Behavior from Demonstrations by Matching Geometric Descriptors [72.62423312645953]
Humans intuitively solve tasks in versatile ways, varying their behavior both at the level of trajectory planning and in individual steps.
Current Imitation Learning algorithms often only consider unimodal expert demonstrations and act in a state-action-based setting.
Instead, we combine a mixture of movement primitives with a distribution matching objective to learn versatile behaviors that match the expert's behavior and versatility.
arXiv Detail & Related papers (2022-10-17T16:42:59Z)
- Eliciting Compatible Demonstrations for Multi-Human Imitation Learning [16.11830547863391]
Imitation learning from human-provided demonstrations is a strong approach for learning policies for robot manipulation.
Natural human behavior has a great deal of heterogeneity, with several optimal ways to demonstrate a task.
This heterogeneity presents a problem for interactive imitation learning, where sequences of users improve on a policy by iteratively collecting new, possibly conflicting demonstrations.
We show that we can both identify incompatible demonstrations via post-hoc filtering, and apply our compatibility measure to actively elicit compatible demonstrations from new users.
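The summary does not specify the compatibility measure itself; a generic sketch of the post-hoc filtering step might score each demonstration by the current policy's log-likelihood of its actions and keep the highest scorers. The likelihood-based score is purely an assumption for illustration.

```python
import numpy as np

def filter_compatible(demos, log_prob_fn, keep_frac=0.8):
    """Post-hoc filtering sketch: score each demonstration by the mean
    log-likelihood the current policy assigns to its actions and keep the
    top `keep_frac` fraction. `log_prob_fn(state, action)` is a hypothetical
    hook into the policy; the paper's actual compatibility measure is not
    given in this summary."""
    scores = [np.mean([log_prob_fn(s, a) for s, a in demo]) for demo in demos]
    cutoff = np.quantile(scores, 1.0 - keep_frac)
    return [d for d, sc in zip(demos, scores) if sc >= cutoff]
```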
arXiv Detail & Related papers (2022-10-14T19:37:55Z)
- Robust Learning from Observation with Model Misspecification [33.92371002674386]
Imitation learning (IL) is a popular paradigm for training policies in robotic systems.
We propose a robust IL algorithm to learn policies that can effectively transfer to the real environment without fine-tuning.
arXiv Detail & Related papers (2022-02-12T07:04:06Z)
- Learning Feasibility to Imitate Demonstrators with Different Dynamics [23.239058855103067]
The goal of learning from demonstrations is to learn a policy for an agent (imitator) by mimicking the behavior in the demonstrations.
We learn a feasibility metric that captures the likelihood of a demonstration being feasible for the imitator.
Our experiments on four simulated environments and on a real robot show that the policy learned with our approach achieves a higher expected return than prior works.
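One natural way to use such a metric is to down-weight infeasible demonstrations during imitation. The sketch below shows feasibility-weighted behavior cloning under that assumption; how the feasibility model is trained is the paper's core contribution and is not reproduced here.

```python
import torch

def feasibility_weighted_bc_loss(policy, feasibility, demos):
    """Sketch: weight each demonstration's behavior-cloning loss by a
    learned feasibility score in [0, 1], so demonstrations the imitator
    cannot physically follow contribute less. `policy` and `feasibility`
    are assumed callables mapping state batches to actions and scores."""
    total = 0.0
    for states, actions in demos:          # states: (T, s_dim), actions: (T, a_dim)
        w = feasibility(states).mean()     # scalar feasibility of this demo
        pred = policy(states)
        total = total + w * torch.mean((pred - actions) ** 2)
    return total / len(demos)
```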
arXiv Detail & Related papers (2021-10-28T14:15:47Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
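For context, the successor-feature decomposition this family of methods relies on is simple to state: with cumulant features $\phi(s,a)$ and successor features $\psi(s,a) = \mathbb{E}[\sum_t \gamma^t \phi_t]$, any reward linear in the cumulants, $r = \phi \cdot w$, gives $Q(s,a) = \psi(s,a) \cdot w$. A tiny sketch of that composition step, with illustrative dimensions:

```python
import torch

def q_values(psi, w):
    # psi: (num_actions, d) successor features at a state;
    # w: (d,) task-reward weights, e.g. as recovered by ITD from
    # demonstrations (the recovery itself is not shown here).
    return psi @ w

psi = torch.randn(4, 8)   # 4 actions, 8-dim cumulants (illustrative sizes)
w = torch.randn(8)
q = q_values(psi, w)      # (4,) action values
action = int(torch.argmax(q))   # act greedily with respect to q
```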
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
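The recipe stated above is concrete enough to sketch: train an inverse dynamics model (IDM) on the agent's own transitions, then use it to label state-only demonstrations with predicted actions. Network sizes and the MSE objective are illustrative assumptions.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 10, 4
# IDM maps a pair of consecutive states to the action between them.
idm = nn.Sequential(nn.Linear(2 * state_dim, 64), nn.ReLU(),
                    nn.Linear(64, action_dim))
opt = torch.optim.Adam(idm.parameters(), lr=1e-3)

def idm_update(s, s_next, a):
    # Supervised regression on the agent's own (s, s', a) transitions.
    pred = idm(torch.cat([s, s_next], dim=-1))
    loss = torch.mean((pred - a) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def label_demo(demo_states):
    # Predict the missing actions for consecutive demonstration states,
    # turning a state-only demo into (state, action) training data.
    s, s_next = demo_states[:-1], demo_states[1:]
    with torch.no_grad():
        return idm(torch.cat([s, s_next], dim=-1))
```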
arXiv Detail & Related papers (2020-04-07T17:57:20Z)
- State-only Imitation with Transition Dynamics Mismatch [16.934888672659824]
Imitation Learning (IL) is a popular paradigm for training agents to achieve complicated goals by leveraging expert behavior.
We present a new state-only IL algorithm in this paper.
We show that our algorithm is particularly effective when there is a transition dynamics mismatch between the expert and imitator MDPs.
arXiv Detail & Related papers (2020-02-27T02:27:46Z)