Seeing Differently, Acting Similarly: Imitation Learning with
Heterogeneous Observations
- URL: http://arxiv.org/abs/2106.09256v1
- Date: Thu, 17 Jun 2021 05:44:04 GMT
- Title: Seeing Differently, Acting Similarly: Imitation Learning with
Heterogeneous Observations
- Authors: Xin-Qiang Cai, Yao-Xiang Ding, Zi-Xuan Chen, Yuan Jiang, Masashi
Sugiyama, Zhi-Hua Zhou
- Abstract summary: In many real-world imitation learning tasks, the demonstrator and the learner have to act in different but full observation spaces.
In this work, we model the above learning problem as Heterogeneous Observations Imitation Learning (HOIL).
We propose the Importance Weighting with REjection (IWRE) algorithm based on the techniques of importance-weighting, learning with rejection, and active querying to solve the key challenge of occupancy measure matching.
- Score: 126.78199124026398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many real-world imitation learning tasks, the demonstrator and the learner
have to act in different but full observation spaces. This situation creates
significant obstacles for existing imitation learning approaches, even
when they are combined with traditional space adaptation techniques. The main
challenge lies in bridging the expert's occupancy measures to the learner's dynamically
changing occupancy measures under the different observation spaces. In this
work, we model the above learning problem as Heterogeneous Observations
Imitation Learning (HOIL). We propose the Importance Weighting with REjection
(IWRE) algorithm based on the techniques of importance-weighting, learning with
rejection, and active querying to solve the key challenge of occupancy measure
matching. Experimental results show that IWRE can successfully solve HOIL
tasks, including the challenging task of transforming the vision-based
demonstrations to random access memory (RAM)-based policies under the Atari
domain.
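As a rough illustration of how importance weighting and rejection can fit together, the sketch below reweights per-sample losses by a density ratio and sets aside samples the classifier is unsure about. It is not the IWRE algorithm from the paper: the function names, the `margin` threshold, and the use of a classifier probability `p` to form the ratio `p / (1 - p)` are all assumptions made for this example.

```python
def importance_weight(p_expert, eps=1e-6):
    """Density-ratio estimate w = p / (1 - p).

    p_expert is a (hypothetical) classifier's probability that a sample
    was drawn from the expert distribution rather than the learner's.
    """
    p = min(max(p_expert, eps), 1.0 - eps)  # clip to avoid division by zero
    return p / (1.0 - p)


def weighted_loss_with_rejection(probs, losses, margin=0.1):
    """Importance-weighted mean loss with a simple rejection rule.

    Samples whose classifier output lies within `margin` of 0.5 are
    rejected (skipped) because the classifier is too uncertain about them.
    Returns the weighted mean loss and the indices of rejected samples.
    """
    total, weight_sum, rejected = 0.0, 0.0, []
    for i, (p, loss) in enumerate(zip(probs, losses)):
        if abs(p - 0.5) < margin:
            rejected.append(i)  # uncertain sample: leave it out of the loss
            continue
        w = importance_weight(p)
        total += w * loss
        weight_sum += w
    mean = total / weight_sum if weight_sum > 0 else 0.0
    return mean, rejected


if __name__ == "__main__":
    mean, rejected = weighted_loss_with_rejection(
        probs=[0.8, 0.5, 0.2], losses=[1.0, 5.0, 3.0]
    )
    print(mean, rejected)
```

In a setup like this, the rejected indices are the kind of samples one might forward to an active query, though how the paper selects and answers its queries is not described in the abstract.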
Related papers
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- Visual In-Context Learning for Large Vision-Language Models [62.5507897575317]
In Large Visual Language Models (LVLMs) the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities.
We introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition.
Our approach retrieves images via a "Retrieval & Rerank" paradigm, summarises images with task intent and task-specific visual parsing, and composes language-based demonstrations.
arXiv Detail & Related papers (2024-02-18T12:43:38Z)
- Robust Visual Imitation Learning with Inverse Dynamics Representations [32.806294517277976]
We develop an inverse dynamics state representation learning objective to align the expert environment and the learning environment.
With the abstract state representation, we design an effective reward function, which thoroughly measures the similarity between behavior data and expert data.
Our approach can achieve a near-expert performance in most environments, and significantly outperforms the state-of-the-art visual IL methods and robust IL methods.
arXiv Detail & Related papers (2023-10-22T11:47:35Z)
- SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models [22.472167814814448]
We propose a new model-based imitation learning algorithm named Separated Model-based Adversarial Imitation Learning (SeMAIL).
Our method achieves near-expert performance on various visual control tasks with complex observations and the more challenging tasks with different backgrounds from expert observations.
arXiv Detail & Related papers (2023-06-19T04:33:44Z)
- Imitation from Observation With Bootstrapped Contrastive Learning [12.048166025000976]
Imitation from observation (IfO) is a learning paradigm that consists of training autonomous agents in a Markov Decision Process.
We present BootIfOL, an IfO algorithm that aims to learn a reward function that takes an agent trajectory and compares it to an expert.
We evaluate our approach on a variety of control tasks showing that we can train effective policies using a limited number of demonstrative trajectories.
arXiv Detail & Related papers (2023-02-13T17:32:17Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- Imitation by Predicting Observations [17.86983397979034]
We present a new method for imitation solely from observations that achieves comparable performance to experts on challenging continuous control tasks.
Our method, which we call FORM, is derived from an inverse RL objective and imitates using a model of expert behavior learned by generative modelling of the expert's observations.
We show that FORM performs comparably to a strong baseline IRL method (GAIL) on the DeepMind Control Suite benchmark, while outperforming GAIL in the presence of task-irrelevant features.
arXiv Detail & Related papers (2021-07-08T14:09:30Z)
- Cross-domain Imitation from Observations [50.669343548588294]
Imitation learning seeks to circumvent the difficulty in designing proper reward functions for training agents by utilizing expert behavior.
In this paper, we study the problem of how to imitate tasks when there exist discrepancies between the expert and agent MDP.
We present a novel framework to learn correspondences across such domains.
arXiv Detail & Related papers (2021-05-20T21:08:25Z)
- Behavior Priors for Efficient Reinforcement Learning [97.81587970962232]
We consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors.
We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives.
We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
arXiv Detail & Related papers (2020-10-27T13:17:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.