An Imitation from Observation Approach to Transfer Learning with
Dynamics Mismatch
- URL: http://arxiv.org/abs/2008.01594v3
- Date: Mon, 16 Nov 2020 22:58:45 GMT
- Title: An Imitation from Observation Approach to Transfer Learning with
Dynamics Mismatch
- Authors: Siddharth Desai, Ishan Durugkar, Haresh Karnan, Garrett Warnell,
Josiah Hanna, Peter Stone
- Abstract summary: We show that one existing solution to this transfer problem - grounded action transformation - is closely related to the problem of imitation from observation.
We derive a new algorithm - generative adversarial reinforced action transformation (GARAT) - based on adversarial imitation from observation techniques.
We find that agents trained with GARAT achieve higher returns in the target environment compared to existing black-box transfer methods.
- Score: 44.898655782896306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We examine the problem of transferring a policy learned in a source
environment to a target environment with different dynamics, particularly in
the case where it is critical to reduce the amount of interaction with the
target environment during learning. This problem is particularly important in
sim-to-real transfer because simulators inevitably model real-world dynamics
imperfectly. In this paper, we show that one existing solution to this transfer
problem - grounded action transformation - is closely related to the problem of
imitation from observation (IfO): learning behaviors that mimic the
observations of behavior demonstrations. After establishing this relationship,
we hypothesize that recent state-of-the-art approaches from the IfO literature
can be effectively repurposed for grounded transfer learning. To validate our
hypothesis, we derive a new algorithm - generative adversarial reinforced action
transformation (GARAT) - based on adversarial imitation from observation
techniques. We run experiments in several domains with mismatched dynamics, and
find that agents trained with GARAT achieve higher returns in the target
environment compared to existing black-box transfer methods.
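To make the idea concrete, below is a minimal, hypothetical Python sketch (not the authors' implementation) of the adversarial grounding step: an action transformer g(s, a) is trained so that simulator transitions under transformed actions become indistinguishable from target-environment transitions. The toy dynamics, network sizes, and the pathwise gradient used in place of GARAT's reinforcement learning update are all assumptions made for brevity.

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 3, 1

def target_step(s, a):
    # stand-in for the real (target) dynamics
    return s + 0.1 * a + 0.01

def sim_step(s, a):
    # stand-in for the simulator, with mismatched dynamics
    return s + 0.08 * a

# g(s, a) -> transformed action fed to the simulator
transformer = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 32), nn.Tanh(),
    nn.Linear(32, ACTION_DIM))
# D(s, a, s') -> logit that the transition came from the target environment
discriminator = nn.Sequential(
    nn.Linear(2 * STATE_DIM + ACTION_DIM, 32), nn.Tanh(),
    nn.Linear(32, 1))
opt_g = torch.optim.Adam(transformer.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    s = torch.randn(64, STATE_DIM)
    a = torch.randn(64, ACTION_DIM)
    s_next_target = target_step(s, a)                 # "real" transitions
    a_hat = transformer(torch.cat([s, a], dim=-1))    # grounded action
    s_next_sim = sim_step(s, a_hat)                   # grounded-sim transitions

    # discriminator update: target transitions labeled 1, grounded-sim 0
    real = torch.cat([s, a, s_next_target], dim=-1)
    fake = torch.cat([s, a, s_next_sim.detach()], dim=-1)
    d_loss = (bce(discriminator(real), torch.ones(64, 1))
              + bce(discriminator(fake), torch.zeros(64, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # transformer update: fool the discriminator. GARAT instead trains the
    # transformer as an RL policy with a discriminator-derived reward, since
    # real dynamics are not differentiable; the direct gradient here works
    # only because the toy dynamics above are differentiable.
    fake = torch.cat([s, a, s_next_sim], dim=-1)
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

Once the transformer is trained, the policy would be optimized in the grounded simulator (simulator plus transformer) and then deployed in the target environment.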
Related papers
- Conditional Kernel Imitation Learning for Continuous State Environments [9.750698192309978]
We introduce a novel conditional kernel density estimation-based imitation learning framework.
We show consistently superior empirical performance over many state-of-the-art IL algorithms.
arXiv Detail & Related papers (2023-08-24T05:26:42Z)
- Revisiting the Robustness of the Minimum Error Entropy Criterion: A Transfer Learning Case Study [16.07380451502911]
This paper revisits the robustness of the minimum error entropy criterion in dealing with non-Gaussian noise.
We investigate its feasibility and usefulness in real-life transfer learning regression tasks, where distributional shifts are common.
arXiv Detail & Related papers (2023-07-17T15:38:11Z)
- Cross-Domain Policy Adaptation via Value-Guided Data Filtering [57.62692881606099]
Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning.
We present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets (a loose sketch of this filtering idea appears after this list).
arXiv Detail & Related papers (2023-05-28T04:08:40Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Transfer RL via the Undo Maps Formalism [29.798971172941627]
Transferring knowledge across domains is one of the most fundamental problems in machine learning.
We propose TvD: transfer via distribution matching, a framework to transfer knowledge across interactive domains.
We show this objective leads to a policy update scheme reminiscent of imitation learning, and derive an efficient algorithm to implement it.
arXiv Detail & Related papers (2022-11-26T03:44:28Z)
- Transfer Learning on Heterogeneous Feature Spaces for Treatment Effects Estimation [103.55894890759376]
This paper introduces several building blocks that use representation learning to handle the heterogeneous feature spaces.
We show how these building blocks can be used to recover transfer learning equivalents of the standard CATE learners.
arXiv Detail & Related papers (2022-10-08T16:41:02Z)
- A New Representation of Successor Features for Transfer across Dissimilar Environments [60.813074750879615]
Many real-world RL problems require transfer among environments with different dynamics.
We propose an approach based on successor features in which we model successor feature functions with Gaussian Processes.
Our theoretical analysis proves convergence of this approach and establishes a bound on the error of the modelled successor feature functions.
arXiv Detail & Related papers (2021-07-18T12:37:05Z)
- Seeing Differently, Acting Similarly: Imitation Learning with Heterogeneous Observations [126.78199124026398]
In many real-world imitation learning tasks, the demonstrator and the learner have to act in different but fully observable observation spaces.
In this work, we model this learning problem as Heterogeneous Observations Imitation Learning (HOIL).
We propose the Importance Weighting with REjection (IWRE) algorithm based on the techniques of importance-weighting, learning with rejection, and active querying to solve the key challenge of occupancy measure matching.
arXiv Detail & Related papers (2021-06-17T05:44:04Z)
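For the Value-Guided Data Filtering entry above, here is a loose, hypothetical Python sketch of the filtering idea: share a source-domain transition only when its bootstrapped value target lies close to value targets observed in the target domain. The toy data, the nearest-neighbour proximity measure, and the 25% quantile cutoff are illustrative assumptions, not the paper's actual criterion.

import numpy as np

rng = np.random.default_rng(0)

def value_target(reward, v_next, gamma=0.99):
    # one-step bootstrapped value target: r + gamma * V(s')
    return reward + gamma * v_next

# toy stand-ins: each row holds (reward, V(s')) for one transition
source = rng.normal(size=(256, 2))
target = rng.normal(size=(256, 2))

src_vt = value_target(source[:, 0], source[:, 1])
tgt_vt = value_target(target[:, 0], target[:, 1])

# distance from each source value target to its nearest target-domain value target
dists = np.abs(src_vt[:, None] - tgt_vt[None, :]).min(axis=1)
threshold = np.quantile(dists, 0.25)  # keep the closest 25% (arbitrary cutoff)
keep = dists <= threshold

print(f"sharing {keep.sum()} of {len(source)} source transitions")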
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.