Transfer RL via the Undo Maps Formalism
- URL: http://arxiv.org/abs/2211.14469v1
- Date: Sat, 26 Nov 2022 03:44:28 GMT
- Title: Transfer RL via the Undo Maps Formalism
- Authors: Abhi Gupta, Ted Moskovitz, David Alvarez-Melis, Aldo Pacchiano
- Abstract summary: Transferring knowledge across domains is one of the most fundamental problems in machine learning.
We propose TvD: transfer via distribution matching, a framework to transfer knowledge across interactive domains.
We show this objective leads to a policy update scheme reminiscent of imitation learning, and derive an efficient algorithm to implement it.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transferring knowledge across domains is one of the most fundamental problems
in machine learning, but doing so effectively in the context of reinforcement
learning remains largely an open problem. Current methods make strong
assumptions on the specifics of the task, often lack principled objectives, and
-- crucially -- modify individual policies, which might be sub-optimal when the
domains differ due to a drift in the state space, i.e., a shift that is intrinsic to the
environment and therefore affects every agent interacting with it. To address
these drawbacks, we propose TvD: transfer via distribution matching, a
framework to transfer knowledge across interactive domains. We approach the
problem from a data-centric perspective, characterizing the discrepancy in
environments by means of a (potentially complex) transformation between their
state spaces, and thus posing the problem of transfer as learning to undo this
transformation. To accomplish this, we introduce a novel optimization objective
based on an optimal transport distance between two distributions over
trajectories -- those generated by an already-learned policy in the source
domain and a learnable pushforward policy in the target domain. We show this
objective leads to a policy update scheme reminiscent of imitation learning,
and derive an efficient algorithm to implement it. Our experiments in simple
gridworlds show that this method yields successful transfer learning across a
wide range of environment transformations.
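To make the objective concrete, here is a minimal sketch (not the authors' code) of an optimal-transport distance between two sets of trajectories, one rolled out by the already-learned source policy and one by the learnable pushforward policy in the target domain. The time-aligned trajectories, the squared-Euclidean ground cost, the entropic (Sinkhorn) regularization, and all names and shapes below are illustrative assumptions; the paper's exact formulation may differ.

```python
# Illustrative only: entropically regularized OT (Sinkhorn) between trajectory sets.
import numpy as np

def pairwise_cost(src_traj, tgt_traj):
    """Ground cost between trajectories: mean squared distance of time-aligned states.
    src_traj: (N, T, d), tgt_traj: (M, T, d) -> cost matrix (N, M)."""
    diff = src_traj[:, None, :, :] - tgt_traj[None, :, :, :]
    return (diff ** 2).sum(-1).mean(-1)

def sinkhorn_cost(C, reg=1.0, n_iter=200):
    """Entropic OT cost between uniform distributions over the rows/columns of C."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / reg)
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]           # transport plan between trajectories
    return (P * C).sum(), P

# Dummy usage: 32 trajectories of 50 steps with 2-D states (e.g. gridworld positions).
rng = np.random.default_rng(0)
src = rng.normal(size=(32, 50, 2))            # rolled out by the source policy
tgt = rng.normal(loc=1.0, size=(32, 50, 2))   # rolled out by the current target policy
cost, plan = sinkhorn_cost(pairwise_cost(src, tgt))
print(f"entropic OT cost: {cost:.3f}")
# In a TvD-style loop, minimizing `cost` with respect to the target policy is the
# transfer objective, and `plan` indicates which source trajectories each target
# trajectory should imitate -- the imitation-learning flavour mentioned in the abstract.
```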
Related papers
- xTED: Cross-Domain Adaptation via Diffusion-Based Trajectory Editing
Cross-domain policy transfer methods mostly aim at learning domain correspondences or corrections to facilitate policy learning.
We propose the Cross-Domain Trajectory EDiting framework that employs a specially designed diffusion model for cross-domain trajectory adaptation.
Our proposed model architecture effectively captures the intricate dependencies among states, actions, and rewards, as well as the dynamics patterns within target data.
arXiv: 2024-09-13
- Cross-Domain Policy Adaptation by Capturing Representation Mismatch
In reinforcement learning (RL), it is vital to learn effective policies that can be transferred to different domains with dynamics discrepancies.
In this paper, we consider dynamics adaptation settings where there exists dynamics mismatch between the source domain and the target domain.
We perform representation learning only in the target domain and measure the representation deviations on the transitions from the source domain.
arXiv: 2024-05-24
- Cross Domain Policy Transfer with Effect Cycle-Consistency
Training a robotic policy from scratch using deep reinforcement learning methods can be prohibitively expensive due to sample inefficiency.
We propose a novel approach for learning the mapping functions between state and action spaces across domains using unpaired data.
Our approach has been tested on three locomotion tasks and two robotic manipulation tasks.
arXiv: 2024-03-04
- A Framework for Few-Shot Policy Transfer through Observation Mapping and Behavior Cloning
This work proposes a framework for Few-Shot Policy Transfer between two domains through Observation Mapping and Behavior Cloning.
We use Generative Adversarial Networks (GANs) along with a cycle-consistency loss to map the observations between the source and target domains and later use this learned mapping to clone the successful source task behavior policy to the target domain.
arXiv: 2023-10-13
- Cross-Domain Policy Adaptation via Value-Guided Data Filtering
Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning.
We present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets.
arXiv: 2023-05-28
- Coarse to Fine: Domain Adaptive Crowd Counting via Adversarial Scoring Network
This paper proposes a novel adversarial scoring network (ASNet) to bridge the gap across domains from coarse to fine granularity.
Three sets of migration experiments show that the proposed methods achieve state-of-the-art counting performance.
arXiv: 2021-07-27
- Physically-Constrained Transfer Learning through Shared Abundance Space for Hyperspectral Image Classification
We propose a new transfer learning scheme to bridge the gap between the source and target domains.
The proposed method is referred to as physically-constrained transfer learning through shared abundance space.
arXiv: 2020-08-19
- An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch
We show that one existing solution to this transfer problem - grounded action transformation - is closely related to the problem of imitation from observation.
We derive a new algorithm - generative adversarial reinforced action transformation (GARAT) - based on adversarial imitation from observation techniques.
We find that agents trained with GARAT achieve higher returns in the target environment compared to existing black-box transfer methods.
arXiv: 2020-08-04
- Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
We propose a simple, practical, and intuitive approach for domain adaptation in reinforcement learning.
We show that we can achieve this goal by compensating for the difference in dynamics by modifying the reward function.
Our approach is applicable to domains with continuous states and actions and does not require learning an explicit model of the dynamics (a hedged sketch of this reward correction appears after this list).
arXiv: 2020-06-24
- Unsupervised Transfer Learning with Self-Supervised Remedy
Generalising deep networks to novel domains without manual labels is a key challenge for deep learning.
Pre-learned knowledge does not transfer well without making strong assumptions about the learned and the novel domains.
In this work, we aim to learn a discriminative latent space of the unlabelled target data in a novel domain by knowledge transfer from labelled related domains.
arXiv Detail & Related papers (2020-06-08T16:42:17Z) - Continuous Domain Adaptation with Variational Domain-Agnostic Feature
Replay [78.7472257594881]
Learning in non-stationary environments is one of the biggest challenges in machine learning.
Non-stationarity can be caused by either task drift or domain drift.
We propose variational domain-agnostic feature replay, an approach that is composed of three components.
arXiv: 2020-03-09
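For the Off-Dynamics Reinforcement Learning entry above, the reward modification it describes is commonly implemented with two domain classifiers that estimate delta_r(s, a, s') = log p_target(s' | s, a) - log p_source(s' | s, a). The sketch below illustrates that correction under assumed network sizes and names; it is not the paper's code, and the classifier training loop is omitted.

```python
# Illustrative only: classifier-based reward correction for off-dynamics transfer.
import torch
import torch.nn as nn

class DomainClassifier(nn.Module):
    """Outputs the logit of P(domain = target | features)."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def reward_correction(cls_sas, cls_sa, s, a, s_next):
    """delta_r = log p_target(s'|s,a) - log p_source(s'|s,a), estimated from two classifiers.
    The (s, a, s') logit gives log p(target|s,a,s') - log p(source|s,a,s'); subtracting the
    (s, a) logit cancels the state-action marginal."""
    sas_logit = cls_sas(torch.cat([s, a, s_next], dim=-1))
    sa_logit = cls_sa(torch.cat([s, a], dim=-1))
    return sas_logit - sa_logit

# Dummy usage (dimensions are illustrative).
s_dim, a_dim, batch = 4, 2, 8
cls_sas = DomainClassifier(s_dim + a_dim + s_dim)
cls_sa = DomainClassifier(s_dim + a_dim)
s, a, s_next = torch.randn(batch, s_dim), torch.randn(batch, a_dim), torch.randn(batch, s_dim)
r_env = torch.randn(batch)
r_modified = r_env + reward_correction(cls_sas, cls_sa, s, a, s_next)
# The two classifiers would be trained with binary cross-entropy on transitions labelled
# by the domain they came from; that training loop is omitted in this sketch.
```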