Transfer RL via the Undo Maps Formalism
- URL: http://arxiv.org/abs/2211.14469v1
- Date: Sat, 26 Nov 2022 03:44:28 GMT
- Title: Transfer RL via the Undo Maps Formalism
- Authors: Abhi Gupta, Ted Moskovitz, David Alvarez-Melis, Aldo Pacchiano
- Abstract summary: Transferring knowledge across domains is one of the most fundamental problems in machine learning.
We propose TvD: transfer via distribution matching, a framework to transfer knowledge across interactive domains.
We show this objective leads to a policy update scheme reminiscent of imitation learning, and derive an efficient algorithm to implement it.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transferring knowledge across domains is one of the most fundamental problems
in machine learning, but doing so effectively in the context of reinforcement
learning remains largely an open problem. Current methods make strong
assumptions on the specifics of the task, often lack principled objectives, and
-- crucially -- modify individual policies, which might be sub-optimal when the
domains differ due to a drift in the state space, i.e., a shift that is intrinsic to the
environment and therefore affects every agent interacting with it. To address
these drawbacks, we propose TvD: transfer via distribution matching, a
framework to transfer knowledge across interactive domains. We approach the
problem from a data-centric perspective, characterizing the discrepancy in
environments by means of a (potentially complex) transformation between their
state spaces, and thus posing the problem of transfer as learning to undo this
transformation. To accomplish this, we introduce a novel optimization objective
based on an optimal transport distance between two distributions over
trajectories -- those generated by an already-learned policy in the source
domain and a learnable pushforward policy in the target domain. We show this
objective leads to a policy update scheme reminiscent of imitation learning,
and derive an efficient algorithm to implement it. Our experiments in simple
gridworlds show that this method yields successful transfer learning across a
wide range of environment transformations.
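To make the objective concrete, here is a minimal sketch (not the authors' code) of an optimal-transport distance between two sets of trajectories, one rolled out by the already-learned source policy and one by the learnable pushforward policy in the target domain. The time-aligned trajectories, the squared-Euclidean ground cost, the entropic (Sinkhorn) regularization, and all names and shapes below are illustrative assumptions; the paper's exact formulation may differ.

```python
# Illustrative only: entropically regularized OT (Sinkhorn) between trajectory sets.
import numpy as np

def pairwise_cost(src_traj, tgt_traj):
    """Ground cost between trajectories: mean squared distance of time-aligned states.
    src_traj: (N, T, d), tgt_traj: (M, T, d) -> cost matrix (N, M)."""
    diff = src_traj[:, None, :, :] - tgt_traj[None, :, :, :]
    return (diff ** 2).sum(-1).mean(-1)

def sinkhorn_cost(C, reg=1.0, n_iter=200):
    """Entropic OT cost between uniform distributions over the rows/columns of C."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / reg)
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]           # transport plan between trajectories
    return (P * C).sum(), P

# Dummy usage: 32 trajectories of 50 steps with 2-D states (e.g. gridworld positions).
rng = np.random.default_rng(0)
src = rng.normal(size=(32, 50, 2))            # rolled out by the source policy
tgt = rng.normal(loc=1.0, size=(32, 50, 2))   # rolled out by the current target policy
cost, plan = sinkhorn_cost(pairwise_cost(src, tgt))
print(f"entropic OT cost: {cost:.3f}")
# In a TvD-style loop, minimizing `cost` with respect to the target policy is the
# transfer objective, and `plan` indicates which source trajectories each target
# trajectory should imitate -- the imitation-learning flavour mentioned in the abstract.
```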
Related papers
- xTED: Cross-Domain Adaptation via Diffusion-Based Trajectory Editing
Cross-domain policy transfer methods mostly aim at learning domain correspondences or corrections to facilitate policy learning.
We propose the Cross-Domain Trajectory EDiting framework that employs a specially designed diffusion model for cross-domain trajectory adaptation.
Our proposed model architecture effectively captures the intricate dependencies among states, actions, and rewards, as well as the dynamics patterns within target data.
arXiv: 2024-09-13
- Cross-Domain Policy Adaptation by Capturing Representation Mismatch
In reinforcement learning (RL), it is vital to learn effective policies that can be transferred to different domains with dynamics discrepancies.
In this paper, we consider dynamics adaptation settings where there exists dynamics mismatch between the source domain and the target domain.
We perform representation learning only in the target domain and measure the representation deviations on the transitions from the source domain.
arXiv: 2024-05-24
- Cross Domain Policy Transfer with Effect Cycle-Consistency
Training a robotic policy from scratch using deep reinforcement learning methods can be prohibitively expensive due to sample inefficiency.
We propose a novel approach for learning the mapping functions between state and action spaces across domains using unpaired data.
Our approach has been tested on three locomotion tasks and two robotic manipulation tasks.
arXiv: 2024-03-04
- A Framework for Few-Shot Policy Transfer through Observation Mapping and Behavior Cloning
This work proposes a framework for Few-Shot Policy Transfer between two domains through Observation Mapping and Behavior Cloning.
We use Generative Adversarial Networks (GANs) along with a cycle-consistency loss to map the observations between the source and target domains and later use this learned mapping to clone the successful source task behavior policy to the target domain.
arXiv: 2023-10-13
- Cross-Domain Policy Adaptation via Value-Guided Data Filtering
Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning.
We present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets.
arXiv: 2023-05-28
- Coarse to Fine: Domain Adaptive Crowd Counting via Adversarial Scoring Network
This paper proposes a novel adversarial scoring network (ASNet) to bridge the gap across domains from coarse to fine granularity.
Three sets of migration experiments show that the proposed methods achieve state-of-the-art counting performance.
arXiv: 2021-07-27
- Physically-Constrained Transfer Learning through Shared Abundance Space for Hyperspectral Image Classification
We propose a new transfer learning scheme to bridge the gap between the source and target domains.
The proposed method is referred to as physically-constrained transfer learning through shared abundance space.
arXiv: 2020-08-19
- An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch
We show that one existing solution to this transfer problem - grounded action transformation - is closely related to the problem of imitation from observation.
We derive a new algorithm - generative adversarial reinforced action transformation (GARAT) - based on adversarial imitation from observation techniques.
We find that agents trained with GARAT achieve higher returns in the target environment compared to existing black-box transfer methods.
arXiv: 2020-08-04
- Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
We propose a simple, practical, and intuitive approach for domain adaptation in reinforcement learning.
We show that we can achieve this goal by compensating for the difference in dynamics by modifying the reward function.
Our approach is applicable to domains with continuous states and actions and does not require learning an explicit model of the dynamics (a hedged sketch of this reward correction appears after this list).
arXiv: 2020-06-24
- Unsupervised Transfer Learning with Self-Supervised Remedy
Generalising deep networks to novel domains without manual labels is a key challenge for deep learning.
Pre-learned knowledge does not transfer well without making strong assumptions about the learned and the novel domains.
In this work, we aim to learn a discriminative latent space of the unlabelled target data in a novel domain by knowledge transfer from labelled related domains.
arXiv Detail & Related papers (2020-06-08T16:42:17Z) - Continuous Domain Adaptation with Variational Domain-Agnostic Feature
Replay [78.7472257594881]
Learning in non-stationary environments is one of the biggest challenges in machine learning.
Non-stationarity can be caused by either task drift or domain drift.
We propose variational domain-agnostic feature replay, an approach that is composed of three components.
arXiv: 2020-03-09
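For the Off-Dynamics Reinforcement Learning entry above, the reward modification it describes is commonly implemented with two domain classifiers that estimate delta_r(s, a, s') = log p_target(s' | s, a) - log p_source(s' | s, a). The sketch below illustrates that correction under assumed network sizes and names; it is not the paper's code, and the classifier training loop is omitted.

```python
# Illustrative only: classifier-based reward correction for off-dynamics transfer.
import torch
import torch.nn as nn

class DomainClassifier(nn.Module):
    """Outputs the logit of P(domain = target | features)."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def reward_correction(cls_sas, cls_sa, s, a, s_next):
    """delta_r = log p_target(s'|s,a) - log p_source(s'|s,a), estimated from two classifiers.
    The (s, a, s') logit gives log p(target|s,a,s') - log p(source|s,a,s'); subtracting the
    (s, a) logit cancels the state-action marginal."""
    sas_logit = cls_sas(torch.cat([s, a, s_next], dim=-1))
    sa_logit = cls_sa(torch.cat([s, a], dim=-1))
    return sas_logit - sa_logit

# Dummy usage (dimensions are illustrative).
s_dim, a_dim, batch = 4, 2, 8
cls_sas = DomainClassifier(s_dim + a_dim + s_dim)
cls_sa = DomainClassifier(s_dim + a_dim)
s, a, s_next = torch.randn(batch, s_dim), torch.randn(batch, a_dim), torch.randn(batch, s_dim)
r_env = torch.randn(batch)
r_modified = r_env + reward_correction(cls_sas, cls_sa, s, a, s_next)
# The two classifiers would be trained with binary cross-entropy on transitions labelled
# by the domain they came from; that training loop is omitted in this sketch.
```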