Cross-Domain Policy Adaptation via Value-Guided Data Filtering
- URL: http://arxiv.org/abs/2305.17625v2
- Date: Fri, 13 Oct 2023 06:36:06 GMT
- Title: Cross-Domain Policy Adaptation via Value-Guided Data Filtering
- Authors: Kang Xu, Chenjia Bai, Xiaoteng Ma, Dong Wang, Bin Zhao, Zhen Wang,
Xuelong Li, Wei Li
- Abstract summary: Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning.
We present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets.
- Score: 57.62692881606099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalizing policies across different domains with dynamics mismatch poses a
significant challenge in reinforcement learning. For example, a robot learns
the policy in a simulator, but when it is deployed in the real world, the
dynamics of the environment may be different. Given a source and a target
domain with a dynamics mismatch, we consider the online dynamics adaptation
problem, in which the agent can access sufficient source domain data while
online interactions with the target domain are limited. Existing research has
attempted to solve the problem from the dynamics discrepancy perspective. In
this work, we reveal the limitations of these methods and explore the problem
from the value difference perspective via a novel insight on the value
consistency across domains. Specifically, we present the Value-Guided Data
Filtering (VGDF) algorithm, which selectively shares transitions from the
source domain based on the proximity of paired value targets across the two
domains. Empirical results on various environments with kinematic and
morphology shifts demonstrate that our method achieves superior performance
compared to prior approaches.
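Based only on the abstract above, the filtering step can be illustrated with a minimal sketch: for each source-domain transition, compare the value target computed from the observed source next state against the value target computed from a next state imagined under the target-domain dynamics, and share only the transitions where the two stay close. The ensemble of value functions, the learned target dynamics model, and the fixed keep ratio below are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def value_targets(v_ensemble, reward, next_state, gamma=0.99):
    """One-step value targets r + gamma * V(s') under each value function in the ensemble."""
    return np.array([reward + gamma * v(next_state) for v in v_ensemble])

def filter_source_transitions(source_batch, v_ensemble, target_dynamics,
                              keep_ratio=0.25, gamma=0.99):
    """Keep the source transitions whose value targets under the observed source
    next state stay close to the targets under a next state predicted by a model
    of the target-domain dynamics. The names v_ensemble, target_dynamics and
    keep_ratio are illustrative assumptions, not the paper's API."""
    gaps = []
    for state, action, reward, next_state_src in source_batch:
        next_state_tgt = target_dynamics(state, action)     # imagined target-domain outcome
        t_src = value_targets(v_ensemble, reward, next_state_src, gamma)
        t_tgt = value_targets(v_ensemble, reward, next_state_tgt, gamma)
        gaps.append(float(np.mean(np.abs(t_src - t_tgt))))  # proximity of paired value targets
    k = max(1, int(keep_ratio * len(source_batch)))         # share only the closest fraction
    keep_idx = np.argsort(gaps)[:k]
    return [source_batch[i] for i in keep_idx]
```

In such a sketch, the agent would then train on the kept source transitions together with the limited target-domain interactions.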
Related papers
- xTED: Cross-Domain Adaptation via Diffusion-Based Trajectory Editing [21.37585797507323]
Cross-domain policy transfer methods mostly aim at learning domain correspondences or corrections to facilitate policy learning.
We propose the Cross-Domain Trajectory EDiting framework that employs a specially designed diffusion model for cross-domain trajectory adaptation.
Our proposed model architecture effectively captures the intricate dependencies among states, actions, and rewards, as well as the dynamics patterns within target data.
arXiv Detail & Related papers (2024-09-13T10:07:28Z)
- Cross-Domain Policy Adaptation by Capturing Representation Mismatch [53.087413751430255]
In reinforcement learning (RL), it is vital to learn effective policies that can be transferred to domains with dynamics discrepancies.
In this paper, we consider dynamics adaptation settings where there exists dynamics mismatch between the source domain and the target domain.
We perform representation learning only in the target domain and measure the representation deviations on the transitions from the source domain.
arXiv Detail & Related papers (2024-05-24T09:06:12Z)
- Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning [46.08671291758573]
Cross-domain offline reinforcement learning leverages source domain data with diverse transition dynamics to alleviate the data requirement for the target domain.
Existing methods address this problem by measuring the dynamics gap via domain classifiers, while relying on assumptions about the transferability of paired domains.
We propose a novel representation-based approach to measure the domain gap, where the representation is learned through a contrastive objective by sampling transitions from different domains.
arXiv Detail & Related papers (2024-05-10T02:21:42Z)
- Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer [69.82229895838577]
Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target samples to annotate.
This setting neglects the more practical scenario where training data are collected from multiple sources.
This motivates us to target a new and challenging setting of knowledge transfer that extends ADA from a single source domain to multiple source domains.
arXiv Detail & Related papers (2023-11-21T13:12:21Z)
- Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation [3.367755441623275]
Multi-source unsupervised domain adaptation (MUDA) aims to transfer knowledge from related source domains to an unlabeled target domain.
We propose a novel approach called Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation (D3AAMDA).
This mechanism controls the alignment level of features between each source domain and the target domain, effectively leveraging the local advantageous feature information within the source domains.
arXiv Detail & Related papers (2023-07-26T09:40:19Z)
- Improving Transferability of Domain Adaptation Networks Through Domain Alignment Layers [1.3766148734487902]
Multi-source unsupervised domain adaptation (MSDA) aims at learning a predictor for an unlabeled domain by assigning weak knowledge from a bag of source models.
We propose to embed Multi-Source version of DomaIn Alignment Layers (MS-DIAL) at different levels of the predictor.
Our approach can improve state-of-the-art MSDA methods, yielding relative gains of up to +30.64% on their classification accuracies.
arXiv Detail & Related papers (2021-09-06T18:41:19Z)
- Gradient Regularized Contrastive Learning for Continual Domain Adaptation [86.02012896014095]
We study the problem of continual domain adaptation, where the model is presented with a labelled source domain and a sequence of unlabelled target domains.
We propose Gradient Regularized Contrastive Learning (GRCL) to solve the obstacles.
Experiments on Digits, DomainNet and Office-Caltech benchmarks demonstrate the strong performance of our approach.
arXiv Detail & Related papers (2021-03-23T04:10:42Z)
- A Review of Single-Source Deep Unsupervised Visual Domain Adaptation [81.07994783143533]
Large-scale labeled training datasets have enabled deep neural networks to excel across a wide range of benchmark vision tasks.
In many applications, it is prohibitively expensive and time-consuming to obtain large quantities of labeled data.
To cope with limited labeled training data, many have attempted to directly apply models trained on a large-scale labeled source domain to another sparsely labeled or unlabeled target domain.
arXiv Detail & Related papers (2020-09-01T00:06:50Z)
- Multi-Source Domain Adaptation for Text Classification via DistanceNet-Bandits [101.68525259222164]
We present a study of various distance-based measures in the context of NLP tasks that characterize the dissimilarity between domains based on sample estimates.
We develop a DistanceNet model which uses these distance measures as an additional loss function to be minimized jointly with the task's loss function.
We extend this model to a novel DistanceNet-Bandit model, which employs a multi-armed bandit controller to dynamically switch between multiple source domains (see the sketch after this list).
arXiv Detail & Related papers (2020-01-13T15:53:41Z)
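Following up on the DistanceNet-Bandit entry above, a multi-armed bandit controller for switching between source domains could look like the sketch below. The UCB1 selection rule and the reward signal (e.g. a gain measured on target-domain validation data) are assumptions made for illustration; the paper's controller may differ in both respects.

```python
import math
import random

class SourceDomainBandit:
    """UCB1 controller that picks which source domain to draw the next training
    batch from; the reward fed back to update() is assumed to be some measure of
    how much that batch helped on the target domain."""

    def __init__(self, num_domains):
        self.counts = [0] * num_domains    # times each domain was chosen
        self.values = [0.0] * num_domains  # running mean reward per domain
        self.total = 0

    def select(self):
        # Play every arm once before applying the UCB1 rule.
        for domain, count in enumerate(self.counts):
            if count == 0:
                return domain
        ucb = [self.values[d] + math.sqrt(2 * math.log(self.total) / self.counts[d])
               for d in range(len(self.counts))]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, domain, reward):
        # Incremental mean update of the chosen domain's reward estimate.
        self.counts[domain] += 1
        self.total += 1
        self.values[domain] += (reward - self.values[domain]) / self.counts[domain]

# Toy usage with three source domains of different (hidden) usefulness.
if __name__ == "__main__":
    bandit = SourceDomainBandit(num_domains=3)
    hidden_gain = [0.2, 0.5, 0.8]          # stand-in for per-domain target gains
    for _ in range(100):
        d = bandit.select()
        bandit.update(d, hidden_gain[d] + random.gauss(0, 0.1))
    print("estimated per-domain usefulness:",
          [round(v, 2) for v in bandit.values])
```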
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.