Cross-Domain Offline Policy Adaptation with Dynamics- and Value-Aligned Data Filtering
- URL: http://arxiv.org/abs/2512.02435v1
- Date: Tue, 02 Dec 2025 05:45:40 GMT
- Title: Cross-Domain Offline Policy Adaptation with Dynamics- and Value-Aligned Data Filtering
- Authors: Zhongjian Qiao, Rui Yang, Jiafei Lyu, Chenjia Bai, Xiu Li, Zhuoran Yang, Siyang Gao, Shuang Qiu
- Abstract summary: Cross-Domain Offline Reinforcement Learning aims to train an agent deployed in the target environment. Recent advances address this issue by selectively sharing source domain samples that exhibit dynamics alignment with the target domain. These approaches focus solely on dynamics alignment and overlook value alignment, i.e., selecting high-quality, high-value samples from the source domain. We present our Dynamics- and Value-aligned Data …
- Score: 71.07094489719034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-Domain Offline Reinforcement Learning aims to train an agent deployed in the target environment, leveraging both a limited target domain dataset and a source domain dataset with (possibly) sufficient data coverage. Due to the underlying dynamics misalignment between the source and target domains, simply merging the data from the two datasets may incur inferior performance. Recent advances address this issue by selectively sharing source domain samples that exhibit dynamics alignment with the target domain. However, these approaches focus solely on dynamics alignment and overlook value alignment, i.e., selecting high-quality, high-value samples from the source domain. In this paper, we first demonstrate that both dynamics alignment and value alignment are essential for policy learning, by examining the limitations of the current theoretical framework for cross-domain RL and establishing a concrete sub-optimality gap of a policy trained on the source domain and evaluated on the target domain. Motivated by the theoretical insights, we propose to selectively share those source domain samples with both high dynamics and value alignment and present our Dynamics- and Value-aligned Data Filtering (DVDF) method. We design a range of dynamics shift settings, including kinematic and morphology shifts, and evaluate DVDF on various tasks and datasets, as well as in challenging extremely low-data settings where the target domain dataset contains only 5,000 transitions. Extensive experiments demonstrate that DVDF consistently outperforms prior strong baselines and delivers exceptional performance across multiple tasks and datasets.
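As a rough illustration of the two criteria named in the abstract, the sketch below scores each source transition on a dynamics proxy and a value proxy and keeps only the transitions that pass both thresholds. It is not the DVDF algorithm, whose scoring rules are not given in this summary. The proxies (negative squared error between the observed source next state and the prediction of a dynamics model fitted on target data; a one-step TD advantage from a value function trained on target data), the median thresholds, and the names alignment_scores and filter_source are all assumptions made for this example.

```python
import numpy as np


def alignment_scores(next_states, predicted_next_states, rewards,
                     values, next_values, gamma=0.99):
    """Illustrative per-transition dynamics- and value-alignment scores.

    next_states:           (N, d) next states observed in the source dataset
    predicted_next_states: (N, d) next states predicted for the same (s, a) by a
                           dynamics model fitted on target data (hypothetical)
    rewards, values, next_values: (N,) rewards, V(s) and V(s') from a value
                           function trained on target data (hypothetical)
    """
    next_states = np.asarray(next_states, dtype=float)
    predicted_next_states = np.asarray(predicted_next_states, dtype=float)
    # Dynamics proxy: negative squared error between the observed source next
    # state and the target-model prediction (higher = better aligned).
    dyn = -np.sum((next_states - predicted_next_states) ** 2, axis=1)
    # Value proxy: one-step TD advantage r + gamma * V(s') - V(s).
    val = (np.asarray(rewards, dtype=float)
           + gamma * np.asarray(next_values, dtype=float)
           - np.asarray(values, dtype=float))
    return dyn, val


def filter_source(dyn, val, dyn_quantile=0.5, val_quantile=0.5):
    """Indices of source transitions at or above BOTH score quantiles."""
    dyn = np.asarray(dyn, dtype=float)
    val = np.asarray(val, dtype=float)
    keep = (dyn >= np.quantile(dyn, dyn_quantile)) & (val >= np.quantile(val, val_quantile))
    return np.flatnonzero(keep)


# Toy example: four source transitions with 2-D states.
dyn, val = alignment_scores(
    next_states=[[0, 0], [1, 1], [0, 1], [2, 2]],
    predicted_next_states=[[0, 0], [0, 0], [0, 1], [0, 0]],
    rewards=[1, 1, 0, 1],
    values=[0, 0, 0, 0],
    next_values=[1, 0, 0, 0],
)
print(filter_source(dyn, val).tolist())  # → [0]
```

In the toy data, transition 2 matches the target model but has a low advantage, and transition 1 has a high advantage but mismatched dynamics, so only transition 0 survives. Requiring both conditions, rather than summing the two scores, reflects the abstract's point that dynamics alignment alone is not enough; how DVDF actually combines the two criteria is not specified here.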
Related papers
- Cross-Domain Offline Policy Adaptation via Selective Transition Correction [29.251685312287155]
It remains a critical challenge to adapt policies across domains with mismatched dynamics in reinforcement learning (RL).
We study cross-domain offline RL, where an offline dataset from another similar source domain can be accessed to enhance policy learning on a target domain dataset.
We propose the Selective Transition Correction (STC) algorithm, which enables reliable usage of source domain data for policy adaptation.
arXiv Detail & Related papers (2026-02-05T15:37:29Z)
- DmC: Nearest Neighbor Guidance Diffusion Model for Offline Cross-domain Reinforcement Learning [11.290019540058625]
Cross-domain offline reinforcement learning (RL) seeks to enhance sample efficiency by utilizing additional offline source datasets.
DmC is a novel framework for cross-domain offline RL with limited target samples.
arXiv Detail & Related papers (2025-07-28T03:34:15Z)
- StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization [85.18995948334592]
Single domain generalization (single DG) aims at learning a robust model generalizable to unseen domains from only one training domain.
State-of-the-art approaches have mostly relied on data augmentations, such as adversarial perturbation and style enhancement, to synthesize new data.
We propose StyDeSty, which explicitly accounts for the alignment of the source and pseudo domains in the process of data augmentation.
arXiv Detail & Related papers (2024-06-01T02:41:34Z)
- Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning [46.08671291758573]
Cross-domain offline reinforcement learning leverages source domain data with diverse transition dynamics to alleviate the data requirement for the target domain.
Existing methods address this problem by measuring the dynamics gap via domain classifiers while relying on the assumptions of the transferability of paired domains.
We propose a novel representation-based approach to measure the domain gap, where the representation is learned through a contrastive objective by sampling transitions from different domains.
arXiv Detail & Related papers (2024-05-10T02:21:42Z)
- Cross-Domain Policy Adaptation via Value-Guided Data Filtering [57.62692881606099]
Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning.
We present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets.
arXiv Detail & Related papers (2023-05-28T04:08:40Z)
- Self-training through Classifier Disagreement for Cross-Domain Opinion Target Extraction [62.41511766918932]
Opinion target extraction (OTE) or aspect extraction (AE) is a fundamental task in opinion mining.
Recent work focuses on cross-domain OTE, which is typically encountered in real-world scenarios.
We propose a new SSL approach that selects, from the unlabelled target data, the samples on which the outputs of a domain-specific teacher network and a student network disagree.
arXiv Detail & Related papers (2023-02-28T16:31:17Z)
- Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection [79.89082006155135]
Unsupervised Domain Adaptation (UDA) is an effective approach to tackle the issue of domain shift.
UDA methods try to align the source and target representations to improve the generalization on the target domain.
The Source-Free Domain Adaptation (SFDA) setting aims to alleviate these concerns by adapting a source-trained model for the target domain without requiring access to the source data.
arXiv Detail & Related papers (2022-03-29T17:50:43Z)
- Dynamic Feature Alignment for Semi-supervised Domain Adaptation [23.67093835143]
We propose to use dynamic feature alignment to address both inter- and intra-domain discrepancy.
Our approach, which doesn't require extensive tuning or adversarial training, significantly improves the state of the art for semi-supervised domain adaptation.
arXiv Detail & Related papers (2021-10-18T22:26:27Z)
- Discriminative Cross-Domain Feature Learning for Partial Domain Adaptation [70.45936509510528]
Partial domain adaptation aims to adapt knowledge from a larger and more diverse source domain to a smaller target domain with fewer classes.
Recent domain adaptation methods extract effective features by incorporating pseudo labels for the target domain.
It is essential to align target data with only a small set of source data.
arXiv Detail & Related papers (2020-08-26T03:18:53Z)
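The VGDF entry in the list above describes sharing source transitions based on the "proximity of paired value targets". One way to make that concrete, sketched here as a simplified illustration rather than the published algorithm, is to compare a source transition's value target r + γV(s') with the spread of targets obtained from next states predicted by an ensemble of target-domain dynamics models, and to keep the source transitions whose targets are most likely under a Gaussian fitted to that spread. The ensemble inputs, the Gaussian fit, the keep fraction, and the names value_target_loglik and keep_top_fraction are assumptions for this example.

```python
import numpy as np


def value_target_loglik(rewards, source_next_values, model_next_values,
                        gamma=0.99, eps=1e-6):
    """Gaussian log-likelihood (up to a constant) of each source value target.

    rewards:            (N,) rewards of the source transitions
    source_next_values: (N,) V(s') at the next state observed in the source data
    model_next_values:  (N, K) V(s'_k) at next states predicted by K
                        target-domain dynamics models for the same (s, a)
                        (hypothetical inputs)
    """
    rewards = np.asarray(rewards, dtype=float)
    # Value target built from the source domain's actual next state.
    src_target = rewards + gamma * np.asarray(source_next_values, dtype=float)
    # One value target per ensemble member's predicted target-domain next state.
    model_targets = rewards[:, None] + gamma * np.asarray(model_next_values, dtype=float)
    mu = model_targets.mean(axis=1)
    sigma = model_targets.std(axis=1) + eps  # eps avoids division by zero
    return -0.5 * ((src_target - mu) / sigma) ** 2 - np.log(sigma)


def keep_top_fraction(scores, fraction=0.25):
    """Indices (ascending) of the highest-scoring `fraction` of transitions."""
    scores = np.asarray(scores, dtype=float)
    k = max(1, int(round(fraction * len(scores))))
    # argsort of the negated scores ranks transitions from most to least likely.
    return np.sort(np.argsort(-scores)[:k])


ll = value_target_loglik(
    rewards=[0.0, 0.0],
    source_next_values=[1.0, 5.0],
    model_next_values=[[0.9, 1.1], [0.9, 1.1]],
    gamma=1.0,
)
print(keep_top_fraction(ll, fraction=0.5).tolist())  # → [0]
```

Here both transitions see the same ensemble targets (mean 1.0), so the first source target (1.0) is far more plausible than the second (5.0) and is the one kept. Unlike the state-space comparison used for dynamics alignment, this criterion only asks whether a mismatched next state changes the value target, which is the distinction the VGDF summary draws.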