Cross-Domain Offline Policy Adaptation via Selective Transition Correction
- URL: http://arxiv.org/abs/2602.05776v1
- Date: Thu, 05 Feb 2026 15:37:29 GMT
- Title: Cross-Domain Offline Policy Adaptation via Selective Transition Correction
- Authors: Mengbei Yan, Jiafei Lyu, Shengjie Sun, Zhongjian Qiao, Jingwen Yang, Zichuan Lin, Deheng Ye, Xiu Li,
- Abstract summary: It remains a critical challenge to adapt policies across domains with mismatched dynamics in reinforcement learning (RL)<n>We study cross-domain offline RL, where an offline dataset from another similar source domain can be accessed to enhance policy learning upon a target domain dataset.<n>We propose the Selective Transition Correction (STC) algorithm, which enables reliable usage of source domain data for policy adaptation.
- Score: 29.251685312287155
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It remains a critical challenge to adapt policies across domains with mismatched dynamics in reinforcement learning (RL). In this paper, we study cross-domain offline RL, where an offline dataset from another similar source domain can be accessed to enhance policy learning upon a target domain dataset. Directly merging the two datasets may lead to suboptimal performance due to potential dynamics mismatches. Existing approaches typically mitigate this issue through source domain transition filtering or reward modification, which, however, may lead to insufficient exploitation of the valuable source domain data. Instead, we propose to modify the source domain data into the target domain data. To that end, we leverage an inverse policy model and a reward model to correct the actions and rewards of source transitions, explicitly achieving alignment with the target dynamics. Since limited data may result in inaccurate model training, we further employ a forward dynamics model to retain corrected samples that better match the target dynamics than the original transitions. Consequently, we propose the Selective Transition Correction (STC) algorithm, which enables reliable usage of source domain data for policy adaptation. Experiments on various environments with dynamics shifts demonstrate that STC achieves superior performance against existing baselines.
Related papers
- Cross-Domain Offline Policy Adaptation with Dynamics- and Value-Aligned Data Filtering [71.07094489719034]
Cross-Domain Offline Reinforcement Learning aims to train an agent deployed in the target environment.<n>Recent advances address this issue by selectively sharing source domain samples that exhibit dynamics alignment with the target domain.<n>These approaches focus solely on dynamics alignment and overlook textitvalue alignment, i.e., selecting high-quality, high-value samples from the source domain.<n>We present our textbfunderlineDynamics- and textbfunderlineValue-aligned textbfunderlineData
arXiv Detail & Related papers (2025-12-02T05:45:40Z) - Robust Source-Free Domain Adaptation for Medical Image Segmentation based on Curriculum Learning [54.514202147709625]
We propose a curriculum-based framework, namely learning from curriculum (LFC) for source-free domain adaptation.<n>We evaluate the proposed source-free domain adaptation approach on the public cross-domain datasets for fundus segmentation and polyp segmentation.
arXiv Detail & Related papers (2025-10-09T16:15:10Z) - MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning [25.497449531415125]
We study off-dynamics offline reinforcement learning, where the goal is to learn a policy from offline source and limited target datasets.<n>We propose MOBODY, a Model-Based Off-Dynamics Offline RL algorithm that optimize a policy using learned target dynamics transitions.<n>We evaluate MOBODY on a wide range of MuJoCo and Adroit benchmarks, demonstrating that it outperforms state-of-the-art off-dynamics RL baselines.
arXiv Detail & Related papers (2025-06-10T05:36:54Z) - Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation [19.37193250533054]
We propose to utilize imitation learning to transfer the policy learned from the reward modification to the target domain.
Our approach, Domain Adaptation and Reward Augmented Imitation Learning (DARAIL), utilizes the reward modification for domain adaptation.
arXiv Detail & Related papers (2024-11-15T02:35:20Z) - Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning [26.915055027485465]
We study offline off-dynamics reinforcement learning (RL) to enhance policy learning in a target domain with limited data.
Our approach centers on return-conditioned supervised learning (RCSL), particularly focusing on the decision transformer (DT)
We propose the Return Augmented Decision Transformer (RADT) method, where we augment the return in the source domain by aligning its distribution with that in the target domain.
arXiv Detail & Related papers (2024-10-30T20:46:26Z) - xTED: Cross-Domain Adaptation via Diffusion-Based Trajectory Editing [21.37585797507323]
Cross-domain policy transfer methods mostly aim at learning domain correspondences or corrections to facilitate policy learning.<n>We propose the Cross-Domain Trajectory EDiting framework that employs a specially designed diffusion model for cross-domain trajectory adaptation.<n>Our proposed model architecture effectively captures the intricate dependencies among states, actions, and rewards, as well as the dynamics patterns within target data.
arXiv Detail & Related papers (2024-09-13T10:07:28Z) - Open-Set Domain Adaptation with Visual-Language Foundation Models [51.49854335102149]
Unsupervised domain adaptation (UDA) has proven to be very effective in transferring knowledge from a source domain to a target domain with unlabeled data.
Open-set domain adaptation (ODA) has emerged as a potential solution to identify these classes during the training phase.
arXiv Detail & Related papers (2023-07-30T11:38:46Z) - Cross-Domain Policy Adaptation via Value-Guided Data Filtering [57.62692881606099]
Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning.
We present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets.
arXiv Detail & Related papers (2023-05-28T04:08:40Z) - Variational Model Perturbation for Source-Free Domain Adaptation [64.98560348412518]
We introduce perturbations into the model parameters by variational Bayesian inference in a probabilistic framework.
We demonstrate the theoretical connection to learning Bayesian neural networks, which proves the generalizability of the perturbed model to target domains.
arXiv Detail & Related papers (2022-10-19T08:41:19Z) - RAIN: RegulArization on Input and Network for Black-Box Domain
Adaptation [80.03883315743715]
Source-free domain adaptation transits the source-trained model towards target domain without exposing the source data.
This paradigm is still at risk of data leakage due to adversarial attacks on the source model.
We propose a novel approach named RAIN (RegulArization on Input and Network) for Black-Box domain adaptation from both input-level and network-level regularization.
arXiv Detail & Related papers (2022-08-22T18:18:47Z) - Instance Relation Graph Guided Source-Free Domain Adaptive Object
Detection [79.89082006155135]
Unsupervised Domain Adaptation (UDA) is an effective approach to tackle the issue of domain shift.
UDA methods try to align the source and target representations to improve the generalization on the target domain.
The Source-Free Adaptation Domain (SFDA) setting aims to alleviate these concerns by adapting a source-trained model for the target domain without requiring access to the source data.
arXiv Detail & Related papers (2022-03-29T17:50:43Z) - Source-Free Domain Adaptation for Semantic Segmentation [11.722728148523366]
Unsupervised Domain Adaptation (UDA) can tackle the challenge that convolutional neural network-based approaches for semantic segmentation heavily rely on the pixel-level annotated data.
We propose a source-free domain adaptation framework for semantic segmentation, namely SFDA, in which only a well-trained source model and an unlabeled target domain dataset are available for adaptation.
arXiv Detail & Related papers (2021-03-30T14:14:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.