Bridging Dynamics Gaps via Diffusion Schrödinger Bridge for Cross-Domain Reinforcement Learning
- URL: http://arxiv.org/abs/2602.23737v1
- Date: Fri, 27 Feb 2026 07:04:22 GMT
- Title: Bridging Dynamics Gaps via Diffusion Schrödinger Bridge for Cross-Domain Reinforcement Learning
- Authors: Hanping Zhang, Yuhong Guo
- Abstract summary: Cross-domain reinforcement learning (RL) aims to learn transferable policies under dynamics shifts between source and target domains. A key challenge lies in the lack of target-domain environment interaction and reward supervision, which prevents direct policy learning. We propose Bridging Dynamics Gaps for Cross-Domain Reinforcement Learning (BDGxRL) to align source transitions with target-domain dynamics encoded in offline demonstrations.
- Score: 23.628360655654507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-domain reinforcement learning (RL) aims to learn transferable policies under dynamics shifts between source and target domains. A key challenge lies in the lack of target-domain environment interaction and reward supervision, which prevents direct policy learning. To address this challenge, we propose Bridging Dynamics Gaps for Cross-Domain Reinforcement Learning (BDGxRL), a novel framework that leverages the Diffusion Schrödinger Bridge (DSB) to align source transitions with the target-domain dynamics encoded in offline demonstrations. Moreover, we introduce a reward modulation mechanism that estimates rewards from state transitions and applies them to DSB-aligned samples, ensuring consistency between rewards and target-domain dynamics. BDGxRL performs target-oriented policy learning entirely within the source domain, without access to the target environment or its rewards. Experiments on MuJoCo cross-domain benchmarks demonstrate that BDGxRL outperforms state-of-the-art baselines and shows strong adaptability under transition dynamics shifts.
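The abstract names two components: a Diffusion Schrödinger Bridge that maps source transitions onto target dynamics, and a reward model conditioned on state transitions. Below is a minimal, hypothetical sketch of how these pieces could fit together; `BridgeNet`, `align_next_state`, and `RewardModulator` are illustrative names rather than the authors' API, and the bridge is reduced to Euler integration of a learned drift for brevity.

```python
# Hypothetical sketch of the two BDGxRL ingredients described in the abstract.
# All names here are illustrative, not the authors' released API.
import torch
import torch.nn as nn


class BridgeNet(nn.Module):
    """Drift network: moves a source next-state toward target dynamics."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s, a, x, t):
        return self.net(torch.cat([s, a, x, t], dim=-1))


@torch.no_grad()
def align_next_state(bridge, s, a, s_next_src, n_steps=10):
    """Integrate the learned bridge from the source sample toward the target."""
    x, dt = s_next_src, 1.0 / n_steps
    for k in range(n_steps):
        t = torch.full((x.shape[0], 1), k * dt)
        x = x + bridge(s, a, x, t) * dt
    return x


class RewardModulator(nn.Module):
    """Estimates a reward from the state transition (s, s'), so rewards stay
    consistent with the target dynamics of the aligned samples."""

    def __init__(self, state_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1)).squeeze(-1)
```

Aligned tuples `(s, a, align_next_state(...), RewardModulator(s, s'))` could then feed any standard off-policy learner entirely in the source domain, consistent with the abstract's claim of no target-environment access.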
Related papers
- Cross-Domain Offline Policy Adaptation via Selective Transition Correction [29.251685312287155]
It remains a critical challenge to adapt policies across domains with mismatched dynamics in reinforcement learning (RL). We study cross-domain offline RL, where an offline dataset from a similar source domain can be accessed to enhance policy learning on top of a target-domain dataset. We propose the Selective Transition Correction (STC) algorithm, which enables reliable usage of source-domain data for policy adaptation.
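The summary does not describe STC's actual correction rule, so the following is purely speculative illustration: a source transition passes through unchanged when it is consistent with a learned target dynamics model and has its next state corrected (replaced by the model's prediction) otherwise. All names are assumed.

```python
# Speculative sketch only; `target_model` is an assumed learned dynamics model.
import torch


@torch.no_grad()
def selectively_correct(target_model, s, a, s_next_src, threshold=1.0):
    s_next_pred = target_model(s, a)
    gap = (s_next_pred - s_next_src).norm(dim=-1, keepdim=True)
    use_pred = (gap > threshold).float()   # correct only large dynamics mismatches
    return use_pred * s_next_pred + (1.0 - use_pred) * s_next_src
```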
arXiv Detail & Related papers (2026-02-05T15:37:29Z)
- Connecting Domains and Contrasting Samples: A Ladder for Domain Generalization [52.52838658375592]
We propose a new paradigm, domain-connecting contrastive learning (DCCL), to enhance conceptual connectivity across domains. On the data side, more aggressive data augmentation and cross-domain positive samples are introduced to improve intra-class connectivity. The results verify that DCCL outperforms state-of-the-art baselines even without domain supervision.
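A generic supervised-contrastive loss with the cross-domain-positive property the summary mentions could look like the sketch below; it is written in the spirit of the abstract, not the DCCL implementation.

```python
# Sketch, not the authors' code: a supervised contrastive loss over a batch
# pooled from several domains, so same-class samples from *other* domains
# act as cross-domain positives.
import torch


def cross_domain_contrastive_loss(features, labels, temperature=0.1):
    """features: (N, D) L2-normalized embeddings pooled across domains.
    labels: (N,) class labels. Any same-class pair is a positive, which
    includes cross-domain pairs and improves intra-class connectivity."""
    n = features.shape[0]
    sim = features @ features.T / temperature
    pos_mask = (labels[:, None] == labels[None, :]).float()
    pos_mask.fill_diagonal_(0.0)             # a sample is not its own positive
    logits = sim - 1e9 * torch.eye(n)        # drop self-similarity term
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1.0)
    return -(pos_mask * log_prob).sum(dim=1).div(pos_count).mean()
```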
arXiv Detail & Related papers (2025-10-19T04:13:29Z)
- In-Context Policy Adaptation via Cross-Domain Skill Diffusion [37.727612185480986]
In this work, we present an in-context policy adaptation framework designed for long-horizon multi-task environments. The framework enables rapid adaptation of skill-based reinforcement learning policies to diverse target domains. We show that our framework achieves superior policy adaptation performance under limited target-domain data conditions.
arXiv Detail & Related papers (2025-09-04T06:55:38Z)
- Self-Paced Collaborative and Adversarial Network for Unsupervised Domain Adaptation [74.27130400558013]
This paper proposes a new unsupervised domain adaptation approach called Collaborative and Adversarial Network (CAN). CAN uses a domain-collaborative and domain-adversarial learning strategy to train the neural network. To further enhance discriminability in the target domain, we propose Self-Paced CAN (SPCAN).
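One plausible reading of CAN's adversarial branch is the standard gradient-reversal mechanism, sketched below; the collaborative branch would use the same domain classifier on low-level features without the reversal, keeping them domain-informative. This is a generic sketch, not the paper's code.

```python
# Standard gradient-reversal layer for domain-adversarial training (sketch).
import torch


class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) gradients so the feature extractor is trained
        # to *confuse* the domain classifier.
        return -ctx.lam * grad_output, None


def domain_adversarial_logits(features, domain_classifier, lam=1.0):
    return domain_classifier(GradReverse.apply(features, lam))
```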
arXiv Detail & Related papers (2025-06-24T02:58:37Z)
- MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning [25.497449531415125]
We study off-dynamics offline reinforcement learning, where the goal is to learn a policy from an offline source dataset and a limited target dataset. We propose MOBODY, a Model-Based Off-Dynamics Offline RL algorithm that optimizes a policy using learned target dynamics transitions. We evaluate MOBODY on a wide range of MuJoCo and Adroit benchmarks, demonstrating that it outperforms state-of-the-art off-dynamics RL baselines.
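The model-based recipe the summary describes could, in outline, look like the following sketch (not the MOBODY implementation): fit a dynamics model on the limited target data, then generate short synthetic rollouts under the current policy to supply target-consistent transitions.

```python
# Illustrative sketch; all names are assumptions, not the authors' code.
import torch
import torch.nn as nn


class DynamicsModel(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s, a):
        return s + self.net(torch.cat([s, a], dim=-1))  # predict state delta


@torch.no_grad()
def model_rollout(model, policy, s0, horizon=5):
    """Short horizons keep compounding model error in check."""
    transitions, s = [], s0
    for _ in range(horizon):
        a = policy(s)                # assumed: policy maps states to actions
        s_next = model(s, a)
        transitions.append((s, a, s_next))
        s = s_next
    return transitions
```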
arXiv Detail & Related papers (2025-06-10T05:36:54Z)
- Cross-Domain Diffusion with Progressive Alignment for Efficient Adaptive Retrieval [52.67656818203429]
Unsupervised efficient domain adaptive retrieval aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Existing methods fail to address potential noise in the target domain and directly align high-level features across domains. We propose a novel Cross-Domain Diffusion with Progressive Alignment method (COUPLE) to address these challenges.
arXiv Detail & Related papers (2025-05-20T04:17:39Z)
- Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation [19.37193250533054]
We propose to utilize imitation learning to transfer the policy learned from the reward modification to the target domain.
Our approach, Domain Adaptation and Reward Augmented Imitation Learning (DARAIL), utilizes the reward modification for domain adaptation.
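The summary does not give DARAIL's objective, so the sketch below is heavily hedged: one simple reading of "reward augmented imitation" is behavioral cloning whose per-sample weights come from the modified (dynamics-compensated) reward, so demonstrations consistent with target dynamics dominate. `policy.log_prob` is an assumed interface.

```python
# Speculative reading only, not the DARAIL objective.
import torch


def reward_augmented_bc_loss(policy, states, actions, modified_rewards, beta=1.0):
    weights = torch.softmax(beta * modified_rewards, dim=0)  # batch-normalized weights
    log_probs = policy.log_prob(states, actions)             # assumed policy interface
    return -(weights * log_probs).sum()
```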
arXiv Detail & Related papers (2024-11-15T02:35:20Z)
- xTED: Cross-Domain Adaptation via Diffusion-Based Trajectory Editing [21.37585797507323]
Cross-domain policy transfer methods mostly aim at learning domain correspondences or corrections to facilitate policy learning. We propose the Cross-Domain Trajectory EDiting (xTED) framework, which employs a specially designed diffusion model for cross-domain trajectory adaptation. Our proposed model architecture effectively captures the intricate dependencies among states, actions, and rewards, as well as the dynamics patterns within target data.
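An SDEdit-style sketch of diffusion-based trajectory editing in the spirit of this summary (not the xTED code): partially noise a source trajectory, then denoise it with a diffusion model trained on target-domain data so the result inherits target dynamics patterns. `reverse_step` is an assumed one-step denoiser API.

```python
# Hedged sketch; `reverse_step(x, t)` is assumed to perform one DDPM
# reverse-diffusion step at timestep t.
import torch


@torch.no_grad()
def edit_trajectory(reverse_step, traj_src, alphas_cumprod, t_edit=400):
    a_bar = alphas_cumprod[t_edit]
    noise = torch.randn_like(traj_src)
    x = a_bar.sqrt() * traj_src + (1.0 - a_bar).sqrt() * noise  # partial noising
    for t in range(t_edit, 0, -1):    # reverse diffusion from the edit depth
        x = reverse_step(x, t)
    return x
```

Noising only up to `t_edit` (rather than full noise) is what lets the edit preserve the source trajectory's overall structure while replacing its fine-grained dynamics.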
arXiv Detail & Related papers (2024-09-13T10:07:28Z)
- Cross-Domain Policy Adaptation via Value-Guided Data Filtering [57.62692881606099]
Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning.
We present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets.
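An illustrative filter in the spirit of this summary (not the authors' code): a source transition is shared only when the value target computed from its observed next state is close to the value target under a next state imagined by a target-domain dynamics model. `value_fn` and `target_model` are assumed learned components.

```python
# Sketch of value-target proximity filtering; all names are assumptions.
import torch


@torch.no_grad()
def value_guided_filter(value_fn, target_model, s, a, r, s_next_src,
                        gamma=0.99, keep_ratio=0.25):
    s_next_tgt = target_model(s, a)            # imagined target next state
    y_src = r + gamma * value_fn(s_next_src)
    y_tgt = r + gamma * value_fn(s_next_tgt)
    gap = (y_src - y_tgt).abs().reshape(-1)
    k = max(1, int(keep_ratio * gap.numel()))
    return torch.topk(-gap, k).indices         # indices of the smallest gaps
```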
arXiv Detail & Related papers (2023-05-28T04:08:40Z)
- Self-training through Classifier Disagreement for Cross-Domain Opinion Target Extraction [62.41511766918932]
Opinion target extraction (OTE) or aspect extraction (AE) is a fundamental task in opinion mining.
Recent work focuses on cross-domain OTE, which is typically encountered in real-world scenarios. We propose a new SSL approach that selects unlabelled target samples on which the outputs of a domain-specific teacher network and a student network disagree.
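The selection rule described above reduces to comparing teacher and student predictions on the unlabelled target data; a minimal sketch (names assumed, not the authors' code):

```python
# Keep the unlabelled target samples on which teacher and student disagree,
# for use in the next self-training round.
import torch


@torch.no_grad()
def select_by_disagreement(teacher_logits, student_logits):
    teacher_pred = teacher_logits.argmax(dim=-1)
    student_pred = student_logits.argmax(dim=-1)
    return torch.nonzero(teacher_pred != student_pred).squeeze(-1)
```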
arXiv Detail & Related papers (2023-02-28T16:31:17Z)
- Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers [138.68213707587822]
We propose a simple, practical, and intuitive approach for domain adaptation in reinforcement learning.
We show that we can achieve this goal by compensating for the difference in dynamics by modifying the reward function.
Our approach is applicable to domains with continuous states and actions and does not require learning an explicit model of the dynamics.
arXiv Detail & Related papers (2020-06-24T17:47:37Z)
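The reward modification described in this last entry is the well-known DARC correction, which needs only two binary target-vs-source classifiers: one on (s, a, s') and one on (s, a). Since a sigmoid classifier's logit equals log p(target|·) − log p(source|·), the correction reduces to a difference of logits. The classifier networks below are placeholders.

```python
# DARC-style reward correction from two domain classifiers (sketch).
import torch


@torch.no_grad()
def dynamics_reward_correction(clf_sas, clf_sa, s, a, s_next):
    """delta_r = [log p(tgt|s,a,s') - log p(src|s,a,s')]
               - [log p(tgt|s,a)    - log p(src|s,a)]."""
    logit_sas = clf_sas(torch.cat([s, a, s_next], dim=-1))
    logit_sa = clf_sa(torch.cat([s, a], dim=-1))
    return (logit_sas - logit_sa).squeeze(-1)  # added to the source reward
```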