Localized Dynamics-Aware Domain Adaption for Off-Dynamics Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2602.21072v1
- Date: Tue, 24 Feb 2026 16:32:50 GMT
- Title: Localized Dynamics-Aware Domain Adaption for Off-Dynamics Offline Reinforcement Learning
- Authors: Zhangjie Xia, Yu Yang, Pan Xu
- Abstract summary: Off-dynamics offline reinforcement learning (RL) aims to learn a policy for a target domain using limited target data and abundant source data. We propose Localized Dynamics-Aware Domain Adaptation (LoDADA), which exploits localized dynamics mismatch to better reuse source data. Results show that LoDADA consistently outperforms state-of-the-art off-dynamics offline RL methods by better leveraging localized distribution mismatch.
- Score: 12.053247880343699
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Off-dynamics offline reinforcement learning (RL) aims to learn a policy for a target domain using limited target data and abundant source data collected under different transition dynamics. Existing methods typically address dynamics mismatch either globally over the state space or via pointwise data filtering; these approaches can miss localized cross-domain similarities or incur high computational cost. We propose Localized Dynamics-Aware Domain Adaptation (LoDADA), which exploits localized dynamics mismatch to better reuse source data. LoDADA clusters transitions from source and target datasets and estimates cluster-level dynamics discrepancy via domain discrimination. Source transitions from clusters with small discrepancy are retained, while those from clusters with large discrepancy are filtered out. This yields a fine-grained and scalable data selection strategy that avoids overly coarse global assumptions and expensive per-sample filtering. We provide theoretical insights and extensive experiments across environments with diverse global and local dynamics shifts. Results show that LoDADA consistently outperforms state-of-the-art off-dynamics offline RL methods by better leveraging localized distribution mismatch.
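The selection procedure described in the abstract (cluster source and target transitions jointly, score each cluster's dynamics discrepancy, and keep source data only from low-discrepancy clusters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hand-rolled k-means routine, the per-cluster mean-distance proxy standing in for the paper's learned domain discriminator, and all function and parameter names (`select_source_transitions`, `keep_ratio`, etc.) are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: returns a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

def select_source_transitions(src, tgt, k=8, keep_ratio=0.5, seed=0):
    """Cluster source and target transition features together, score each
    cluster's cross-domain discrepancy, and keep source rows from the
    lowest-discrepancy clusters. Returns a boolean mask over src rows."""
    X = np.vstack([src, tgt])
    labels = kmeans(X, k, seed=seed)
    src_lab, tgt_lab = labels[:len(src)], labels[len(src):]

    # Discrepancy proxy: distance between per-cluster source and target
    # means (np.inf if either domain is absent from the cluster). The
    # paper instead uses a trained domain discriminator for this score.
    disc = np.full(k, np.inf)
    for j in range(k):
        s, t = src[src_lab == j], tgt[tgt_lab == j]
        if len(s) and len(t):
            disc[j] = np.linalg.norm(s.mean(0) - t.mean(0))

    kept = set(np.argsort(disc)[: max(1, int(keep_ratio * k))])
    return np.array([l in kept for l in src_lab])
```

Because the score is computed once per cluster rather than once per sample, the filtering cost scales with the number of clusters, which is the scalability argument the abstract makes against per-sample filtering.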
Related papers
- Cross-Domain Offline Policy Adaptation via Selective Transition Correction [29.251685312287155]
It remains a critical challenge to adapt policies across domains with mismatched dynamics in reinforcement learning (RL). We study cross-domain offline RL, where an offline dataset from another, similar source domain can be accessed to enhance policy learning on top of a target domain dataset. We propose the Selective Transition Correction (STC) algorithm, which enables reliable use of source domain data for policy adaptation.
arXiv Detail & Related papers (2026-02-05T15:37:29Z) - Cross-Domain Offline Policy Adaptation with Dynamics- and Value-Aligned Data Filtering [71.07094489719034]
Cross-Domain Offline Reinforcement Learning aims to train an agent for deployment in the target environment. Recent advances address this issue by selectively sharing source domain samples that exhibit dynamics alignment with the target domain. These approaches focus solely on dynamics alignment and overlook value alignment, i.e., selecting high-quality, high-value samples from the source domain. We present our Dynamics- and Value-aligned Data Filtering method.
arXiv Detail & Related papers (2025-12-02T05:45:40Z) - DmC: Nearest Neighbor Guidance Diffusion Model for Offline Cross-domain Reinforcement Learning [11.290019540058625]
Cross-domain offline reinforcement learning (RL) seeks to enhance sample efficiency by utilizing additional offline source datasets. DmC is a novel framework for cross-domain offline RL with limited target samples.
arXiv Detail & Related papers (2025-07-28T03:34:15Z) - MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning [25.497449531415125]
We study off-dynamics offline reinforcement learning, where the goal is to learn a policy from offline source and limited target datasets. We propose MOBODY, a Model-Based Off-Dynamics Offline RL algorithm that optimizes a policy using learned target dynamics transitions. We evaluate MOBODY on a wide range of MuJoCo and Adroit benchmarks, demonstrating that it outperforms state-of-the-art off-dynamics RL baselines.
arXiv Detail & Related papers (2025-06-10T05:36:54Z) - DIDS: Domain Impact-aware Data Sampling for Large Language Model Training [61.10643823069603]
We present Domain Impact-aware Data Sampling (DIDS) for large language model training. DIDS groups training data based on learning effects, where a proxy language model and dimensionality reduction are employed. It achieves 3.4% higher average performance while maintaining comparable training efficiency.
arXiv Detail & Related papers (2025-04-17T13:09:38Z) - ODRL: A Benchmark for Off-Dynamics Reinforcement Learning [59.72217833812439]
We introduce ODRL, the first benchmark tailored for evaluating off-dynamics RL methods.
ODRL contains four experimental settings where the source and target domains can be either online or offline.
We conduct extensive benchmarking experiments, which show that no method has universal advantages across varied dynamics shifts.
arXiv Detail & Related papers (2024-10-28T05:29:38Z) - Cross-Domain Policy Adaptation via Value-Guided Data Filtering [57.62692881606099]
Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning.
We present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets.
arXiv Detail & Related papers (2023-05-28T04:08:40Z) - Divide and Contrast: Source-free Domain Adaptation via Adaptive
Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to connect the good ends of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
arXiv Detail & Related papers (2022-11-12T09:21:49Z) - Instance Relation Graph Guided Source-Free Domain Adaptive Object
Detection [79.89082006155135]
Unsupervised Domain Adaptation (UDA) is an effective approach to tackle the issue of domain shift.
UDA methods try to align the source and target representations to improve the generalization on the target domain.
The Source-Free Domain Adaptation (SFDA) setting aims to alleviate these concerns by adapting a source-trained model for the target domain without requiring access to the source data.
arXiv Detail & Related papers (2022-03-29T17:50:43Z) - Navigating the Kaleidoscope of COVID-19 Misinformation Using Deep
Learning [0.76146285961466]
We propose an effective model to capture both the local and global context of the target domain.
We show that: (i) the deep Transformer-based pre-trained models, utilized via mixed-domain transfer learning, are only good at capturing the local context and thus exhibit poor generalization; and (ii) a combination of shallow network-based domain-specific models and convolutional neural networks can efficiently extract both local and global context directly from the target data in a hierarchical fashion, offering a more generalizable solution.
arXiv Detail & Related papers (2021-09-19T15:49:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.