Offline Reinforcement Learning with Domain-Unlabeled Data
- URL: http://arxiv.org/abs/2404.07465v2
- Date: Sat, 01 Mar 2025 00:09:19 GMT
- Title: Offline Reinforcement Learning with Domain-Unlabeled Data
- Authors: Soichiro Nishimori, Xin-Qiang Cai, Johannes Ackermann, Masashi Sugiyama
- Abstract summary: In robotics, precise system identification may only have been performed for part of the deployments. We consider Positive-Unlabeled Offline RL (PUORL), a novel offline RL setting in which we have a small amount of labeled target-domain data. Our approach accurately identifies target-domain samples and achieves high performance, even under substantial dynamics shift.
- Score: 45.65330937504355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) is vital in areas where active data collection is expensive or infeasible, such as robotics or healthcare. In the real world, offline datasets often involve multiple domains that share the same state and action spaces but have distinct dynamics, and only a small fraction of samples are clearly labeled as belonging to the target domain we are interested in. For example, in robotics, precise system identification may only have been performed for part of the deployments. To address this challenge, we consider Positive-Unlabeled Offline RL (PUORL), a novel offline RL setting in which we have a small amount of labeled target-domain data and a large amount of domain-unlabeled data from multiple domains, including the target domain. For PUORL, we propose a plug-and-play approach that leverages positive-unlabeled (PU) learning to train a domain classifier. The classifier then extracts target-domain samples from the domain-unlabeled data, augmenting the scarce target-domain data. Empirical results on a modified version of the D4RL benchmark demonstrate the effectiveness of our method: even when only 1 to 3 percent of the dataset is domain-labeled, our approach accurately identifies target-domain samples and achieves high performance, even under substantial dynamics shift. Our plug-and-play algorithm seamlessly integrates PU learning with existing offline RL pipelines, enabling effective multi-domain data utilization in scenarios where comprehensive domain labeling is prohibitive.
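To make the plug-and-play recipe concrete, here is a minimal sketch of the two steps the abstract describes: train a domain classifier with a non-negative PU risk (Kiryo et al., 2017), then use it to pull likely target-domain samples out of the unlabeled pool. The network size, class prior, and 8-dimensional stand-in features are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of PU-learning-based target-domain extraction, assuming transitions
# (s, a, s') have already been embedded into fixed-size feature vectors.
import torch
import torch.nn as nn

def nnpu_risk(scores_pos, scores_unl, prior, loss=nn.functional.softplus):
    # softplus(-z) is the logistic loss for label +1, softplus(z) for label -1
    r_pos = loss(-scores_pos).mean()           # positives classified as positive
    r_pos_as_neg = loss(scores_pos).mean()     # positives classified as negative
    r_unl_as_neg = loss(scores_unl).mean()     # unlabeled classified as negative
    neg_risk = r_unl_as_neg - prior * r_pos_as_neg    # unbiased negative risk
    return prior * r_pos + torch.clamp(neg_risk, min=0.0)  # non-negative correction

clf = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
x_pos = torch.randn(128, 8)   # stand-in for the scarce labeled target-domain data
x_unl = torch.randn(4096, 8)  # stand-in for the domain-unlabeled data
prior = 0.5                   # assumed fraction of target-domain data in the pool

for _ in range(200):          # step 1: train the domain classifier
    opt.zero_grad()
    nnpu_risk(clf(x_pos).squeeze(-1), clf(x_unl).squeeze(-1), prior).backward()
    opt.step()

with torch.no_grad():         # step 2: extract predicted target-domain samples
    keep = clf(x_unl).squeeze(-1) > 0.0
augmented = torch.cat([x_pos, x_unl[keep]])  # feed to any offline RL algorithm
```

Because the classifier only filters data, any existing offline RL pipeline (e.g., TD3+BC, IQL, or CQL) can consume the augmented dataset unchanged, which is what makes the approach plug-and-play.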
Related papers
- LE-UDA: Label-efficient unsupervised domain adaptation for medical image segmentation [24.655779957716558]
We propose a novel and generic framework called "Label-Efficient Unsupervised Domain Adaptation" (LE-UDA).
In LE-UDA, we construct self-ensembling consistency for knowledge transfer between both domains, as well as a self-ensembling adversarial learning module to achieve better feature alignment for UDA (a generic consistency sketch follows below).
Experimental results demonstrate that the proposed LE-UDA can efficiently leverage limited source labels to improve cross-domain segmentation performance, outperforming state-of-the-art UDA approaches in the literature.
arXiv Detail & Related papers (2022-12-05T07:47:35Z)
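The self-ensembling consistency referenced above is, in its generic form, a mean-teacher setup: an exponential-moving-average (EMA) teacher supervises the student on perturbed inputs. Below is a minimal sketch of that base mechanism, with toy shapes and additive noise standing in for real augmentations; it is not LE-UDA's exact module.

```python
# Generic mean-teacher consistency: teacher = EMA of student, no gradients.
import copy
import torch
import torch.nn as nn

student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)            # teacher is updated only via EMA
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def ema_update(decay=0.99):
    with torch.no_grad():
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(decay).add_(ps, alpha=1.0 - decay)

x = torch.randn(64, 16)                # unlabeled target-domain batch
noisy = x + 0.1 * torch.randn_like(x)  # cheap stand-in for a real augmentation
loss = nn.functional.mse_loss(student(noisy), teacher(x))
opt.zero_grad()
loss.backward()
opt.step()
ema_update()
```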
- CA-UDA: Class-Aware Unsupervised Domain Adaptation with Optimal Assignment and Pseudo-Label Refinement [84.10513481953583]
Unsupervised domain adaptation (UDA) focuses on selecting good pseudo-labels as surrogates for the missing labels in the target data.
Source-domain bias that deteriorates the pseudo-labels can still exist, since a network shared between the source and target domains is typically used for pseudo-label selection.
We propose CA-UDA to improve the quality of the pseudo-labels and UDA results with optimal assignment (sketched below), a pseudo-label refinement strategy, and class-aware domain alignment.
arXiv Detail & Related papers (2022-05-26T18:45:04Z)
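One way to picture the optimal-assignment step is as Hungarian matching between target feature clusters and source class prototypes, so that each cluster inherits a class pseudo-label. The sketch below illustrates only this matching idea with random stand-in features; it is not CA-UDA's full refinement pipeline.

```python
# Hungarian matching of target clusters to source class prototypes.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
source_protos = rng.normal(size=(5, 16))     # per-class mean of source features
target_centroids = rng.normal(size=(5, 16))  # e.g. k-means centroids of target features

# Cost = pairwise Euclidean distance; the assignment is one-to-one and minimal.
cost = np.linalg.norm(target_centroids[:, None] - source_protos[None], axis=-1)
clusters, classes = linear_sum_assignment(cost)
cluster_to_class = dict(zip(clusters.tolist(), classes.tolist()))
# Every target sample in cluster i receives pseudo-label cluster_to_class[i].
```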
- Dynamic Instance Domain Adaptation [109.53575039217094]
Most studies on unsupervised domain adaptation assume that each domain's training samples come with domain labels.
We develop a dynamic neural network with adaptive convolutional kernels that generates instance-adaptive residuals, adapting domain-agnostic deep features to each individual instance (a simplified sketch follows below).
Our model, dubbed DIDA-Net, achieves state-of-the-art performance on several commonly used single-source and multi-source UDA datasets.
arXiv Detail & Related papers (2022-03-09T20:05:54Z)
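A dynamic network in this sense conditions part of its computation on the individual input. In the simplified sketch below, a per-instance channel gate modulates a shared convolution and the result is added back as a residual; this gating is only a stand-in for DIDA-Net's adaptive convolutional kernels.

```python
# Instance-adaptive residual: a tiny controller gates a shared conv per input.
import torch
import torch.nn as nn

class InstanceAdaptiveResidual(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.controller = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x):
        gate = self.controller(x).unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1)
        return x + gate * self.conv(x)  # domain-agnostic features + adaptive residual

feats = torch.randn(4, 32, 8, 8)
print(InstanceAdaptiveResidual()(feats).shape)  # torch.Size([4, 32, 8, 8])
```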
- Positive-Unlabeled Domain Adaptation [7.143879014059893]
We present a novel two-step learning approach to the problem of Positive-Unlabeled Domain Adaptation.
We identify reliable positive and negative pseudo-labels in the target domain, guided by source-domain labels and a positive-unlabeled risk estimator (the standard estimator is given below).
We validate our approach by running experiments on benchmark datasets for visual object recognition.
arXiv Detail & Related papers (2022-02-11T15:32:02Z)
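For reference, the positive-unlabeled risk estimator invoked here usually takes the non-negative form of Kiryo et al. (2017), written below in general notation ($\pi_p$ is the positive class prior, $g$ the classifier, $\ell$ a surrogate loss); the paper's precise objective may differ:

$$\widehat{R}_{\mathrm{pu}}(g) \;=\; \pi_p\,\widehat{R}_p^{+}(g) \;+\; \max\!\bigl(0,\; \widehat{R}_u^{-}(g) - \pi_p\,\widehat{R}_p^{-}(g)\bigr),$$

where $\widehat{R}_p^{\pm}(g) = \frac{1}{n_p}\sum_{i=1}^{n_p} \ell(\pm g(x_i^{p}))$ is estimated on the $n_p$ labeled positives and $\widehat{R}_u^{-}(g) = \frac{1}{n_u}\sum_{j=1}^{n_u} \ell(-g(x_j^{u}))$ on the $n_u$ unlabeled samples.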
- Domain Adaptive Semantic Segmentation without Source Data [50.18389578589789]
We investigate domain adaptive semantic segmentation without source data, which assumes that the model is pre-trained on the source domain.
We propose an effective framework for this challenging problem with two components: positive learning and negative learning.
Our framework can be easily implemented and incorporated with other methods to further enhance the performance.
arXiv Detail & Related papers (2021-10-13T04:12:27Z)
- Cross-domain Contrastive Learning for Unsupervised Domain Adaptation [108.63914324182984]
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a fully-labeled source domain to a different unlabeled target domain.
We build upon contrastive self-supervised learning to align features and thereby reduce the domain discrepancy between training and testing sets (an InfoNCE sketch follows below).
arXiv Detail & Related papers (2021-06-10T06:32:30Z)
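Contrastive alignment of this kind typically minimizes an InfoNCE loss in which cross-domain pairs of the same (pseudo-)class act as positives. The sketch below shows the generic loss with random stand-in features; the paper's exact pair construction is not reproduced here.

```python
# Generic InfoNCE loss: anchor[i] should match positive[i] among all candidates.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.1):
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature   # (N, N) cosine-similarity matrix
    targets = torch.arange(a.size(0))  # the matching index is the positive pair
    return F.cross_entropy(logits, targets)

source_feats = torch.randn(32, 128)   # e.g. features of class-c source samples
target_feats = torch.randn(32, 128)   # features of (pseudo-)class-c target samples
loss = info_nce(source_feats, target_feats)
```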
- Towards Unsupervised Domain Adaptation for Deep Face Recognition under Privacy Constraints via Federated Learning [33.33475702665153]
We propose a novel unsupervised federated face recognition approach (FedFR).
FedFR improves the performance in the target domain by iteratively aggregating knowledge from the source domain through federated learning.
It protects data privacy by transferring models instead of raw data between domains (the aggregation principle is sketched below).
arXiv Detail & Related papers (2021-05-17T04:24:25Z)
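The "transfer models instead of raw data" principle is easiest to see in FedAvg-style aggregation, sketched below. FedFR's actual iterative scheme is more elaborate; the client models and mixing weights here are toy stand-ins.

```python
# FedAvg-style aggregation: only model parameters cross domain boundaries.
import copy
import torch
import torch.nn as nn

def fed_avg(models, weights):
    # Weighted average of client state dicts -> new global model.
    # weights should sum to 1 (e.g., proportional to client data sizes).
    global_model = copy.deepcopy(models[0])
    state = global_model.state_dict()
    for key in state:
        state[key] = sum(w * m.state_dict()[key] for w, m in zip(weights, models))
    global_model.load_state_dict(state)
    return global_model

clients = [nn.Linear(16, 8) for _ in range(3)]  # stand-ins for per-domain models
global_model = fed_avg(clients, weights=[0.5, 0.3, 0.2])
```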
- Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training [67.71228426496013]
We show that using target domain data during pre-training leads to large performance improvements across a variety of setups.
We find that pre-training on multiple domains improves performance generalization on domains not seen during training.
arXiv Detail & Related papers (2021-04-02T12:53:15Z)
- Inferring Latent Domains for Unsupervised Deep Domain Adaptation [54.963823285456925]
Unsupervised Domain Adaptation (UDA) refers to the problem of learning a model in a target domain where labeled data are not available.
This paper introduces a novel deep architecture that addresses the problem of UDA by automatically discovering latent domains in visual datasets (a simple clustering stand-in is sketched below).
We evaluate our approach on publicly available benchmarks, showing that it outperforms state-of-the-art domain adaptation methods.
arXiv Detail & Related papers (2021-03-25T14:33:33Z)
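Latent-domain discovery can be prototyped by clustering features and treating cluster indices as inferred domain labels. The paper learns the domains end-to-end inside the network, so the k-means below is only a conceptual stand-in.

```python
# Cluster features, then treat cluster ids as inferred domain labels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0, 1, (200, 32)),
                        rng.normal(3, 1, (200, 32))])  # two hidden "domains"
domain_ids = KMeans(n_clusters=2, n_init=10).fit_predict(feats)
# Downstream, domain_ids can drive per-domain normalization or alignment losses.
```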
- Effective Label Propagation for Discriminative Semi-Supervised Domain Adaptation [76.41664929948607]
Semi-supervised domain adaptation (SSDA) methods have demonstrated great potential in large-scale image classification tasks.
We present an effective method that tackles this problem using inter-domain and intra-domain semantic information propagation (the base label-propagation algorithm is sketched below).
Our source code and pre-trained models will be released soon.
arXiv Detail & Related papers (2020-12-04T14:28:19Z)
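Semantic information propagation of this kind builds on classic graph label propagation (Zhou et al., 2004): seed labels diffuse over an affinity graph of labeled and unlabeled points. Below is a small NumPy sketch of that base algorithm, not the paper's inter-/intra-domain variant.

```python
# Classic label propagation: F <- alpha * S @ F + (1 - alpha) * Y.
import numpy as np

def label_propagation(features, labels, num_classes, alpha=0.99, iters=50):
    # labels[i] = -1 marks an unlabeled point; others are class indices.
    d2 = ((features[:, None] - features[None]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * d2.mean()))     # Gaussian affinity graph
    np.fill_diagonal(W, 0)
    Dinv = 1.0 / np.sqrt(W.sum(1))
    S = Dinv[:, None] * W * Dinv[None]    # symmetric normalization
    Y = np.zeros((len(labels), num_classes))
    Y[labels >= 0, labels[labels >= 0]] = 1.0
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y  # propagate, anchored to seed labels
    return F.argmax(1)

feats = np.random.default_rng(0).normal(size=(100, 8))
labels = np.full(100, -1)
labels[:10] = np.arange(10) % 3           # a few seed labels, 3 classes
pred = label_propagation(feats, labels, num_classes=3)
```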
- Discriminative Cross-Domain Feature Learning for Partial Domain Adaptation [70.45936509510528]
Partial domain adaptation aims to adapt knowledge from a larger and more diverse source domain to a smaller target domain with fewer classes.
Recent domain adaptation practice extracts effective features by incorporating pseudo-labels for the target domain.
It is essential to align target data with only a small set of source data.
arXiv Detail & Related papers (2020-08-26T03:18:53Z)
- Clarinet: A One-step Approach Towards Budget-friendly Unsupervised Domain Adaptation [39.53192710720228]
In unsupervised domain adaptation (UDA), classifiers for the target domain are trained with massive true-label data from the source domain and unlabeled data from the target domain.
We consider a novel problem setting, termed budget-friendly UDA (BFUDA), in which the classifier for the target domain has to be trained with complementary-label data from the source domain and unlabeled data from the target domain.
The complementary-label adversarial network (CLARINET) is proposed to solve the BFUDA problem (the complementary-label signal is sketched below).
arXiv Detail & Related papers (2020-07-29T05:31:58Z)
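A complementary label says which class a sample does not belong to. A minimal way to exploit it is to push down the predicted probability of the forbidden class, as below; this is a simplified surrogate loss, not CLARINET's adversarial objective.

```python
# Complementary-label loss: penalize probability mass on the forbidden class.
import torch
import torch.nn.functional as F

def complementary_loss(logits, comp_labels):
    probs = F.softmax(logits, dim=-1)
    p_comp = probs.gather(1, comp_labels.unsqueeze(1)).squeeze(1)
    return -torch.log(1.0 - p_comp + 1e-8).mean()

logits = torch.randn(16, 10, requires_grad=True)  # 10-class toy source batch
comp = torch.randint(0, 10, (16,))                # "not class k" supervision
complementary_loss(logits, comp).backward()
```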
- Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels [78.95901454696158]
We propose a novel Cross-Domain Self-supervised learning approach for domain adaptation.
Our method significantly boosts performance of target accuracy in the new target domain with few source labels.
arXiv Detail & Related papers (2020-03-18T15:11:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.