How to Leverage Unlabeled Data in Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2202.01741v1
- Date: Thu, 3 Feb 2022 18:04:54 GMT
- Title: How to Leverage Unlabeled Data in Offline Reinforcement Learning
- Authors: Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Chelsea Finn,
Sergey Levine
- Abstract summary: Offline reinforcement learning (RL) can learn control policies from static datasets but, like standard RL methods, it requires reward annotations for every transition.
One natural solution is to learn a reward function from the labeled data and use it to label the unlabeled data.
We find that, perhaps surprisingly, a much simpler method that simply applies zero rewards to unlabeled data leads to effective data sharing.
- Score: 125.72601809192365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) can learn control policies from static
datasets but, like standard RL methods, it requires reward annotations for
every transition. In many cases, labeling large datasets with rewards may be
costly, especially if those rewards must be provided by human labelers, while
collecting diverse unlabeled data might be comparatively inexpensive. How can
we best leverage such unlabeled data in offline RL? One natural solution is to
learn a reward function from the labeled data and use it to label the unlabeled
data. In this paper, we find that, perhaps surprisingly, a much simpler method
that simply applies zero rewards to unlabeled data leads to effective data
sharing both in theory and in practice, without learning any reward model at
all. While this approach might seem strange (and incorrect) at first, we
provide extensive theoretical and empirical analysis that illustrates how it
trades off reward bias, sample complexity and distributional shift, often
leading to good results. We characterize conditions under which this simple
strategy is effective, and further show that extending it with a simple
reweighting approach can further alleviate the bias introduced by using
incorrect reward labels. Our empirical evaluation confirms these findings in
simulated robotic locomotion, navigation, and manipulation settings.
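The core idea of the abstract can be sketched in a few lines: rather than training a reward model, unlabeled transitions are simply assigned reward zero and merged with the labeled data, optionally with a down-weighting factor standing in for the reweighting scheme the paper describes. The sketch below is illustrative only; the function and data layout are assumptions, not the authors' code.

```python
import numpy as np

def merge_with_zero_rewards(labeled, unlabeled, unlabeled_weight=1.0):
    """Sketch of zero-reward data sharing for offline RL.

    labeled:   list of (obs, action, next_obs, reward) tuples
    unlabeled: list of (obs, action, next_obs) tuples
    Returns (transitions, sample_weights) suitable as replay data
    for an off-the-shelf offline RL learner.
    """
    # Unlabeled transitions get reward 0 instead of a learned reward label.
    transitions = list(labeled) + [(s, a, s2, 0.0) for (s, a, s2) in unlabeled]
    # A per-sample weight vector is where a reweighting scheme could plug in
    # to reduce the bias introduced by the incorrect zero labels.
    weights = np.concatenate([
        np.ones(len(labeled)),                      # true reward labels
        np.full(len(unlabeled), unlabeled_weight),  # hedged zero labels
    ])
    return transitions, weights
```

The merged transitions and weights can then be fed to any offline RL algorithm as weighted replay data; setting `unlabeled_weight` below 1 mimics the bias-reducing reweighting discussed in the abstract.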
Related papers
- Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning [3.8552182839941898]
Offline reinforcement learning (RL) learns policies from a fixed dataset, but often requires large amounts of data.
In this paper, we present an algorithm that utilizes unlabeled data in offline RL with kernel function approximation.
arXiv Detail & Related papers (2024-08-22T11:31:51Z)
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Boosting Semi-Supervised Learning with Contrastive Complementary Labeling [11.851898765002334]
A popular approach is pseudo-labeling that generates pseudo labels only for those unlabeled data with high-confidence predictions.
We highlight that data with low-confidence pseudo labels can still be beneficial to the training process.
Inspired by this, we propose a novel Contrastive Complementary Labeling (CCL) method that constructs a large number of reliable negative pairs.
arXiv Detail & Related papers (2022-12-13T15:25:49Z)
- Weighted Distillation with Unlabeled Examples [15.825078347452024]
Distillation with unlabeled examples is a popular and powerful method for training deep neural networks in settings where the amount of labeled data is limited.
This paper proposes a principled approach for addressing this issue based on a "debiasing" reweighting of the student's loss function tailored to the distillation training paradigm.
arXiv Detail & Related papers (2022-10-13T04:08:56Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition [98.25592165484737]
We propose a more effective pseudo-labeling scheme, called Cross-Model Pseudo-Labeling (CMPL).
CMPL achieves 17.6% and 25.1% Top-1 accuracy on Kinetics-400 and UCF-101, respectively, using only the RGB modality and 1% labeled data.
arXiv Detail & Related papers (2021-12-17T18:59:41Z)
- Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling [96.8742582581744]
We present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK).
MAK follows three simple principles: tailness, proximity, and diversity.
We demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features.
arXiv Detail & Related papers (2021-11-01T15:09:41Z)
- Active learning for online training in imbalanced data streams under cold start [0.8155575318208631]
We propose an Active Learning (AL) annotation system for datasets with orders-of-magnitude class imbalance.
We present a computationally efficient Outlier-based Discriminative AL approach (ODAL) and design a novel 3-stage sequence of AL labeling policies.
The results show that our method can more quickly reach a high performance model than standard AL policies.
arXiv Detail & Related papers (2021-07-16T06:49:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.