How to Leverage Unlabeled Data in Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2202.01741v1
- Date: Thu, 3 Feb 2022 18:04:54 GMT
- Title: How to Leverage Unlabeled Data in Offline Reinforcement Learning
- Authors: Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Chelsea Finn,
Sergey Levine
- Abstract summary: Offline reinforcement learning (RL) can learn control policies from static datasets but, like standard RL methods, it requires reward annotations for every transition.
One natural solution is to learn a reward function from the labeled data and use it to label the unlabeled data.
We find that, perhaps surprisingly, a much simpler method that simply applies zero rewards to unlabeled data leads to effective data sharing.
- Score: 125.72601809192365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) can learn control policies from static
datasets but, like standard RL methods, it requires reward annotations for
every transition. In many cases, labeling large datasets with rewards may be
costly, especially if those rewards must be provided by human labelers, while
collecting diverse unlabeled data might be comparatively inexpensive. How can
we best leverage such unlabeled data in offline RL? One natural solution is to
learn a reward function from the labeled data and use it to label the unlabeled
data. In this paper, we find that, perhaps surprisingly, a much simpler method
that simply applies zero rewards to unlabeled data leads to effective data
sharing both in theory and in practice, without learning any reward model at
all. While this approach might seem strange (and incorrect) at first, we
provide extensive theoretical and empirical analysis that illustrates how it
trades off reward bias, sample complexity and distributional shift, often
leading to good results. We characterize conditions under which this simple
strategy is effective, and further show that extending it with a simple
reweighting approach can further alleviate the bias introduced by using
incorrect reward labels. Our empirical evaluation confirms these findings in
simulated robotic locomotion, navigation, and manipulation settings.
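The core idea of the abstract can be sketched in a few lines: rather than training a reward model, unlabeled transitions are simply assigned reward zero and merged with the labeled data, optionally with a down-weighting factor standing in for the reweighting scheme the paper describes. The sketch below is illustrative only; the function and data layout are assumptions, not the authors' code.

```python
import numpy as np

def merge_with_zero_rewards(labeled, unlabeled, unlabeled_weight=1.0):
    """Sketch of zero-reward data sharing for offline RL.

    labeled:   list of (obs, action, next_obs, reward) tuples
    unlabeled: list of (obs, action, next_obs) tuples
    Returns (transitions, sample_weights) suitable as replay data
    for an off-the-shelf offline RL learner.
    """
    # Unlabeled transitions get reward 0 instead of a learned reward label.
    transitions = list(labeled) + [(s, a, s2, 0.0) for (s, a, s2) in unlabeled]
    # A per-sample weight vector is where a reweighting scheme could plug in
    # to reduce the bias introduced by the incorrect zero labels.
    weights = np.concatenate([
        np.ones(len(labeled)),                      # true reward labels
        np.full(len(unlabeled), unlabeled_weight),  # hedged zero labels
    ])
    return transitions, weights
```

The merged transitions and weights can then be fed to any offline RL algorithm as weighted replay data; setting `unlabeled_weight` below 1 mimics the bias-reducing reweighting discussed in the abstract.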
Related papers
- Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning [3.8552182839941898]
Offline reinforcement learning (RL) learns policies from a fixed dataset, but often requires large amounts of data.
In this paper, we present an algorithm that utilizes unlabeled data in offline RL with kernel function approximation.
arXiv Detail & Related papers (2024-08-22T11:31:51Z)
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Boosting Semi-Supervised Learning with Contrastive Complementary Labeling [11.851898765002334]
A popular approach is pseudo-labeling that generates pseudo labels only for those unlabeled data with high-confidence predictions.
We highlight that data with low-confidence pseudo labels can still be beneficial to the training process.
Inspired by this, we propose a novel Contrastive Complementary Labeling (CCL) method that constructs a large number of reliable negative pairs.
arXiv Detail & Related papers (2022-12-13T15:25:49Z)
- Weighted Distillation with Unlabeled Examples [15.825078347452024]
Distillation with unlabeled examples is a popular and powerful method for training deep neural networks in settings where the amount of labeled data is limited.
This paper proposes a principled approach for addressing this issue based on a "debiasing" reweighting of the student's loss function tailored to the distillation training paradigm.
arXiv Detail & Related papers (2022-10-13T04:08:56Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition [98.25592165484737]
We propose a more effective pseudo-labeling scheme, called Cross-Model Pseudo-Labeling (CMPL).
CMPL achieves 17.6% and 25.1% Top-1 accuracy on Kinetics-400 and UCF-101, respectively, using only the RGB modality and 1% labeled data.
arXiv Detail & Related papers (2021-12-17T18:59:41Z)
- Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling [96.8742582581744]
We present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK).
MAK follows three simple principles: tailness, proximity, and diversity.
We demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features.
arXiv Detail & Related papers (2021-11-01T15:09:41Z)
- Active learning for online training in imbalanced data streams under cold start [0.8155575318208631]
We propose an Active Learning (AL) annotation system for datasets with orders-of-magnitude class imbalance.
We present a computationally efficient Outlier-based Discriminative AL approach (ODAL) and design a novel 3-stage sequence of AL labeling policies.
The results show that our method can more quickly reach a high performance model than standard AL policies.
arXiv Detail & Related papers (2021-07-16T06:49:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.