USE: Uncertainty Structure Estimation for Robust Semi-Supervised Learning
- URL: http://arxiv.org/abs/2603.00404v1
- Date: Sat, 28 Feb 2026 01:31:46 GMT
- Title: USE: Uncertainty Structure Estimation for Robust Semi-Supervised Learning
- Authors: Tsao-Lun Chen, Chien-Liang Liu, Tzu-Ming Harry Hsu, Tai-Hsien Wu, Chi-Cheng Fu, Han-Yi E. Chou, Shun-Feng Su,
- Abstract summary: Uncertainty Structure Estimation (USE) is introduced for Semi-supervised learning (SSL)<n>USE trains a proxy model on the labeled set to compute entropy scores for unlabeled samples.<n>It is evident that USE consistently improves accuracy and under varying levels of OOD contamination.
- Score: 6.021112703667433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, a novel idea, Uncertainty Structure Estimation (USE), a lightweight, algorithm-agnostic procedure that emphasizes the often-overlooked role of unlabeled data quality is introduced for Semi-supervised learning (SSL). SSL has achieved impressive progress, but its reliability in deployment is limited by the quality of the unlabeled pool. In practice, unlabeled data are almost always contaminated by out-of-distribution (OOD) samples, where both near-OOD and far-OOD can negatively affect performance in different ways. We argue that the bottleneck does not lie in algorithmic design, but rather in the absence of principled mechanisms to assess and curate the quality of unlabeled data. The proposed USE trains a proxy model on the labeled set to compute entropy scores for unlabeled samples, and then derives a threshold, via statistical comparison against a reference distribution, that separates informative (structured) from uninformative (structureless) samples. This enables assessment as a preprocessing step, removing uninformative or harmful unlabeled data before SSL training begins. Through extensive experiments on imaging (CIFAR-100) and NLP (Yelp Review) data, it is evident that USE consistently improves accuracy and robustness under varying levels of OOD contamination. Thus, it can be concluded that the proposed approach reframes unlabeled data quality control as a structural assessment problem, and considers it as a necessary component for reliable and efficient SSL in realistic mixed-distribution environments.
Related papers
- Let the Void Be Void: Robust Open-Set Semi-Supervised Learning via Selective Non-Alignment [1.8350044465969415]
Open-set semi-supervised learning (OSSL) leverages unlabeled data containing both in-distribution (ID) and unknown out-of-distribution (OOD) samples.<n>We introduce selective non-alignment, adding a novel "skip" operator into conventional pull and push operations.<n>Our framework, SkipAlign, selectively skips alignment (pulling) for low-confidence unlabeled samples, retaining only gentle repulsion against ID prototypes.
arXiv Detail & Related papers (2025-04-17T01:37:53Z) - Uncertainty-Aware Out-of-Distribution Detection with Gaussian Processes [13.246251147975192]
Deep neural networks (DNNs) are often constructed under the closed-world assumption.<n>OOD samples are not always available during the training phase in real-world applications.<n>We propose a Gaussian-process-based OOD detection method to establish a decision boundary based on InD data only.
arXiv Detail & Related papers (2024-12-30T12:57:31Z) - How Does Unlabeled Data Provably Help Out-of-Distribution Detection? [63.41681272937562]
Unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and out-of-distribution (OOD) data.
This paper introduces a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness.
arXiv Detail & Related papers (2024-02-05T20:36:33Z) - Complementing Semi-Supervised Learning with Uncertainty Quantification [6.612035830987296]
We propose a novel unsupervised uncertainty-aware objective that relies on aleatoric and epistemic uncertainty quantification.
Our results outperform the state-of-the-art results on complex datasets such as CIFAR-100 and Mini-ImageNet.
arXiv Detail & Related papers (2022-07-22T00:15:02Z) - Incorporating Semi-Supervised and Positive-Unlabeled Learning for
Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference with pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, making it encouraging to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we suggest to incorporate semi-supervised and positive-unlabeled (PU) learning for exploiting unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z) - Robust Deep Semi-Supervised Learning: A Brief Introduction [63.09703308309176]
Semi-supervised learning (SSL) aims to improve learning performance by leveraging unlabeled data when labels are insufficient.
SSL with deep models has proven to be successful on standard benchmark tasks.
However, they are still vulnerable to various robustness threats in real-world applications.
arXiv Detail & Related papers (2022-02-12T04:16:41Z) - Provably Robust Detection of Out-of-distribution Data (almost) for free [124.14121487542613]
Deep neural networks are known to produce highly overconfident predictions on out-of-distribution (OOD) data.
In this paper we propose a novel method where from first principles we combine a certifiable OOD detector with a standard classifier into an OOD aware classifier.
In this way we achieve the best of two worlds: certifiably adversarially robust OOD detection, even for OOD samples close to the in-distribution, without loss in prediction accuracy and close to state-of-the-art OOD detection performance for non-manipulated OOD data.
arXiv Detail & Related papers (2021-06-08T11:40:49Z) - Exploiting Sample Uncertainty for Domain Adaptive Person
Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z) - Know Your Limits: Uncertainty Estimation with ReLU Classifiers Fails at
Reliable OOD Detection [0.0]
This paper gives a theoretical explanation for said experimental findings and illustrates it on synthetic data.
We prove that such techniques are not able to reliably identify OOD samples in a classification setting.
arXiv Detail & Related papers (2020-12-09T21:35:55Z) - Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning [54.85397562961903]
Semi-supervised learning (SSL) has been proposed to leverage unlabeled data for training powerful models when only limited labeled data is available.
We address a more complex novel scenario named open-set SSL, where out-of-distribution (OOD) samples are contained in unlabeled data.
Our method achieves state-of-the-art results by successfully eliminating the effect of OOD samples.
arXiv Detail & Related papers (2020-07-22T10:33:55Z) - Uncertainty-Based Out-of-Distribution Classification in Deep
Reinforcement Learning [17.10036674236381]
Wrong predictions for out-of-distribution data can cause safety critical situations in machine learning systems.
We propose a framework for uncertainty-based OOD classification: UBOOD.
We show that UBOOD produces reliable classification results when combined with ensemble-based estimators.
arXiv Detail & Related papers (2019-12-31T09:52:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.