Uncertainty-aware Long-tailed Weights Model the Utility of Pseudo-labels for Semi-supervised Learning
- URL: http://arxiv.org/abs/2503.09974v1
- Date: Thu, 13 Mar 2025 02:21:04 GMT
- Title: Uncertainty-aware Long-tailed Weights Model the Utility of Pseudo-labels for Semi-supervised Learning
- Authors: Jiaqi Wu, Junbiao Pang, Qingming Huang
- Abstract summary: We propose an Uncertainty-aware Ensemble Structure (UES) to assess the utility of pseudo-labels for unlabeled samples. UES is lightweight and architecture-agnostic, easily extending to various computer vision tasks, including classification and regression.
- Score: 50.868594148443215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current Semi-supervised Learning (SSL) adopts the pseudo-labeling strategy and further filters pseudo-labels based on confidence thresholds. However, this mechanism has notable drawbacks: 1) setting a reasonable threshold is an open problem that significantly influences the selection of high-quality pseudo-labels; and 2) because labeled data are scarce, deep models often exhibit over-confidence, which makes the confidence value an unreliable indicator of pseudo-label quality. In this paper, we propose an Uncertainty-aware Ensemble Structure (UES) to assess the utility of pseudo-labels for unlabeled samples. We further model the utility of pseudo-labels as long-tailed weights to avoid the open problem of setting the threshold. Concretely, the long-tailed weights ensure that even unreliable pseudo-labels still contribute to enhancing the model's robustness. Besides, UES is lightweight and architecture-agnostic, easily extending to various computer vision tasks, including classification and regression. Experimental results demonstrate that combining the proposed method with DualPose leads to a 3.47% improvement in Percentage of Correct Keypoints (PCK) on the Sniffing dataset with 100 data points (30 labeled), a 7.29% improvement in PCK on the FLIC dataset with 100 data points (50 labeled), and a 3.91% improvement in PCK on the LSP dataset with 200 data points (100 labeled). Furthermore, when combined with FixMatch, the proposed method achieves a 0.2% accuracy improvement on the CIFAR-10 dataset with 40 labeled data points and a 0.26% accuracy improvement on the CIFAR-100 dataset with 400 labeled data points.
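The contrast the abstract draws between threshold-based filtering and soft weighting can be illustrated with a small sketch. This is not the paper's UES implementation: the function names, the use of variance across ensemble heads as the uncertainty measure, the `exp(-uncertainty / temperature)` weight, and the `temperature` value are all assumptions chosen for illustration. The only point shown is that a hard threshold discards low-confidence pseudo-labels, while a soft weight keeps every sample with a strictly positive contribution.

```python
import numpy as np


def threshold_mask(probs, tau=0.95):
    """Hard selection: keep a pseudo-label only if its max probability >= tau."""
    return (probs.max(axis=1) >= tau).astype(float)


def ensemble_uncertainty_weights(head_probs, temperature=0.05):
    """Hypothetical soft weights from disagreement among ensemble heads.

    head_probs has shape (heads, samples, classes). The weight formula is an
    illustrative assumption, not the long-tailed weight defined in the paper.
    """
    mean_probs = head_probs.mean(axis=0)              # (samples, classes)
    pseudo_labels = mean_probs.argmax(axis=1)         # (samples,)
    sample_idx = np.arange(head_probs.shape[1])
    # Probability each head assigns to the ensemble's chosen class.
    chosen = head_probs[:, sample_idx, pseudo_labels]  # (heads, samples)
    uncertainty = chosen.var(axis=0)                  # disagreement across heads
    weights = np.exp(-uncertainty / temperature)      # in (0, 1], never exactly 0
    return pseudo_labels, weights


def weighted_pseudo_label_loss(mean_probs, pseudo_labels, weights):
    """Weighted negative log-likelihood of the pseudo-labels."""
    picked = mean_probs[np.arange(len(pseudo_labels)), pseudo_labels]
    nll = -np.log(picked + 1e-12)
    return float((weights * nll).sum() / max(weights.sum(), 1e-12))


if __name__ == "__main__":
    # Two heads, three unlabeled samples, two classes (made-up numbers).
    head_probs = np.array([
        [[0.97, 0.03], [0.60, 0.40], [0.90, 0.10]],
        [[0.96, 0.04], [0.62, 0.38], [0.55, 0.45]],
    ])
    mean_probs = head_probs.mean(axis=0)
    labels, weights = ensemble_uncertainty_weights(head_probs)
    print("hard mask:   ", threshold_mask(mean_probs))
    print("soft weights:", weights)
    print("loss:        ", weighted_pseudo_label_loss(mean_probs, labels, weights))
```

In this toy input, the hard mask keeps only the first sample, whereas the soft weights retain all three and down-weight the sample on which the heads disagree most.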
Related papers
- Improving the accuracy of automated labeling of specimen images datasets via a confidence-based process [9.0255922670433]
We present and validate an approach that can greatly improve automatic labeling accuracy.
We demonstrate that a naive model that produced 86% initial accuracy can achieve improved performance.
After validating the approach in a number of ways, we annotate a large dataset of over 600,000 herbarium specimens.
arXiv Detail & Related papers (2024-11-15T09:39:12Z)
- Decorrelating Structure via Adapters Makes Ensemble Learning Practical for Semi-supervised Learning [50.868594148443215]
In computer vision, traditional ensemble learning methods exhibit either low training efficiency or limited performance.
We propose a lightweight, loss-function-free, and architecture-agnostic ensemble learning method, Decorrelating Structure via Adapters (DSA), for various visual tasks.
arXiv Detail & Related papers (2024-08-08T01:31:38Z)
- Pearls from Pebbles: Improved Confidence Functions for Auto-labeling [51.44986105969375]
Threshold-based auto-labeling (TBAL) works by finding a threshold on a model's confidence scores above which it can accurately label unlabeled data points.
We propose a framework for studying the optimal TBAL confidence function.
We develop a new post-hoc method specifically designed to maximize performance in TBAL systems.
arXiv Detail & Related papers (2024-04-24T20:22:48Z)
- Navigating Data Heterogeneity in Federated Learning: A Semi-Supervised Federated Object Detection [3.7398615061365206]
Federated Learning (FL) has emerged as a potent framework for training models across distributed data sources.
It faces challenges with limited high-quality labels and non-IID client data, particularly in applications like autonomous driving.
We present a pioneering SSFOD framework, designed for scenarios where labeled data reside only at the server while clients possess unlabeled data.
arXiv Detail & Related papers (2023-10-26T01:40:28Z)
- Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification.
We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate.
We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z)
- Combating noisy labels in object detection datasets [0.0]
We introduce the Confident Learning for Object Detection (CLOD) algorithm for assessing the quality of each label in object detection datasets.
We identify missing, spurious, mislabeled, and mislocated bounding boxes and suggest corrections.
The proposed method is able to point out nearly 80% of artificially disturbed bounding boxes with a false positive rate below 0.1.
arXiv Detail & Related papers (2022-11-25T10:05:06Z)
- Boosting Semi-Supervised Face Recognition with Noise Robustness [54.342992887966616]
This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise arising from auto-labelling.
We develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which is based on the robust training ability empowered by GN.
arXiv Detail & Related papers (2021-05-10T14:43:11Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.