Lifting Weak Supervision To Structured Prediction
- URL: http://arxiv.org/abs/2211.13375v1
- Date: Thu, 24 Nov 2022 02:02:58 GMT
- Title: Lifting Weak Supervision To Structured Prediction
- Authors: Harit Vishwakarma, Nicholas Roberts, Frederic Sala
- Abstract summary: Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates.
We introduce techniques new to weak supervision based on pseudo-Euclidean embeddings and tensor decompositions.
Several of our results, which can be viewed as robustness guarantees in structured prediction with noisy labels, may be of independent interest.
- Score: 12.219011764895853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weak supervision (WS) is a rich set of techniques that produce pseudolabels
by aggregating easily obtained but potentially noisy label estimates from a
variety of sources. WS is theoretically well understood for binary
classification, where simple approaches enable consistent estimation of
pseudolabel noise rates. Using this result, it has been shown that downstream
models trained on the pseudolabels have generalization guarantees nearly
identical to those trained on clean labels. While this is exciting, users often
wish to use WS for structured prediction, where the output space consists of
more than a binary or multi-class label set: e.g. rankings, graphs, manifolds,
and more. Do the favorable theoretical properties of WS for binary
classification lift to this setting? We answer this question in the affirmative
for a wide range of scenarios. For labels taking values in a finite metric
space, we introduce techniques new to weak supervision based on
pseudo-Euclidean embeddings and tensor decompositions, providing a
nearly-consistent noise rate estimator. For labels in constant-curvature
Riemannian manifolds, we introduce new invariants that also yield consistent
noise rate estimation. In both cases, when using the resulting pseudolabels in
concert with a flexible downstream model, we obtain generalization guarantees
nearly identical to those for models trained on clean data. Several of our
results, which can be viewed as robustness guarantees in structured prediction
with noisy labels, may be of independent interest. Empirical evaluation
validates our claims and shows the merits of the proposed method.
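The binary case that the abstract builds on can be made concrete. Below is a minimal, hypothetical sketch (not the paper's method, which targets structured outputs) of the classic moment-based "triplet" idea for weak supervision: with labels in {-1, +1}, balanced classes, and three conditionally independent label sources, the pairwise vote correlations factor as E[v_i v_j] = (2a_i - 1)(2a_j - 1), so each source's accuracy a_i can be recovered without ever seeing the true labels, and pseudolabels follow from an accuracy-weighted vote. All names and the synthetic setup are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: true binary labels in {-1, +1} and three conditionally
# independent labeling functions with different (unknown) accuracies.
n = 20000
y = rng.choice([-1, 1], size=n)
true_acc = np.array([0.85, 0.75, 0.65])
votes = np.stack(
    [np.where(rng.random(n) < a, y, -y) for a in true_acc], axis=1
)  # shape (n, 3)

# Triplet method: E[v_i v_j] = (2a_i - 1)(2a_j - 1), so the three pairwise
# moments determine each source's accuracy up to sampling error.
m01 = np.mean(votes[:, 0] * votes[:, 1])
m02 = np.mean(votes[:, 0] * votes[:, 2])
m12 = np.mean(votes[:, 1] * votes[:, 2])
c0 = np.sqrt(m01 * m02 / m12)  # estimate of 2*a_0 - 1
c1 = np.sqrt(m01 * m12 / m02)
c2 = np.sqrt(m02 * m12 / m01)
est_acc = (np.array([c0, c1, c2]) + 1) / 2

# Aggregate into pseudolabels with a log-odds-weighted vote.
weights = np.log(est_acc / (1 - est_acc))
pseudo = np.sign(votes @ weights)
```

The paper's contribution can be read as lifting exactly this kind of noise-rate recovery from {-1, +1} to finite metric spaces (via pseudo-Euclidean embeddings and tensor decompositions) and to constant-curvature manifolds.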
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, the method reduces the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Data-Driven Estimation of the False Positive Rate of the Bayes Binary Classifier via Soft Labels [25.40796153743837]
We propose an estimator for the false positive rate (FPR) of the Bayes classifier, that is, the optimal classifier with respect to accuracy, from a given dataset.
We develop effective FPR estimators by leveraging a denoising technique and the Nadaraya-Watson estimator.
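The Nadaraya-Watson estimator named in that summary is a standard kernel-weighted local average, m(x) = Σ_i K((x - x_i)/h) y_i / Σ_i K((x - x_i)/h). A minimal self-contained sketch on synthetic data (the bandwidth and test function are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy samples of sin(x) on [0, 2*pi].
x_train = rng.uniform(0, 2 * np.pi, size=500)
y_train = np.sin(x_train) + rng.normal(0, 0.1, size=500)

def nadaraya_watson(x_query, x, y, h=0.3):
    # Gaussian kernel weights of every query point against all samples;
    # the estimate is the weight-normalized average of the responses.
    w = np.exp(-0.5 * ((x_query[:, None] - x[None, :]) / h) ** 2)
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

# Evaluate away from the interval edges to avoid boundary bias.
x_q = np.linspace(0.5, 5.5, 50)
y_hat = nadaraya_watson(x_q, x_train, y_train)
```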
arXiv Detail & Related papers (2024-01-27T20:41:55Z)
- Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression [57.17120203327993]
The threshold-to-pseudo-label (T2L) process in classification uses confidence to determine label quality.
By its nature, regression also requires unbiased methods to generate high-quality labels.
We propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality.
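The distribution-free inequality that summary invokes is Chebyshev's: P(|X - μ| ≥ kσ) ≤ 1/k² for any distribution with finite variance. A quick empirical check on a skewed (exponential) sample, where Gaussian-only tail bounds would not apply, illustrates why such a constraint can be guaranteed without distributional assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Skewed sample: exponential with mean 2 (so Gaussian tail formulas
# would be wrong, but Chebyshev's bound still holds).
x = rng.exponential(scale=2.0, size=100_000)
mu, sigma = x.mean(), x.std()

# Empirical tail mass vs. the Chebyshev bound 1/k^2 for several k.
ks = np.array([1.5, 2.0, 3.0])
tails = np.array([np.mean(np.abs(x - mu) >= k * sigma) for k in ks])
bounds = 1.0 / ks**2
```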
arXiv Detail & Related papers (2023-11-03T08:39:35Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- Robust Point Cloud Segmentation with Noisy Annotations [32.991219357321334]
Class labels are often mislabeled at both instance-level and boundary-level in real-world datasets.
We take the lead in solving the instance-level label noise by proposing a Point Noise-Adaptive Learning framework.
Our framework significantly outperforms its baselines, and is comparable to the upper bound trained on completely clean data.
arXiv Detail & Related papers (2022-12-06T18:59:58Z)
- Neighbour Consistency Guided Pseudo-Label Refinement for Unsupervised Person Re-Identification [80.98291772215154]
Unsupervised person re-identification (ReID) aims at learning discriminative identity features for person retrieval without any annotations.
Recent advances accomplish this task by leveraging clustering-based pseudo labels.
We propose a Neighbour Consistency guided Pseudo Label Refinement framework.
arXiv Detail & Related papers (2022-11-30T09:39:57Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
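A toy sketch of the neighborhood-contrast idea that summary describes: score each sample by how often its k nearest feature-space neighbors agree with its (possibly noisy) label, so mislabeled samples stand out. This is an illustrative simplification, not the paper's actual estimator; all names and the synthetic data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: labels determined by the sign of feature 0,
# then 20% symmetric label noise is injected.
n, d, k = 200, 5, 10
feats = rng.normal(size=(n, d))
labels = (feats[:, 0] > 0).astype(int)
noisy = labels.copy()
flip = rng.random(n) < 0.2
noisy[flip] ^= 1

def neighbor_reliability(feats, noisy, k):
    # For each sample, the fraction of its k nearest neighbors
    # (excluding itself) that share its (possibly noisy) label.
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]
    return (noisy[nn] == noisy[:, None]).mean(axis=1)

rel = neighbor_reliability(feats, noisy, k)
```

Flipped samples tend to receive lower reliability scores than clean ones, which is the signal such methods threshold or reweight.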
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- Error-Bounded Correction of Noisy Labels [17.510654621245656]
We show that the prediction of a noisy classifier can indeed be a good indicator of whether a training sample's label is clean.
Based on the theoretical result, we propose a novel algorithm that corrects the labels based on the noisy classifier prediction.
We incorporate our label correction algorithm into the training of deep neural networks and train models that achieve superior testing performance on multiple public datasets.
arXiv Detail & Related papers (2020-11-19T19:23:23Z)
- GANs for learning from very high class conditional noisy labels [1.6516902135723865]
We use Generative Adversarial Networks (GANs) to design a class conditional label noise (CCN) robust scheme for binary classification.
It first generates a set of correctly labelled data points from the noisy labelled data plus a small fraction (0.1% or 1%) of clean labels.
arXiv Detail & Related papers (2020-10-19T15:01:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.