Lifting Weak Supervision To Structured Prediction
- URL: http://arxiv.org/abs/2211.13375v1
- Date: Thu, 24 Nov 2022 02:02:58 GMT
- Title: Lifting Weak Supervision To Structured Prediction
- Authors: Harit Vishwakarma, Nicholas Roberts, Frederic Sala
- Abstract summary: Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates.
We introduce techniques new to weak supervision based on pseudo-Euclidean embeddings and tensor decompositions.
Several of our results, which can be viewed as robustness guarantees in structured prediction with noisy labels, may be of independent interest.
- Score: 12.219011764895853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weak supervision (WS) is a rich set of techniques that produce pseudolabels
by aggregating easily obtained but potentially noisy label estimates from a
variety of sources. WS is theoretically well understood for binary
classification, where simple approaches enable consistent estimation of
pseudolabel noise rates. Using this result, it has been shown that downstream
models trained on the pseudolabels have generalization guarantees nearly
identical to those trained on clean labels. While this is exciting, users often
wish to use WS for structured prediction, where the output space consists of
more than a binary or multi-class label set: e.g. rankings, graphs, manifolds,
and more. Do the favorable theoretical properties of WS for binary
classification lift to this setting? We answer this question in the affirmative
for a wide range of scenarios. For labels taking values in a finite metric
space, we introduce techniques new to weak supervision based on
pseudo-Euclidean embeddings and tensor decompositions, providing a
nearly-consistent noise rate estimator. For labels in constant-curvature
Riemannian manifolds, we introduce new invariants that also yield consistent
noise rate estimation. In both cases, when using the resulting pseudolabels in
concert with a flexible downstream model, we obtain generalization guarantees
nearly identical to those for models trained on clean data. Several of our
results, which can be viewed as robustness guarantees in structured prediction
with noisy labels, may be of independent interest. Empirical evaluation
validates our claims and shows the merits of the proposed method.
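The binary case that the abstract builds on can be made concrete. Below is a minimal, hypothetical sketch (not the paper's method, which targets structured outputs) of the classic moment-based "triplet" idea for weak supervision: with labels in {-1, +1}, balanced classes, and three conditionally independent label sources, the pairwise vote correlations factor as E[v_i v_j] = (2a_i - 1)(2a_j - 1), so each source's accuracy a_i can be recovered without ever seeing the true labels, and pseudolabels follow from an accuracy-weighted vote. All names and the synthetic setup are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: true binary labels in {-1, +1} and three conditionally
# independent labeling functions with different (unknown) accuracies.
n = 20000
y = rng.choice([-1, 1], size=n)
true_acc = np.array([0.85, 0.75, 0.65])
votes = np.stack(
    [np.where(rng.random(n) < a, y, -y) for a in true_acc], axis=1
)  # shape (n, 3)

# Triplet method: E[v_i v_j] = (2a_i - 1)(2a_j - 1), so the three pairwise
# moments determine each source's accuracy up to sampling error.
m01 = np.mean(votes[:, 0] * votes[:, 1])
m02 = np.mean(votes[:, 0] * votes[:, 2])
m12 = np.mean(votes[:, 1] * votes[:, 2])
c0 = np.sqrt(m01 * m02 / m12)  # estimate of 2*a_0 - 1
c1 = np.sqrt(m01 * m12 / m02)
c2 = np.sqrt(m02 * m12 / m01)
est_acc = (np.array([c0, c1, c2]) + 1) / 2

# Aggregate into pseudolabels with a log-odds-weighted vote.
weights = np.log(est_acc / (1 - est_acc))
pseudo = np.sign(votes @ weights)
```

The paper's contribution can be read as lifting exactly this kind of noise-rate recovery from {-1, +1} to finite metric spaces (via pseudo-Euclidean embeddings and tensor decompositions) and to constant-curvature manifolds.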
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, the method reduces the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Data-Driven Estimation of the False Positive Rate of the Bayes Binary Classifier via Soft Labels [25.40796153743837]
We propose an estimator for the false positive rate (FPR) of the Bayes classifier, that is, the optimal classifier with respect to accuracy, from a given dataset.
We develop effective FPR estimators by leveraging a denoising technique and the Nadaraya-Watson estimator.
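The Nadaraya-Watson estimator named in that summary is a standard kernel-weighted local average, m(x) = Σ_i K((x - x_i)/h) y_i / Σ_i K((x - x_i)/h). A minimal self-contained sketch on synthetic data (the bandwidth and test function are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy samples of sin(x) on [0, 2*pi].
x_train = rng.uniform(0, 2 * np.pi, size=500)
y_train = np.sin(x_train) + rng.normal(0, 0.1, size=500)

def nadaraya_watson(x_query, x, y, h=0.3):
    # Gaussian kernel weights of every query point against all samples;
    # the estimate is the weight-normalized average of the responses.
    w = np.exp(-0.5 * ((x_query[:, None] - x[None, :]) / h) ** 2)
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

# Evaluate away from the interval edges to avoid boundary bias.
x_q = np.linspace(0.5, 5.5, 50)
y_hat = nadaraya_watson(x_q, x_train, y_train)
```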
arXiv Detail & Related papers (2024-01-27T20:41:55Z)
- Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression [57.17120203327993]
The threshold-to-pseudo-label (T2L) process in classification uses confidence to determine label quality.
By its nature, regression also requires unbiased methods to generate high-quality labels.
We propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality.
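The distribution-free inequality that summary invokes is Chebyshev's: P(|X - μ| ≥ kσ) ≤ 1/k² for any distribution with finite variance. A quick empirical check on a skewed (exponential) sample, where Gaussian-only tail bounds would not apply, illustrates why such a constraint can be guaranteed without distributional assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Skewed sample: exponential with mean 2 (so Gaussian tail formulas
# would be wrong, but Chebyshev's bound still holds).
x = rng.exponential(scale=2.0, size=100_000)
mu, sigma = x.mean(), x.std()

# Empirical tail mass vs. the Chebyshev bound 1/k^2 for several k.
ks = np.array([1.5, 2.0, 3.0])
tails = np.array([np.mean(np.abs(x - mu) >= k * sigma) for k in ks])
bounds = 1.0 / ks**2
```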
arXiv Detail & Related papers (2023-11-03T08:39:35Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- Robust Point Cloud Segmentation with Noisy Annotations [32.991219357321334]
Class labels are often mislabeled at both instance-level and boundary-level in real-world datasets.
We take the lead in solving the instance-level label noise by proposing a Point Noise-Adaptive Learning framework.
Our framework significantly outperforms its baselines, and is comparable to the upper bound trained on completely clean data.
arXiv Detail & Related papers (2022-12-06T18:59:58Z)
- Neighbour Consistency Guided Pseudo-Label Refinement for Unsupervised Person Re-Identification [80.98291772215154]
Unsupervised person re-identification (ReID) aims at learning discriminative identity features for person retrieval without any annotations.
Recent advances accomplish this task by leveraging clustering-based pseudo labels.
We propose a Neighbour Consistency guided Pseudo Label Refinement framework.
arXiv Detail & Related papers (2022-11-30T09:39:57Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
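A toy sketch of the neighborhood-contrast idea that summary describes: score each sample by how often its k nearest feature-space neighbors agree with its (possibly noisy) label, so mislabeled samples stand out. This is an illustrative simplification, not the paper's actual estimator; all names and the synthetic data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: labels determined by the sign of feature 0,
# then 20% symmetric label noise is injected.
n, d, k = 200, 5, 10
feats = rng.normal(size=(n, d))
labels = (feats[:, 0] > 0).astype(int)
noisy = labels.copy()
flip = rng.random(n) < 0.2
noisy[flip] ^= 1

def neighbor_reliability(feats, noisy, k):
    # For each sample, the fraction of its k nearest neighbors
    # (excluding itself) that share its (possibly noisy) label.
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]
    return (noisy[nn] == noisy[:, None]).mean(axis=1)

rel = neighbor_reliability(feats, noisy, k)
```

Flipped samples tend to receive lower reliability scores than clean ones, which is the signal such methods threshold or reweight.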
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- Error-Bounded Correction of Noisy Labels [17.510654621245656]
We show that the prediction of a noisy classifier can indeed be a good indicator of whether a training sample's label is clean.
Based on the theoretical result, we propose a novel algorithm that corrects the labels based on the noisy classifier prediction.
We incorporate our label correction algorithm into the training of deep neural networks and train models that achieve superior testing performance on multiple public datasets.
arXiv Detail & Related papers (2020-11-19T19:23:23Z)
- GANs for learning from very high class conditional noisy labels [1.6516902135723865]
We use Generative Adversarial Networks (GANs) to design a class conditional label noise (CCN) robust scheme for binary classification.
It first generates a set of correctly labelled data points from the noisy labelled data plus a small fraction (0.1% or 1%) of clean labels.
arXiv Detail & Related papers (2020-10-19T15:01:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.