Robust Online Classification: From Estimation to Denoising
- URL: http://arxiv.org/abs/2309.01698v1
- Date: Mon, 4 Sep 2023 16:17:39 GMT
- Title: Robust Online Classification: From Estimation to Denoising
- Authors: Changlong Wu, Ananth Grama, Wojciech Szpankowski
- Abstract summary: We study online classification in the presence of noisy labels.
We show that for a wide range of natural noise kernels, adversarially selected features, and finite classes of labeling functions, the minimax risk can be upper bounded independently of the time horizon.
- Score: 16.336539657286266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study online classification in the presence of noisy labels. The noise
mechanism is modeled by a general kernel that specifies, for any feature-label
pair, a (known) set of distributions over noisy labels. At each time step, an
adversary selects an unknown distribution from the distribution set specified
by the kernel based on the actual feature-label pair, and generates the noisy
label from the selected distribution. The learner then makes a prediction based
on the actual features and noisy labels observed thus far, and incurs loss $1$
if the prediction differs from the underlying truth (and $0$ otherwise). The
prediction quality is quantified through minimax risk, which computes the
cumulative loss over a finite horizon $T$. We show that for a wide range of
natural noise kernels, adversarially selected features, and finite classes of
labeling functions, the minimax risk can be upper bounded independently of the
time horizon and logarithmically in the size of the labeling function class. We
then extend these results to infinite classes and stochastically generated features via
the concept of stochastic sequential covering. Our results extend and encompass
the findings of Ben-David et al. (2009) with substantially greater generality, and provide
intuitive understanding through a novel reduction to online conditional
distribution estimation.
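The interaction protocol in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the paper's algorithm: binary labels, a made-up two-function class over four features, a noise kernel assumed to allow any flip probability up to a bound DELTA, and a simple per-feature majority-vote learner standing in for the general denoising reduction.

```python
import random

random.seed(0)

# Toy instance of the protocol: binary labels, four features, and a finite
# class of two labeling functions (all values here are made up for illustration).
FEATURES = [0, 1, 2, 3]
H = [lambda x, k=k: (x + k) % 2 for k in range(2)]
truth = H[1]  # the hidden true labeling function

DELTA = 0.2   # assumed noise kernel: any flip probability in [0, DELTA]
T = 500       # finite horizon

# Learner: per-feature majority vote over the noisy labels seen so far.
counts = {x: [0, 0] for x in FEATURES}
cum_loss = 0
for t in range(T):
    x = random.choice(FEATURES)        # stochastic stand-in for adversarial features
    flip_p = random.uniform(0, DELTA)  # adversary picks a kernel-consistent distribution
    y = truth(x)
    noisy_y = 1 - y if random.random() < flip_p else y

    pred = 0 if counts[x][0] >= counts[x][1] else 1
    cum_loss += int(pred != y)         # 0-1 loss against the underlying truth
    counts[x][noisy_y] += 1            # update only after predicting

print("cumulative 0-1 loss over", T, "rounds:", cum_loss)
```

Because every flip probability stays below 1/2, the per-feature majority stabilizes on the true label after a few observations and the cumulative loss stops growing; this shows the qualitative behavior behind horizon-independent risk bounds, not a proof of them.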
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real-world applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
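A minimal stand-in for that consistency idea can be sketched as a penalty on the gap between the predicted label distribution and a known class prior; the prior `pi`, the squared-gap penalty, and the model outputs below are all illustrative assumptions, not the paper's actual objective.

```python
import numpy as np

# Sketch of a label-distribution consistency penalty for PU learning:
# push the mean predicted positive probability toward an assumed class prior pi.
def distribution_consistency_loss(pred_probs, pi):
    """Squared gap between the predicted and ground-truth label distributions."""
    p_hat = np.array([1 - pred_probs.mean(), pred_probs.mean()])
    p_true = np.array([1 - pi, pi])
    return float(((p_hat - p_true) ** 2).sum())

pred_probs = np.array([0.9, 0.8, 0.2, 0.7, 0.1])  # hypothetical model outputs
print(distribution_consistency_loss(pred_probs, pi=0.4))
```

The loss is zero exactly when the predicted positive proportion matches the prior, which is the consistency condition the summary describes.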
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Lifting Weak Supervision To Structured Prediction [12.219011764895853]
Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates.
We introduce techniques new to weak supervision based on pseudo-Euclidean embeddings and tensor decompositions.
Several of our results, which can be viewed as robustness guarantees in structured prediction with noisy labels, may be of independent interest.
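The aggregation step that weak supervision builds on can be sketched in its simplest form, plain majority vote over several noisy labelers. The labeler accuracies and data sizes below are made up for illustration; the paper's actual contribution (pseudo-Euclidean embeddings and tensor decompositions) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three weak labelers of assumed accuracies vote on 1000 binary items;
# unweighted majority vote over their noisy estimates yields the pseudolabels.
n = 1000
truth = rng.integers(0, 2, n)
accuracies = [0.8, 0.75, 0.7]  # made-up labeler accuracies
votes = np.stack([
    np.where(rng.random(n) < acc, truth, 1 - truth) for acc in accuracies
])

pseudo = (votes.mean(axis=0) > 0.5).astype(int)  # majority vote
print("pseudolabel accuracy:", (pseudo == truth).mean())
```

With these accuracies the majority vote (expected accuracy about 0.845) beats the best single labeler (0.8), illustrating why aggregating noisy estimates helps.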
arXiv Detail & Related papers (2022-11-24T02:02:58Z)
- Tackling Instance-Dependent Label Noise with Dynamic Distribution Calibration [18.59803726676361]
Instance-dependent label noise is realistic but rather challenging, where the label-corruption process depends on instances directly.
It causes a severe distribution shift between the distributions of training and test data, which impairs the generalization of trained models.
In this paper, to address the distribution shift in learning with instance-dependent label noise, a dynamic distribution-calibration strategy is adopted.
arXiv Detail & Related papers (2022-10-11T03:50:52Z)
- Robustness to Label Noise Depends on the Shape of the Noise Distribution in Feature Space [6.748225062396441]
We show that both the scale and the shape of the noise distribution influence the posterior likelihood.
We show that when the noise distribution targets decision boundaries, classification robustness can drop off even at a small scale of noise.
arXiv Detail & Related papers (2022-06-02T15:41:59Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
At the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of labels in its neighborhood in feature space.
Our method significantly surpasses previous methods on both CIFAR10 and CIFAR100 with artificial noise and on real-world noisy datasets such as WebVision and ANIMAL-10N.
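That selection rule can be illustrated on toy data. This is only a sketch of the neighborhood-consistency idea, with made-up Gaussian clusters standing in for the paper's learned deep features and an assumed neighborhood size k.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for deep features: two Gaussian clusters, one per class.
n = 200
X = np.vstack([rng.normal(-2.0, 1.0, (n, 2)), rng.normal(2.0, 1.0, (n, 2))])
clean = np.array([0] * n + [1] * n)

# Corrupt 20% of the annotations to simulate label noise.
noisy = clean.copy()
flip = rng.choice(2 * n, size=int(0.2 * 2 * n), replace=False)
noisy[flip] = 1 - noisy[flip]

# Selection rule: keep a sample if its annotated label agrees with the
# majority label among its k nearest neighbors in feature space.
k = 10
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
np.fill_diagonal(dists, np.inf)          # a point is not its own neighbor
nn = np.argsort(dists, axis=1)[:, :k]    # indices of the k nearest neighbors
neighbor_majority = (noisy[nn].mean(axis=1) > 0.5).astype(int)
selected = neighbor_majority == noisy

print("kept", int(selected.sum()), "of", 2 * n, "samples")
print("label purity among kept:", float((noisy[selected] == clean[selected]).mean()))
```

Because mislabeled points usually disagree with their neighborhood's majority label, the kept subset is much cleaner than the raw annotations.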
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Instance-dependent Label-noise Learning under a Structural Causal Model [92.76400590283448]
Label noise degrades the performance of deep learning algorithms.
By leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning.
arXiv Detail & Related papers (2021-09-07T10:42:54Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.