Robust Online Classification: From Estimation to Denoising
- URL: http://arxiv.org/abs/2309.01698v1
- Date: Mon, 4 Sep 2023 16:17:39 GMT
- Title: Robust Online Classification: From Estimation to Denoising
- Authors: Changlong Wu, Ananth Grama, Wojciech Szpankowski
- Abstract summary: We study online classification in the presence of noisy labels.
We show that for a wide range of natural noise kernels, adversarially selected features, and a finite class of labeling functions, the minimax risk can be upper bounded independently of the time horizon.
- Score: 16.336539657286266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study online classification in the presence of noisy labels. The noise
mechanism is modeled by a general kernel that specifies, for any feature-label
pair, a (known) set of distributions over noisy labels. At each time step, an
adversary selects an unknown distribution from the distribution set specified
by the kernel based on the actual feature-label pair, and generates the noisy
label from the selected distribution. The learner then makes a prediction based
on the actual features and noisy labels observed thus far, and incurs loss $1$
if the prediction differs from the underlying truth (and $0$ otherwise). The
prediction quality is quantified through minimax risk, which computes the
cumulative loss over a finite horizon $T$. We show that for a wide range of
natural noise kernels, adversarially selected features, and a finite class of
labeling functions, the minimax risk can be upper bounded independently of the
time horizon and logarithmically in the size of the labeling function class. We
then extend these results to infinite classes and stochastically generated features via
the concept of stochastic sequential covering. Our results extend and encompass
findings of Ben-David et al. (2009) with substantially greater generality, and provide
intuitive understanding through a novel reduction to online conditional
distribution estimation.
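The protocol described in the abstract can be made concrete with a small simulation. The sketch below is illustrative only: it assumes binary labels, a toy class of threshold functions, and a simple symmetric flip kernel, none of which come from the paper, and the learner shown is a standard weighted-majority rule rather than the paper's algorithm.

```python
import random

def simulate(T=200, noise=0.2, seed=0):
    """Simulate online classification with noisy labels.

    A finite class of threshold labeling functions over features
    x in {0, ..., 9}; the (symmetric) noise kernel flips the true
    label with probability `noise`. The learner keeps multiplicative
    weights over the class and predicts by weighted majority.
    """
    rng = random.Random(seed)
    # Finite class: f_k(x) = 1 if x >= k else 0, for k = 0..10.
    hypotheses = [lambda x, k=k: int(x >= k) for k in range(11)]
    truth = hypotheses[5]              # underlying labeling function
    weights = [1.0] * len(hypotheses)  # multiplicative weights
    eta = 0.5                          # penalty rate
    cumulative_loss = 0
    for _ in range(T):
        x = rng.randrange(10)          # (an adversary could pick x instead)
        y = truth(x)
        # Prediction from the actual features via weighted majority.
        vote = sum(w for w, h in zip(weights, hypotheses) if h(x) == 1)
        pred = int(vote >= sum(weights) / 2)
        cumulative_loss += int(pred != y)   # loss against the *true* label
        # The learner only ever observes the noisy label.
        y_noisy = 1 - y if rng.random() < noise else y
        weights = [w * (1 - eta) if h(x) != y_noisy else w
                   for w, h in zip(weights, hypotheses)]
    return cumulative_loss

loss = simulate()
```

With `noise=0`, the usual weighted-majority argument bounds the number of mistakes by roughly log(|class|)/log(4/3), independent of T, which mirrors the horizon-independent flavor of the paper's bounds; with noise, the cumulative loss measures prediction quality against the underlying truth as in the minimax-risk definition above.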
Related papers
- Correcting Noisy Multilabel Predictions: Modeling Label Noise through Latent Space Shifts [4.795811957412855]
Noise in data appears to be inevitable in most real-world machine learning applications.
We investigate the less explored area of noisy label learning for multilabel classification.
Our model posits that label noise arises from a shift in the latent variable, providing a more robust and beneficial means for noisy learning.
arXiv Detail & Related papers (2025-02-20T05:41:52Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective [38.24239397999152]
We propose a novel sample selection-based approach for noisy label learning, called Proto-semi.
Proto-semi divides all samples into the confident and unconfident datasets via warm-up.
By leveraging the confident dataset, prototype vectors are constructed to capture class characteristics.
Empirical evaluations on a real-world annotated dataset substantiate the robustness of Proto-semi in handling the problem of learning from noisy labels.
arXiv Detail & Related papers (2023-07-28T10:57:38Z)
- Handling Realistic Label Noise in BERT Text Classification [1.0515439489916731]
Real label noise is not random; rather, it is often correlated with input features or other annotator-specific factors.
We show that the presence of these types of noise significantly degrades BERT classification performance.
arXiv Detail & Related papers (2023-05-23T18:30:31Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels [87.48541631675889]
We propose a two-stage clean samples identification method.
First, we employ a class-level feature clustering procedure for the early identification of clean samples.
Second, for the remaining clean samples that are close to the ground truth class boundary, we propose a novel consistency-based classification method.
arXiv Detail & Related papers (2022-07-29T04:54:57Z)
- Robustness to Label Noise Depends on the Shape of the Noise Distribution in Feature Space [6.748225062396441]
We show that both the scale and the shape of the noise distribution influence the posterior likelihood.
We show that when the noise distribution targets decision boundaries, classification robustness can drop off even at a small scale of noise.
arXiv Detail & Related papers (2022-06-02T15:41:59Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
At the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR10/CIFAR100 with artificial noise and on real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Statistical Hypothesis Testing for Class-Conditional Label Noise [3.6895394817068357]
This work aims to provide machine learning practitioners with tools to answer the question: is there class-conditional flipping noise in my labels?
In particular, we present hypothesis tests to check whether a given dataset of instance-label pairs has been corrupted with class-conditional label noise.
arXiv Detail & Related papers (2021-03-03T19:03:06Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
- Confidence Scores Make Instance-dependent Label-noise Learning Possible [129.84497190791103]
In learning with noisy labels, the label of every instance can randomly walk to other classes following a transition distribution, which is called the noise model.
We introduce confidence-scored instance-dependent noise (CSIDN), where each instance-label pair is equipped with a confidence score.
We find with the help of confidence scores, the transition distribution of each instance can be approximately estimated.
arXiv Detail & Related papers (2020-01-11T16:15:41Z)
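The last entry's key observation, that a per-instance confidence score makes the transition distribution estimable, can be illustrated in the simplest possible case. The sketch below assumes symmetric binary noise, where the confidence score `c` is the probability that the observed label was kept; it is a toy method-of-moments inversion for illustration, not the CSIDN estimator itself.

```python
def debias_label_freq(noisy_freq, confidence):
    """Invert a symmetric binary noise model.

    If each label is kept with probability `confidence` (c) and
    flipped otherwise, the observed positive rate satisfies
        P(noisy = 1) = c * p + (1 - c) * (1 - p),
    where p = P(true = 1). Solving for p gives the estimate below.
    Requires confidence > 0.5, i.e. labels are informative.
    """
    c = confidence
    return (noisy_freq - (1 - c)) / (2 * c - 1)

# Example: observed positive rate 0.62 under 20% flip noise
# recovers an underlying positive rate of about 0.7.
p = debias_label_freq(0.62, 0.8)
```

The same algebra degenerates when `confidence` approaches 0.5, which is one way to see why instance-dependent noise is hard without extra information such as the confidence scores the entry above proposes.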
This list is automatically generated from the titles and abstracts of the papers in this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.