Analysing the Noise Model Error for Realistic Noisy Label Data
- URL: http://arxiv.org/abs/2101.09763v2
- Date: Mon, 1 Mar 2021 11:14:54 GMT
- Title: Analysing the Noise Model Error for Realistic Noisy Label Data
- Authors: Michael A. Hedderich, Dawei Zhu, Dietrich Klakow
- Abstract summary: We study the quality of estimated noise models from the theoretical side by deriving the expected error of the noise model.
We also publish NoisyNER, a new noisy label dataset from the NLP domain.
- Score: 14.766574408868806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distant and weak supervision make it possible to obtain large amounts of
labeled training data quickly and cheaply, but these automatic annotations tend to
contain a large number of errors. A popular technique to overcome the negative
effects of these noisy labels is noise modelling, where the underlying noise
process is modelled. In this work, we study the quality of these estimated
noise models from the theoretical side by deriving the expected error of the
noise model. Apart from evaluating the theoretical results on commonly used
synthetic noise, we also publish NoisyNER, a new noisy label dataset from the
NLP domain that was obtained through a realistic distant supervision technique.
It provides seven sets of labels with differing noise patterns to evaluate
different noise levels on the same instances. Parallel, clean labels are
available, making it possible to study scenarios where a small amount of
gold-standard data can be leveraged. Our theoretical results and the
corresponding experiments give insights into the factors that influence the
noise model estimation, such as the noise distribution and the sampling technique.
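As a concrete reference for the kind of noise model analysed here, below is a minimal sketch of how a class-conditional noise matrix is typically estimated when a small parallel gold-labeled sample is available, as in NoisyNER. The function name and the uniform fallback for unseen classes are our own choices.

```python
import numpy as np

def estimate_noise_matrix(clean_labels, noisy_labels, num_classes):
    """Row-normalized confusion matrix: entry (i, j) estimates
    p(noisy label = j | true label = i) from paired gold/noisy annotations."""
    counts = np.zeros((num_classes, num_classes))
    for c, n in zip(clean_labels, noisy_labels):
        counts[c, n] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Uniform fallback for classes absent from the gold sample (an arbitrary
    # but common choice); the paper derives how sample size and sampling
    # technique drive the expected error of exactly this kind of estimate.
    return np.divide(counts, row_sums,
                     out=np.full_like(counts, 1.0 / num_classes),
                     where=row_sums > 0)
```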
Related papers
- NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification [7.464154519547575]
Existing research on learning with noisy labels predominantly focuses on synthetic noise patterns.
We constructed a benchmark dataset to better understand label noise in real-world text classification settings.
Our findings reveal that while pre-trained models are resilient to synthetic noise, they struggle against instance-dependent noise.
arXiv Detail & Related papers (2024-07-09T06:18:40Z)
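The NoisyAG-News entry above contrasts realistic, instance-dependent noise with the synthetic noise patterns that dominate prior work. For concreteness, a sketch of the standard symmetric (instance-independent) corruption; the function name is ours.

```python
import numpy as np

def inject_symmetric_noise(labels, num_classes, noise_rate, seed=None):
    """Flip each label, with probability noise_rate, to a uniformly random
    *other* class -- the synthetic pattern that pre-trained models handle
    well but that real noise does not follow."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip = rng.random(labels.shape[0]) < noise_rate
    new = rng.integers(0, num_classes - 1, size=int(flip.sum()))
    new[new >= labels[flip]] += 1  # skip the original class
    labels[flip] = new
    return labels
```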
- NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition [3.726602636064681]
We present an analysis that shows that real noise is significantly more challenging than simulated noise.
We show that current state-of-the-art models for noise-robust learning fall far short of their theoretically achievable upper bound.
arXiv Detail & Related papers (2024-05-13T10:20:31Z)
- SoftPatch: Unsupervised Anomaly Detection with Noisy Data [67.38948127630644]
This paper considers label-level noise in image sensory anomaly detection for the first time.
We propose a memory-based unsupervised AD method, SoftPatch, which efficiently denoises the data at the patch level.
Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in coreset.
arXiv Detail & Related papers (2024-03-21T08:49:34Z)
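SoftPatch operates on patch features from a pretrained backbone; the following is only a rough illustration of the underlying idea, down-weighting outlier patches by their distance to the memory bank, not the paper's actual algorithm.

```python
import numpy as np

def soft_patch_weights(patch_features, k=5):
    """Score each patch by its mean distance to its k nearest neighbours in
    the bank, then map scores to soft weights in (0, 1]; distant (likely
    noisy) patches receive small weights. O(n^2) memory -- sketch only."""
    d = np.linalg.norm(patch_features[:, None, :] - patch_features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    knn_dist = np.sort(d, axis=1)[:, :k].mean(axis=1)
    scores = (knn_dist - knn_dist.min()) / (knn_dist.std() + 1e-8)
    return np.exp(-scores)
```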
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition and avoids the arbitrary tuning from a mini-batch of samples found in previous methods.
arXiv Detail & Related papers (2023-02-19T15:24:37Z)
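The Gibbs sampler in LCCN repeatedly resamples each example's latent true label; the conditional it relies on is a single Bayes step, sketched below schematically (not the full sampler).

```python
import numpy as np

def posterior_true_label(p_y_given_x, transition, noisy_label):
    """p(true = i | x, noisy) is proportional to p(noisy | true = i) * p(true = i | x),
    where transition[i, j] = p(observed class j | true class i)."""
    unnorm = transition[:, noisy_label] * p_y_given_x
    return unnorm / unnorm.sum()
```

A Gibbs sweep would draw a new latent label from this posterior for each example and then re-estimate the transition rows from the sampled (true, noisy) pairs.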
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a lightweight method for detecting and removing such noise from the input during model inference, without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
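The summary above does not spell out the detector itself, so the following is purely a toy illustration of inference-time input cleaning that needs no training or auxiliary models, not the paper's method.

```python
def strip_noisy_lines(text: str, min_alpha_ratio: float = 0.5) -> str:
    """Drop input lines dominated by non-linguistic characters (OCR debris,
    markup, tables) before feeding the document to the summarizer."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        ratio = sum(ch.isalpha() or ch.isspace() for ch in stripped) / len(stripped)
        if ratio >= min_alpha_ratio:
            kept.append(line)
    return "\n".join(kept)
```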
- Denoising Enhanced Distantly Supervised Ultrafine Entity Typing [36.14308856513851]
We build a noise model to estimate the unknown labeling noise distribution over input contexts and noisy type labels.
With the noise model, more trustworthy labels can be recovered by subtracting the estimated noise from the input.
We also propose an entity typing model that adopts a bi-encoder architecture and is trained on the denoised data.
arXiv Detail & Related papers (2022-10-18T05:20:16Z)
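The "subtract the estimated noise" step can be pictured as below. Treating the type labels as a multi-hot vector and re-binarizing with a threshold are our assumptions; in the paper the noise estimate comes from the learned noise model.

```python
import numpy as np

def denoise_type_labels(noisy_labels, estimated_noise, threshold=0.5):
    """Remove the estimated noise component from a multi-hot type-label
    vector and re-binarize to recover more trustworthy labels."""
    recovered = np.clip(np.asarray(noisy_labels, float) - estimated_noise, 0.0, 1.0)
    return (recovered >= threshold).astype(int)
```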
- Label noise detection under the Noise at Random model with ensemble filters [5.994719700262245]
This work investigates the performance of ensemble noise detection under two different noise models.
We investigate the effect of class distribution on noise detection performance since it changes the total noise level observed in a dataset.
arXiv Detail & Related papers (2021-12-02T21:49:41Z)
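The classic ensemble filter behind this line of work is easy to sketch with scikit-learn; the particular classifiers and the vote threshold are our choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

def ensemble_noise_filter(X, y, min_disagree=2):
    """Flag a sample as likely mislabeled when at least min_disagree of the
    out-of-fold ensemble predictions disagree with its given label."""
    clfs = [LogisticRegression(max_iter=1000), RandomForestClassifier(), GaussianNB()]
    preds = np.stack([cross_val_predict(c, X, y, cv=5) for c in clfs])
    return (preds != np.asarray(y)).sum(axis=0) >= min_disagree
```

With three members, min_disagree=2 is the usual majority filter and min_disagree=3 the stricter consensus filter.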
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N) with human-annotated real-world noisy labels.
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- Open-set Label Noise Can Improve Robustness Against Inherent Label Noise [27.885927200376386]
We show that open-set noisy labels can be non-toxic and even benefit the robustness against inherent noisy labels.
We propose a simple yet effective regularization by introducing Open-set samples with Dynamic Noisy Labels (ODNL) into training.
arXiv Detail & Related papers (2021-06-21T07:15:50Z)
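The ODNL regularizer is simple to state: open-set auxiliary samples are added to training with labels drawn uniformly at random and re-drawn every time they are seen. A minimal PyTorch sketch; the weighting lam is our placeholder.

```python
import torch
import torch.nn.functional as F

def odnl_batch_loss(model, x, y, x_open, num_classes, lam=1.0):
    """Standard loss on the (possibly noisy) labeled batch plus a dynamic
    random-label loss on the open-set batch."""
    loss = F.cross_entropy(model(x), y)
    rand_y = torch.randint(0, num_classes, (x_open.size(0),), device=x_open.device)
    return loss + lam * F.cross_entropy(model(x_open), rand_y)
```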
- Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise.
We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning.
Our framework generally outperforms existing methods at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z)
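The summary gives little detail on the Positive-Unlabeled component, so the following only illustrates the distillation side generically: hard labels are trusted on samples flagged as clean, while the teacher's softened predictions supervise everything else. All names and the loss weighting are our assumptions.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, clean_mask, alpha=0.7, temp=2.0):
    """Blend a soft distillation term over all samples with a hard
    cross-entropy term restricted to the trusted (clean-flagged) subset."""
    soft = F.kl_div(F.log_softmax(student_logits / temp, dim=1),
                    F.softmax(teacher_logits / temp, dim=1),
                    reduction="batchmean") * temp * temp
    if clean_mask.any():
        hard = F.cross_entropy(student_logits[clean_mask], labels[clean_mask])
    else:
        hard = student_logits.new_tensor(0.0)
    return alpha * soft + (1 - alpha) * hard
```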
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
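Several entries above, including the last one, model noise that depends on the instance itself rather than only on its class. A common way to realize this, shown schematically below with our own parameterization, is a small network that emits a per-example transition matrix T(x), so that p(noisy label | x) = T(x)^T p(true label | x).

```python
import torch
import torch.nn as nn

class InstanceDependentNoise(nn.Module):
    """Map each input's features to its own row-stochastic transition matrix
    T(x) and push the clean-label posterior through it."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.num_classes = num_classes
        self.head = nn.Linear(feat_dim, num_classes * num_classes)

    def forward(self, features, p_true):
        t = self.head(features).view(-1, self.num_classes, self.num_classes)
        t = torch.softmax(t, dim=2)  # each row of T(x) is a distribution
        # p(noisy = j | x) = sum_i p(true = i | x) * T(x)[i, j]
        return torch.bmm(p_true.unsqueeze(1), t).squeeze(1)
```

Trained by maximizing the likelihood of the observed noisy labels, such a head lets the rest of the model recover the clean posterior.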
This list is automatically generated from the titles and abstracts of the papers on this site.