Towards Robustness to Label Noise in Text Classification via Noise Modeling
- URL: http://arxiv.org/abs/2101.11214v1
- Date: Wed, 27 Jan 2021 05:41:57 GMT
- Title: Towards Robustness to Label Noise in Text Classification via Noise Modeling
- Authors: Siddhant Garg, Goutham Ramakrishnan, Varun Thumbe
- Abstract summary: Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures.
We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier.
- Score: 7.863638253070439
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large datasets in NLP suffer from noisy labels, due to erroneous automatic
and human annotation procedures. We study the problem of text classification
with label noise, and aim to capture this noise through an auxiliary noise
model over the classifier. We first assign a probability score to each training
sample of having a noisy label, through a beta mixture model fitted on the
losses at an early epoch of training. Then, we use this score to selectively
guide the learning of the noise model and classifier. Our empirical evaluation
on two text classification tasks shows that our approach can improve over the
baseline accuracy, and prevent over-fitting to the noise.
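The scoring step described in the abstract lends itself to a compact sketch. Below is a minimal illustration, assuming per-sample cross-entropy losses collected at an early epoch; the EM procedure with a weighted method-of-moments M-step follows common practice for two-component beta mixtures, and all names and hyperparameters are illustrative rather than the authors' implementation.
```python
# Hedged sketch: fit a two-component beta mixture to normalized per-sample
# losses and return each sample's posterior probability of belonging to the
# high-loss ("noisy") component. Hyperparameters are assumptions.
import numpy as np
from scipy.stats import beta

def fit_beta_mixture(losses, n_iter=20, eps=1e-4):
    # Normalize losses into the open interval (0, 1) for the beta density.
    x = (losses - losses.min()) / (losses.max() - losses.min() + 1e-12)
    x = np.clip(x, eps, 1 - eps)

    pi = np.array([0.5, 0.5])           # mixing weights: [clean, noisy]
    params = [(2.0, 5.0), (5.0, 2.0)]   # beta (a, b): low-loss vs. high-loss

    for _ in range(n_iter):
        # E-step: posterior responsibility of the high-loss component.
        p0 = pi[0] * beta.pdf(x, *params[0])
        p1 = pi[1] * beta.pdf(x, *params[1])
        resp = p1 / (p0 + p1 + 1e-12)

        # M-step: weighted method-of-moments update for each component.
        new_params = []
        for r in (1.0 - resp, resp):
            w = r / (r.sum() + 1e-12)
            m = np.sum(w * x)                      # weighted mean
            v = np.sum(w * (x - m) ** 2) + 1e-12   # weighted variance
            common = m * (1.0 - m) / v - 1.0
            new_params.append((max(m * common, eps),
                               max((1.0 - m) * common, eps)))
        params = new_params
        pi = np.array([(1.0 - resp).mean(), resp.mean()])

    return resp  # per-sample probability of carrying a noisy label

# noise_prob = fit_beta_mixture(per_sample_losses)  # losses: hypothetical input
```
The returned posterior can then gate how strongly each sample trains the noise model versus the classifier, which is the selective guidance the abstract describes.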
Related papers
- NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification [7.464154519547575]
Existing research on learning with noisy labels predominantly focuses on synthetic noise patterns.
We constructed a benchmark dataset to better understand label noise in real-world text classification settings.
Our findings reveal that while pre-trained models are resilient to synthetic noise, they struggle against instance-dependent noise.
arXiv Detail & Related papers (2024-07-09T06:18:40Z)
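To make the synthetic versus instance-dependent distinction above concrete, here is a toy sketch of how the two regimes are commonly simulated in experiments. The benchmark itself collects real annotator noise; the probe-classifier construction below is an illustrative assumption, not its methodology.
```python
# Hedged illustration: uniform synthetic noise vs. a common way to
# simulate instance-dependent noise. Names are hypothetical.
import numpy as np

def uniform_noise(labels, n_classes, rate, rng):
    # Synthetic noise: flip each label to a uniformly random other class
    # with probability `rate`, independent of the input text.
    noisy = labels.copy()
    flip = rng.random(len(noisy)) < rate
    noisy[flip] = (noisy[flip] + rng.integers(1, n_classes, flip.sum())) % n_classes
    return noisy

def instance_dependent_noise(labels, probe_probs, rate):
    # Instance-dependent simulation: flip the examples a weak "probe"
    # classifier is least confident about, to the probe's most likely
    # wrong class -- mimicking annotators who err on genuinely hard inputs.
    noisy = labels.copy()
    n = len(noisy)
    confidence = probe_probs[np.arange(n), noisy]
    hardest = np.argsort(confidence)[: int(rate * n)]
    wrong = probe_probs.copy()
    wrong[np.arange(n), noisy] = -np.inf   # mask out the given label
    noisy[hardest] = wrong[hardest].argmax(axis=1)
    return noisy

# rng = np.random.default_rng(0)
# y_uniform = uniform_noise(y, n_classes=4, rate=0.2, rng=rng)
# y_idn = instance_dependent_noise(y, probe_probs, rate=0.2)
```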
- NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition [3.726602636064681]
We present an analysis that shows that real noise is significantly more challenging than simulated noise.
We show that current state-of-the-art models for noise-robust learning fall far short of their theoretically achievable upper bound.
arXiv Detail & Related papers (2024-05-13T10:20:31Z) - Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
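A hedged sketch of the prototype-based selection idea in this entry: pseudo-label each sample by its nearest class prototype, keep samples whose given label agrees, and cap each class for balance. The paper's distribution-matching formulation is more involved; every name and threshold here is an assumption.
```python
# Illustrative clean-and-balanced subset extraction via class prototypes.
import numpy as np

def extract_clean_balanced_subset(features, labels, n_classes, per_class):
    # Class prototypes: mean feature vector of the samples given each label.
    protos = np.stack([features[labels == c].mean(axis=0)
                       for c in range(n_classes)])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)

    # Pseudo-label every sample by its nearest prototype (cosine similarity).
    sims = feats @ protos.T
    pseudo = sims.argmax(axis=1)

    # Keep samples whose given label matches the pseudo-label, capped at
    # `per_class` per class so the subset stays balanced under a long tail.
    keep = []
    for c in range(n_classes):
        idx = np.where((labels == c) & (pseudo == c))[0]
        idx = idx[np.argsort(-sims[idx, c])][:per_class]  # most prototypical first
        keep.extend(idx.tolist())
    return np.array(sorted(keep))
```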
- Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective [38.24239397999152]
We propose a novel sample selection-based approach for noisy label learning, called Proto-semi.
Proto-semi first divides all samples into confident and unconfident subsets via a warm-up phase.
By leveraging the confident dataset, prototype vectors are constructed to capture class characteristics.
Empirical evaluations on a real-world annotated dataset substantiate the robustness of Proto-semi in handling the problem of learning from noisy labels.
arXiv Detail & Related papers (2023-07-28T10:57:38Z)
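The warm-up split lends itself to a small sketch. The quantile threshold below is a placeholder for whatever confidence criterion Proto-semi actually uses; treat this as an assumption-laden illustration, not the paper's algorithm.
```python
# Hedged sketch: split samples by warm-up loss, then build class
# prototypes from the confident subset only.
import numpy as np

def split_and_build_prototypes(warmup_losses, features, labels, n_classes,
                               quantile=0.5):
    threshold = np.quantile(warmup_losses, quantile)  # illustrative criterion
    confident = warmup_losses <= threshold

    # Assumes every class retains at least one confident sample.
    prototypes = np.stack([
        features[confident & (labels == c)].mean(axis=0)
        for c in range(n_classes)
    ])
    return confident, prototypes  # boolean mask + (n_classes, dim) prototypes
```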
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, which easily gives rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
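A rough sketch of the neighborhood idea in this entry, under stated assumptions (a feature matrix, softmax outputs, and a cosine-similarity neighborhood): a sample's reliability is scored by how strongly its neighbors' predictions support its own label, rather than by its individual prediction alone. This illustrates the principle, not the paper's exact estimator.
```python
# Hedged sketch of neighbor-based reliability re-estimation.
import numpy as np

def neighborhood_reliability(features, pred_probs, labels, k=10):
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = feats @ feats.T            # O(n^2); use approximate NN at scale
    np.fill_diagonal(sims, -np.inf)   # exclude self
    nn = np.argsort(-sims, axis=1)[:, :k]   # k nearest neighbors

    # Collective estimate: mean predicted distribution over the neighborhood.
    neigh_probs = pred_probs[nn].mean(axis=1)           # (n, n_classes)
    return neigh_probs[np.arange(len(labels)), labels]  # support for own label

# reliability = neighborhood_reliability(feats, softmax_outputs, noisy_labels)
# clean_mask = reliability > 0.5   # threshold is an illustrative choice
```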
- Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification [23.554544399110508]
Wrong labels in training data occur when human annotators make mistakes or when the data is generated via weak or distant supervision.
It has been shown that complex noise-handling techniques are required to prevent models from fitting this label noise.
We show in this work that, for text classification tasks with modern NLP models like BERT, across a variety of noise types, existing noise-handling methods do not always improve performance, and may even degrade it.
arXiv Detail & Related papers (2022-04-20T10:24:19Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
In the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR10 and CIFAR100 with artificial noise, and on real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
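In its simplest form, the S3 selection mechanism above reduces to checking label agreement within a feature-space neighborhood. A simplified sketch, with an assumed cosine neighborhood and an illustrative agreement threshold; the paper's mechanism is richer:
```python
# Hedged sketch: keep samples whose annotated label agrees with the
# labels of their k nearest feature-space neighbors.
import numpy as np

def neighborhood_label_consistency(features, labels, k=10, min_agree=0.6):
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = feats @ feats.T
    np.fill_diagonal(sims, -np.inf)
    nn = np.argsort(-sims, axis=1)[:, :k]

    # Fraction of neighbors sharing the sample's own annotated label.
    agree = (labels[nn] == labels[:, None]).mean(axis=1)
    return agree >= min_agree  # boolean mask of samples treated as clean
```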
- Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise.
We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning.
Our framework generally outperforms existing methods at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
- Towards Noise-resistant Object Detection with Noisy Annotations [119.63458519946691]
Training deep object detectors requires a significant amount of human-annotated images with accurate object labels and bounding box coordinates.
Noisy annotations are much more easily accessible, but they could be detrimental for learning.
We address the challenging problem of training object detectors with noisy annotations, where the noise contains a mixture of label noise and bounding box noise.
arXiv Detail & Related papers (2020-03-03T01:32:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.