Related papers: Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification

Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification

URL: http://arxiv.org/abs/2204.09371v1
Date: Wed, 20 Apr 2022 10:24:19 GMT
Title: Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification
Authors: Dawei Zhu, Michael A. Hedderich, Fangzhou Zhai, David Ifeoluwa Adelani, Dietrich Klakow
Abstract summary: Wrong labels in training data occur when human annotators make mistakes or when the data is generated via weak or distant supervision. It has been shown that complex noise-handling techniques are required to prevent models from fitting this label noise. We show in this work that, for text classification tasks with modern NLP models like BERT, over a variety of noise types, existing noisehandling methods do not always improve its performance, and may even deteriorate it.
Score: 23.554544399110508
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Incorrect labels in training data occur when human annotators make mistakes or when the data is generated via weak or distant supervision. It has been shown that complex noise-handling techniques - by modeling, cleaning or filtering the noisy instances - are required to prevent models from fitting this label noise. However, we show in this work that, for text classification tasks with modern NLP models like BERT, over a variety of noise types, existing noisehandling methods do not always improve its performance, and may even deteriorate it, suggesting the need for further investigation. We also back our observations with a comprehensive analysis.

Related papers

Classifying Long-tailed and Label-noise Data via Disentangling and Unlearning [58.052712054684946]
In real-world datasets, the challenges of long-tailed distributions and noisy labels often coexist. We propose a novel method called Disentangling and Unlearning for Long-tailed and Label-noisy data.
arXiv Detail & Related papers (2025-03-14T13:58:27Z)
Robust Testing for Deep Learning using Human Label Noise [5.9848836847249185]
In deep learning (DL) systems, label noise in training datasets often degrades model performance. Traditionally, these methods are tested using synthetic label noise, where ground truth labels are randomly flipped. We present Cluster-Based Noise (CBN), a method for generating feature-dependent noise that simulates human-like label noise.
arXiv Detail & Related papers (2024-11-29T20:31:57Z)
NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification [7.464154519547575]
Existing research on learning with noisy labels predominantly focuses on synthetic noise patterns. We constructed a benchmark dataset to better understand label noise in real-world text classification settings. Our findings reveal that while pre-trained models are resilient to synthetic noise, they struggle against instance-dependent noise.
arXiv Detail & Related papers (2024-07-09T06:18:40Z)
NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition [3.726602636064681]
We present an analysis that shows that real noise is significantly more challenging than simulated noise. We show that current state-of-the-art models for noise-robust learning fall far short of their theoretically achievable upper bound.
arXiv Detail & Related papers (2024-05-13T10:20:31Z)
Noisy Label Processing for Classification: A Survey [2.8821062918162146]
In the long, tedious process of data annotation, annotators are prone to make mistakes, resulting in incorrect labels of images. It is crucial to combat noisy labels for computer vision tasks, especially for classification tasks. We propose an algorithm to generate a synthetic label noise pattern guided by real-world data.
arXiv Detail & Related papers (2024-04-05T15:11:09Z)
Robust Tiny Object Detection in Aerial Images amidst Label Noise [50.257696872021164]
This study addresses the issue of tiny object detection under noisy label supervision. We propose a DeNoising Tiny Object Detector (DN-TOD), which incorporates a Class-aware Label Correction scheme. Our method can be seamlessly integrated into both one-stage and two-stage object detection pipelines.
arXiv Detail & Related papers (2024-01-16T02:14:33Z)
Denoising Enhanced Distantly Supervised Ultrafine Entity Typing [36.14308856513851]
We build a noise model to estimate the unknown labeling noise distribution over input contexts and noisy type labels. With the noise model, more trustworthy labels can be recovered by subtracting the estimated noise from the input. We propose an entity typing model, which adopts a bi-encoder architecture, is trained on the denoised data.
arXiv Detail & Related papers (2022-10-18T05:20:16Z)
Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise. This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N) We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
Instance-dependent Label-noise Learning under a Structural Causal Model [92.76400590283448]
Label noise will degenerate the performance of deep learning algorithms. By leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning.
arXiv Detail & Related papers (2021-09-07T10:42:54Z)
Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise. We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning. Our framework generally outperforms at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z)
Towards Robustness to Label Noise in Text Classification via Noise Modeling [7.863638253070439]
Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier.
arXiv Detail & Related papers (2021-01-27T05:41:57Z)
Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances. Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks. We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.