Re-Examine Distantly Supervised NER: A New Benchmark and a Simple
Approach
- URL: http://arxiv.org/abs/2402.14948v2
- Date: Mon, 26 Feb 2024 14:59:58 GMT
- Title: Re-Examine Distantly Supervised NER: A New Benchmark and a Simple
Approach
- Authors: Yuepei Li, Kang Zhou, Qiao Qiao, Qing Wang and Qi Li
- Abstract summary: We critically assess the efficacy of current DS-NER methodologies using a real-world benchmark dataset named QTL.
To tackle the prevalent issue of label noise, we introduce a simple yet effective approach, Curriculum-based Positive-Unlabeled Learning CuPUL.
Our empirical results highlight the capability of CuPUL to significantly reduce the impact of noisy labels and outperform existing methods.
- Score: 15.87963432758696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper delves into Named Entity Recognition (NER) under the framework of
Distant Supervision (DS-NER), where the main challenge lies in the compromised
quality of labels due to inherent errors such as false positives, false
negatives, and positive type errors. We critically assess the efficacy of
current DS-NER methodologies using a real-world benchmark dataset named QTL,
revealing that their performance often does not meet expectations. To tackle
the prevalent issue of label noise, we introduce a simple yet effective
approach, Curriculum-based Positive-Unlabeled Learning CuPUL, which
strategically starts on "easy" and cleaner samples during the training process
to enhance model resilience to noisy samples. Our empirical results highlight
the capability of CuPUL to significantly reduce the impact of noisy labels and
outperform existing methods. QTL dataset and our code is available on GitHub.
Related papers
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample
Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z) - Improving a Named Entity Recognizer Trained on Noisy Data with a Few
Clean Instances [55.37242480995541]
We propose to denoise noisy NER data with guidance from a small set of clean instances.
Along with the main NER model we train a discriminator model and use its outputs to recalibrate the sample weights.
Results on public crowdsourcing and distant supervision datasets show that the proposed method can consistently improve performance with a small guidance set.
arXiv Detail & Related papers (2023-10-25T17:23:37Z) - Revisiting Class Imbalance for End-to-end Semi-Supervised Object
Detection [1.6249267147413524]
Semi-supervised object detection (SSOD) has made significant progress with the development of pseudo-label-based end-to-end methods.
Many methods face challenges due to class imbalance, which hinders the effectiveness of the pseudo-label generator.
In this paper, we examine the root causes of low-quality pseudo-labels and present novel learning mechanisms to improve the label generation quality.
arXiv Detail & Related papers (2023-06-04T06:01:53Z) - SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised
Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z) - Label Noise-Robust Learning using a Confidence-Based Sieving Strategy [15.997774467236352]
In learning tasks with label noise, improving model robustness against overfitting is a pivotal challenge.
Identifying the samples with noisy labels and preventing the model from learning them is a promising approach to address this challenge.
We propose a novel discriminator metric called confidence error and a sieving strategy called CONFES to differentiate between the clean and noisy samples effectively.
arXiv Detail & Related papers (2022-10-11T10:47:28Z) - Towards Harnessing Feature Embedding for Robust Learning with Noisy
Labels [44.133307197696446]
The memorization effect of deep neural networks (DNNs) plays a pivotal role in recent label noise learning methods.
We propose a novel feature embedding-based method for deep learning with label noise, termed LabEl NoiseDilution (LEND)
arXiv Detail & Related papers (2022-06-27T02:45:09Z) - Uncertainty-aware Pseudo-label Selection for Positive-Unlabeled Learning [10.014356492742074]
We propose to tackle the issues of imbalanced datasets and model calibration in a positive-unlabeled learning setting.
By boosting the signal from the minority class, pseudo-labeling expands the labeled dataset with new samples from the unlabeled set.
Within a series of experiments, PUUPL yields substantial performance gains in highly imbalanced settings.
arXiv Detail & Related papers (2022-01-31T12:55:47Z) - S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
In the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFARCIFAR100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z) - Distantly-Supervised Named Entity Recognition with Noise-Robust Learning
and Language Model Augmented Self-Training [66.80558875393565]
We study the problem of training named entity recognition (NER) models using only distantly-labeled data.
We propose a noise-robust learning scheme comprised of a new loss function and a noisy label removal step.
Our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
arXiv Detail & Related papers (2021-09-10T17:19:56Z) - Boosting Semi-Supervised Face Recognition with Noise Robustness [54.342992887966616]
This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise aroused by the auto-labelling.
We develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which is based on the robust training ability empowered by GN.
arXiv Detail & Related papers (2021-05-10T14:43:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.