Detecting Label Errors in Token Classification Data
- URL: http://arxiv.org/abs/2210.03920v1
- Date: Sat, 8 Oct 2022 05:14:22 GMT
- Title: Detecting Label Errors in Token Classification Data
- Authors: Wei-Chen Wang, Jonas Mueller
- Abstract summary: We consider the task of finding sentences that contain label errors in token classification datasets.
We study 11 different straightforward methods that score tokens/sentences based on the predicted class probabilities.
We identify a simple and effective method that consistently detects those sentences containing label errors when applied with different token classification models.
- Score: 22.539748563923123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mislabeled examples are a common issue in real-world data, particularly for tasks like token classification, where many labels must be chosen on a fine-grained basis. Here we consider the task of finding sentences that contain label errors in token classification datasets. We study 11 different straightforward methods that score tokens/sentences based on the predicted class probabilities output by any token classification model (trained via any procedure). In precision-recall evaluations based on real-world label errors in entity recognition data from CoNLL-2003, we identify a simple and effective method that consistently detects sentences containing label errors across different token classification models.
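The scoring recipe is model-agnostic: anything that outputs per-token class probabilities can be plugged in. Below is a minimal sketch of one such method, assuming the simple strategy of scoring each token by the predicted probability of its given label and each sentence by its worst token; the exact aggregation the paper recommends may differ.

```python
# Minimal sketch: flag sentences whose given labels the model finds least
# plausible. `pred_probs` can come from any trained token classifier.
import numpy as np

def token_scores(pred_probs, given_labels):
    """pred_probs: (n_tokens, n_classes) predicted class probabilities.
    given_labels: (n_tokens,) integer labels from the dataset."""
    return pred_probs[np.arange(len(given_labels)), given_labels]

def sentence_score(pred_probs, given_labels):
    # Min-aggregation: one badly fitting token suffices to flag the sentence.
    return token_scores(pred_probs, given_labels).min()

# Rank sentences ascending by sentence_score and review the lowest first.
```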
Related papers
- Imputation using training labels and classification via label imputation [4.387724419358174]
We propose Classification Based on MissForest Imputation (CBMI) to deal with missing data.
CBMI stacks the predicted test labels, which contain missing values, with the training labels, and stacks the labels with the inputs for imputation.
CBMI consistently shows significantly better results than imputation based on only the input data.
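A hedged sketch of the stacking idea follows; since MissForest is not in scikit-learn, IterativeImputer stands in for it here, and the function name is illustrative rather than CBMI's actual API.

```python
# Append the label column to the features so the imputer can exploit label
# information when filling missing values; test rows use predicted labels.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def impute_with_labels(X_train, y_train, X_test, y_test_pred):
    Xy_train = np.column_stack([X_train, y_train])
    Xy_test = np.column_stack([X_test, y_test_pred])
    imputer = IterativeImputer(random_state=0)  # stand-in for MissForest
    Xy_train_imp = imputer.fit_transform(Xy_train)
    Xy_test_imp = imputer.transform(Xy_test)
    # Drop the appended label column before downstream training/prediction.
    return Xy_train_imp[:, :-1], Xy_test_imp[:, :-1]
```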
arXiv Detail & Related papers (2023-11-28T15:26:09Z)
- Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition [49.42732949233184]
When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition.
Taking noisy labels as ground-truth in the loss function results in suboptimal performance.
We propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels.
arXiv Detail & Related papers (2023-08-12T12:13:52Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
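As a rough illustration of class-aware pseudo-labeling (an assumed instantiation, not necessarily CAP's exact rule), one can pick a separate confidence threshold per class so that pseudo-positive rates track an assumed class prior, instead of using one global cutoff:

```python
import numpy as np

def class_aware_thresholds(scores, class_priors):
    """scores: (n_unlabeled, n_classes) predicted probabilities.
    class_priors: (n_classes,) assumed positive rate per class."""
    return np.array([np.quantile(scores[:, j], 1.0 - class_priors[j])
                     for j in range(scores.shape[1])])

def pseudo_label(scores, thresholds):
    # 0/1 multi-label pseudo-annotations, one threshold per class.
    return (scores >= thresholds).astype(int)
```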
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- Identifying Label Errors in Object Detection Datasets by Loss Inspection [4.442111891959355]
We introduce a benchmark for label error detection methods on object detection datasets.
We simulate four different types of randomly introduced label errors on train and test sets of well-labeled object detection datasets.
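A toy version of such error injection (the benchmark's exact four categories and parameters are assumptions here) might drop annotations, flip classes, perturb box coordinates, or spawn spurious boxes:

```python
import random

def corrupt(boxes, n_classes, p=0.1):
    """boxes: list of dicts like {"bbox": [x, y, w, h], "cls": int}."""
    out = []
    for b in boxes:
        r = random.random()
        if r < p:                        # drop: missing annotation
            continue
        b = {"bbox": list(b["bbox"]), "cls": b["cls"]}
        if r < 2 * p:                    # flip: wrong class label
            b["cls"] = random.randrange(n_classes)
        elif r < 3 * p:                  # shift: inaccurate localization
            b["bbox"] = [v + random.uniform(-5, 5) for v in b["bbox"]]
        out.append(b)
    if random.random() < p:              # spawn: spurious annotation
        out.append({"bbox": [0.0, 0.0, 10.0, 10.0],
                    "cls": random.randrange(n_classes)})
    return out
```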
arXiv Detail & Related papers (2023-03-13T10:54:52Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this perspective, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
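A toy rendering of the label-distribution idea (not the paper's actual objective): penalize the gap between the average predicted positive probability on unlabeled data and an assumed class prior pi.

```python
import torch

def label_distribution_loss(unlabeled_probs, pi=0.3):
    """unlabeled_probs: (n,) predicted P(y=1) on unlabeled data.
    pi: assumed proportion of positives (a hyperparameter here)."""
    return (unlabeled_probs.mean() - pi).abs()
```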
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Identifying Incorrect Annotations in Multi-Label Classification Data [14.94741409713251]
We consider algorithms for finding mislabeled examples in multi-label classification datasets.
We propose an extension of the Confident Learning framework to this setting, as well as a label quality score that ranks examples with label errors much higher than those which are correctly labeled.
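One plausible instantiation of such a quality score (an assumption, not necessarily the paper's exact formula): treat each class as a binary task, score it by the predicted probability of the given 0/1 annotation, and aggregate per-class scores into one example-level score.

```python
import numpy as np

def label_quality_scores(pred_probs, given_labels):
    """pred_probs: (n_examples, n_classes) predicted P(class present).
    given_labels: (n_examples, n_classes) 0/1 annotations."""
    per_class = np.where(given_labels == 1, pred_probs, 1.0 - pred_probs)
    return per_class.min(axis=1)  # low score = likely annotation error
```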
arXiv Detail & Related papers (2022-11-25T05:03:56Z)
- Learning with Proper Partial Labels [87.65718705642819]
Partial-label learning is a kind of weakly-supervised learning with inexact labels.
We show that the proposed proper partial-label learning framework includes many previous partial-label learning settings.
We then derive a unified unbiased estimator of the classification risk.
arXiv Detail & Related papers (2021-12-23T01:37:03Z)
- Multi-class Probabilistic Bounds for Self-learning [13.875239300089861]
Pseudo-labeling is prone to error and runs the risk of adding noisy labels into unlabeled training data.
We present a probabilistic framework for analyzing self-learning in the multi-class classification scenario with partially labeled data.
arXiv Detail & Related papers (2021-09-29T13:57:37Z)
- Rethinking Pseudo Labels for Semi-Supervised Object Detection [84.697097472401]
We introduce certainty-aware pseudo labels tailored for object detection.
We dynamically adjust the thresholds used to generate pseudo labels and reweight loss functions for each category to alleviate the class imbalance problem.
Our approach improves supervised baselines by up to 10% AP using only 1-10% labeled data from COCO.
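An illustrative sketch of per-category dynamic thresholding (the names and schedule here are assumptions, not the paper's exact mechanism): track recent detection confidences per class and accept a pseudo-label only above its class's running quantile.

```python
import numpy as np
from collections import defaultdict

class DynamicThresholds:
    def __init__(self, quantile=0.8):
        self.quantile = quantile
        self.history = defaultdict(list)  # per-class confidence history

    def update(self, class_ids, confidences):
        for c, s in zip(class_ids, confidences):
            self.history[c].append(s)

    def accept(self, class_id, confidence):
        scores = self.history[class_id]
        if not scores:
            return confidence >= 0.5  # fallback before any history exists
        return confidence >= np.quantile(scores, self.quantile)
```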
arXiv Detail & Related papers (2021-06-01T01:32:03Z)
- Exploiting Context for Robustness to Label Noise in Active Learning [47.341705184013804]
We address two problems: how a system can identify which of the queried labels are wrong, and how a multi-class active learning system can be adapted to minimize the negative impact of label noise.
We construct a graphical representation of the unlabeled data to encode these relationships and obtain new beliefs on the graph when noisy labels are available.
This is demonstrated in three different applications: scene classification, activity classification, and document classification.
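A simplified stand-in for the graph-based belief update (one smoothing step over a kNN graph; the paper's actual probabilistic model is richer): blend each example's possibly noisy label with the average label of its neighbors.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def update_beliefs(features, noisy_onehot, k=10, alpha=0.5):
    """noisy_onehot: (n, n_classes) one-hot (possibly wrong) labels, float."""
    G = kneighbors_graph(features, k, mode="connectivity").toarray()
    G /= G.sum(axis=1, keepdims=True)  # row-normalize over neighbors
    return alpha * noisy_onehot + (1 - alpha) * G @ noisy_onehot
```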
arXiv Detail & Related papers (2020-10-18T18:59:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.