Identifying Incorrect Annotations in Multi-Label Classification Data
- URL: http://arxiv.org/abs/2211.13895v1
- Date: Fri, 25 Nov 2022 05:03:56 GMT
- Title: Identifying Incorrect Annotations in Multi-Label Classification Data
- Authors: Aditya Thyagarajan, Elías Snorrason, Curtis Northcutt, Jonas Mueller
- Abstract summary: We consider algorithms for finding mislabeled examples in multi-label classification datasets.
We propose an extension of the Confident Learning framework to this setting, as well as a label quality score that ranks examples with label errors much higher than those which are correctly labeled.
- Score: 14.94741409713251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-label classification, each example in a dataset may be annotated as
belonging to one or more classes (or none of the classes). Example applications
include image (or document) tagging where each possible tag either applies to a
particular image (or document) or not. With many possible classes to consider,
data annotators are likely to make errors when labeling such data in practice.
Here we consider algorithms for finding mislabeled examples in multi-label
classification datasets. We propose an extension of the Confident Learning
framework to this setting, as well as a label quality score that ranks examples
with label errors much higher than those which are correctly labeled. Both
approaches can utilize any trained classifier. After demonstrating that our
methodology empirically outperforms other algorithms for label error detection,
we apply our approach to discover many label errors in the CelebA image tagging
dataset.
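As an illustrative sketch of the kind of label quality score described above (the exact scoring rule is not spelled out in this abstract, so the aggregation choice here is an assumption): treat each class as a one-vs-rest binary task, measure the classifier's confidence in each given annotation, and score an example by its least-supported label, so low scores flag likely annotation errors.

```python
import numpy as np

def label_quality_scores(pred_probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Score each example's annotation quality in [0, 1].

    pred_probs: (n_examples, n_classes) per-class probabilities from any
        trained classifier (one-vs-rest sigmoid outputs).
    labels: (n_examples, n_classes) binary annotation matrix.

    Each class is treated as a binary task: the model's confidence in the
    given annotation is p if the class was annotated, else 1 - p.
    Aggregating with min means a single badly-supported tag is enough to
    flag the whole example for review.
    """
    self_confidence = np.where(labels == 1, pred_probs, 1.0 - pred_probs)
    return self_confidence.min(axis=1)

probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
annots = np.array([[1, 0],
                   [1, 1]])  # second example's first tag looks unsupported
scores = label_quality_scores(probs, annots)
# lower score => more likely mislabeled; here scores are [0.9, 0.2]
```

Ranking the dataset by this score surfaces candidate label errors for manual review, and any classifier that outputs per-class probabilities can be plugged in.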
Related papers
- Multi-Label Knowledge Distillation [86.03990467785312]
We propose a novel multi-label knowledge distillation method.
On one hand, it exploits the informative semantic knowledge from the logits by dividing the multi-label learning problem into a set of binary classification problems.
On the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings.
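The one-vs-rest decomposition mentioned here can be sketched as follows (an illustrative sketch, not the paper's exact loss): each class's teacher and student logits are distilled as an independent binary problem via sigmoid probabilities and a per-class binary KL divergence.

```python
import numpy as np

def binary_kd_loss(teacher_logits, student_logits) -> float:
    """Knowledge-distillation loss from a one-vs-rest decomposition.

    The multi-label problem is split into per-class binary problems:
    for each class, compute the binary KL divergence between the
    teacher's and student's sigmoid probabilities, then average.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    t = sigmoid(np.asarray(teacher_logits, dtype=float))
    s = sigmoid(np.asarray(student_logits, dtype=float))
    eps = 1e-12  # numerical guard against log(0)
    kl = (t * np.log((t + eps) / (s + eps))
          + (1 - t) * np.log((1 - t + eps) / (1 - s + eps)))
    return float(kl.mean())

# Loss is zero when the student matches the teacher exactly,
# and grows as their per-class probabilities diverge.
```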
arXiv Detail & Related papers (2023-08-12T03:19:08Z) - Towards Imbalanced Large Scale Multi-label Classification with Partially Annotated Labels [8.977819892091]
Multi-label classification is a widely encountered problem in daily life, where an instance can be associated with multiple classes.
In this work, we address the issue of label imbalance and investigate how to train neural networks using partial labels.
arXiv Detail & Related papers (2023-07-31T21:50:48Z) - Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification [85.76130799062379]
We study how false negative labels affect the model's explanation.
We propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels.
arXiv Detail & Related papers (2023-04-04T14:00:59Z) - An Effective Approach for Multi-label Classification with Missing Labels [8.470008570115146]
We propose a pseudo-label based approach to reduce the cost of annotation without bringing additional complexity to the classification networks.
By designing a novel loss function, we are able to relax the requirement that each instance must contain at least one positive label.
We show that our method can handle the imbalance between positive labels and negative labels, while still outperforming existing missing-label learning approaches.
arXiv Detail & Related papers (2022-10-24T23:13:57Z) - Detecting Label Errors in Token Classification Data [22.539748563923123]
We consider the task of finding sentences that contain label errors in token classification datasets.
We study 11 different straightforward methods that score tokens/sentences based on the predicted class probabilities.
We identify a simple and effective method that consistently detects those sentences containing label errors when applied with different token classification models.
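One simple member of this family of scores (an illustrative sketch; the abstract does not say which of the 11 methods wins) rates a sentence by the lowest predicted probability the model assigns to any of its given token labels:

```python
import numpy as np

def sentence_quality(token_probs: np.ndarray, token_labels: np.ndarray) -> float:
    """Score a sentence by its least-confident token annotation.

    token_probs: (n_tokens, n_classes) predicted class probabilities
        from any trained token classification model.
    token_labels: (n_tokens,) integer class chosen by the annotator.

    Sentences with low scores contain at least one token whose given
    label the model finds implausible, making them candidates for
    label errors.
    """
    per_token = token_probs[np.arange(len(token_labels)), token_labels]
    return float(per_token.min())

probs = np.array([[0.70, 0.30],
                  [0.05, 0.95]])
labels = np.array([0, 0])  # second token's label contradicts the model
# sentence_quality(probs, labels) returns 0.05
```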
arXiv Detail & Related papers (2022-10-08T05:14:22Z) - Large Loss Matters in Weakly Supervised Multi-Label Classification [50.262533546999045]
We first regard unobserved labels as negative labels, casting the weakly supervised multi-label (WSML) task into noisy multi-label classification.
We propose novel methods for WSML which reject or correct large-loss samples to prevent the model from memorizing the noisy labels.
Our methodology works well in practice, validating that handling large losses properly matters in weakly supervised multi-label classification.
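The rejection idea above can be sketched as follows (a minimal sketch under assumed details: per-label binary cross-entropy losses and a fixed rejection rate, not the paper's exact schedule): losses are computed with unobserved labels treated as negatives, and the largest ones, which most likely come from false-negative labels, are dropped before averaging.

```python
import numpy as np

def rejected_loss(per_label_loss: np.ndarray, reject_rate: float) -> float:
    """Mean loss after rejecting the largest per-label losses.

    per_label_loss: flat array of per-(example, label) losses computed
        with unobserved labels treated as negatives.
    reject_rate: fraction of the largest losses assumed to come from
        false-negative (noisy) labels, and therefore excluded so the
        model does not memorize them.
    """
    k = int(len(per_label_loss) * reject_rate)
    if k == 0:
        return float(per_label_loss.mean())
    kept = np.sort(per_label_loss)[: len(per_label_loss) - k]
    return float(kept.mean())

losses = np.array([0.10, 0.20, 5.00, 0.15])
# rejecting the top 25% drops the outlier 5.0 before averaging
```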
arXiv Detail & Related papers (2022-06-08T08:30:24Z) - Multi-Label Learning from Single Positive Labels [37.17676289125165]
Predicting all applicable labels for a given image is known as multi-label classification.
We show that it is possible to approach the performance of fully labeled classifiers despite training with significantly fewer confirmed labels.
arXiv Detail & Related papers (2021-06-17T17:58:04Z) - Exploiting Context for Robustness to Label Noise in Active Learning [47.341705184013804]
We address the problems of how a system can identify which of the queried labels are wrong and how a multi-class active learning system can be adapted to minimize the negative impact of label noise.
We construct a graphical representation of the unlabeled data to encode these relationships and obtain new beliefs on the graph when noisy labels are available.
This is demonstrated in three different applications: scene classification, activity classification, and document classification.
arXiv Detail & Related papers (2020-10-18T18:59:44Z) - Few-shot Learning for Multi-label Intent Detection [59.66787898744991]
State-of-the-art work estimates label-instance relevance scores and uses a threshold to select multiple associated intent labels.
Experiments on two datasets show that the proposed model significantly outperforms strong baselines in both one-shot and five-shot settings.
arXiv Detail & Related papers (2020-10-11T14:42:18Z) - Unsupervised Person Re-identification via Multi-label Classification [55.65870468861157]
This paper formulates unsupervised person ReID as a multi-label classification task to progressively seek true labels.
Our method starts by assigning each person image with a single-class label, then evolves to multi-label classification by leveraging the updated ReID model for label prediction.
To boost the ReID model training efficiency in multi-label classification, we propose the memory-based multi-label classification loss (MMCL).
arXiv Detail & Related papers (2020-04-20T12:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.