Label Smarter, Not Harder: CleverLabel for Faster Annotation of
Ambiguous Image Classification with Higher Quality
- URL: http://arxiv.org/abs/2305.12811v1
- Date: Mon, 22 May 2023 08:12:25 GMT
- Title: Label Smarter, Not Harder: CleverLabel for Faster Annotation of
Ambiguous Image Classification with Higher Quality
- Authors: Lars Schmarje, Vasco Grossmann, Tim Michels, Jakob Nazarenus, Monty
Santarossa, Claudius Zelenka, Reinhard Koch
- Abstract summary: We use proposal-guided annotations as one option which leads to more consistency between annotators.
We propose a new method CleverLabel for Cost-effective LabEling using validated proposal-guidEd annotations and Repaired LABELs.
CleverLabel can reduce labeling costs by up to 30.0%, while achieving a relative improvement in Kullback-Leibler divergence of up to 29.8%.
- Score: 0.6927055673104933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-quality data is crucial for the success of machine learning, but
labeling large datasets is often a time-consuming and costly process. While
semi-supervised learning can help mitigate the need for labeled data, label
quality remains an open issue due to ambiguity and disagreement among
annotators. Thus, we use proposal-guided annotations as one option which leads
to more consistency between annotators. However, proposing a label increases
the probability of the annotators deciding in favor of this specific label.
This introduces a bias which we can simulate and remove. We propose a new
method CleverLabel for Cost-effective LabEling using Validated proposal-guidEd
annotations and Repaired LABELs. CleverLabel can reduce labeling costs by up to
30.0%, while achieving a relative improvement in Kullback-Leibler divergence of
up to 29.8% compared to the previous state-of-the-art on a multi-domain
real-world image classification benchmark. CleverLabel offers a novel solution
to the challenge of efficiently labeling large datasets while also improving
the label quality.
Related papers
- Good Enough: Is it Worth Improving your Label Quality? [66.74591380455261]
Higher-quality labels improve in-domain performance, but gains remain unclear if below a small threshold.<n>For pre-training, label quality has minimal impact, suggesting that models rather transfer general concepts than detailed annotations.
arXiv Detail & Related papers (2025-05-27T09:18:24Z) - You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling [60.27812493442062]
We show the importance of investigating labeled data quality to improve any pseudo-labeling method.
Specifically, we introduce a novel data characterization and selection framework called DIPS to extend pseudo-labeling.
We demonstrate the applicability and impact of DIPS for various pseudo-labeling methods across an extensive range of real-world datasets.
arXiv Detail & Related papers (2024-06-19T17:58:40Z) - Don't Waste a Single Annotation: Improving Single-Label Classifiers
Through Soft Labels [7.396461226948109]
We address the limitations of the common data annotation and training methods for objective single-label classification tasks.
Our findings indicate that additional annotator information, such as confidence, secondary label and disagreement, can be used to effectively generate soft labels.
arXiv Detail & Related papers (2023-11-09T10:47:39Z) - Towards Imbalanced Large Scale Multi-label Classification with Partially
Annotated Labels [8.977819892091]
Multi-label classification is a widely encountered problem in daily life, where an instance can be associated with multiple classes.
In this work, we address the issue of label imbalance and investigate how to train neural networks using partial labels.
arXiv Detail & Related papers (2023-07-31T21:50:48Z) - Robust Assignment of Labels for Active Learning with Sparse and Noisy
Annotations [0.17188280334580192]
Supervised classification algorithms are used to solve a growing number of real-life problems around the globe.
Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice.
We propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space.
arXiv Detail & Related papers (2023-07-25T19:40:41Z) - Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations [91.67511167969934]
imprecise label learning (ILL) is a framework for the unification of learning with various imprecise label configurations.
We demonstrate that ILL can seamlessly adapt to partial label learning, semi-supervised learning, noisy label learning, and, more importantly, a mixture of these settings.
arXiv Detail & Related papers (2023-05-22T04:50:28Z) - ScarceNet: Animal Pose Estimation with Scarce Annotations [74.48263583706712]
ScarceNet is a pseudo label-based approach to generate artificial labels for the unlabeled images.
We evaluate our approach on the challenging AP-10K dataset, where our approach outperforms existing semi-supervised approaches by a large margin.
arXiv Detail & Related papers (2023-03-27T09:15:53Z) - Learning from Stochastic Labels [8.178975818137937]
Annotating multi-class instances is a crucial task in the field of machine learning.
In this paper, we propose a novel suitable approach to learn from these labels.
arXiv Detail & Related papers (2023-02-01T08:04:27Z) - An Effective Approach for Multi-label Classification with Missing Labels [8.470008570115146]
We propose a pseudo-label based approach to reduce the cost of annotation without bringing additional complexity to the classification networks.
By designing a novel loss function, we are able to relax the requirement that each instance must contain at least one positive label.
We show that our method can handle the imbalance between positive labels and negative labels, while still outperforming existing missing-label learning approaches.
arXiv Detail & Related papers (2022-10-24T23:13:57Z) - Acknowledging the Unknown for Multi-label Learning with Single Positive
Labels [65.5889334964149]
Traditionally, all unannotated labels are assumed as negative labels in single positive multi-label learning (SPML)
We propose entropy-maximization (EM) loss to maximize the entropy of predicted probabilities for all unannotated labels.
Considering the positive-negative label imbalance of unannotated labels, we propose asymmetric pseudo-labeling (APL) with asymmetric-tolerance strategies and a self-paced procedure to provide more precise supervision.
arXiv Detail & Related papers (2022-03-30T11:43:59Z) - Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z) - A Study on the Autoregressive and non-Autoregressive Multi-label
Learning [77.11075863067131]
We propose a self-attention based variational encoder-model to extract the label-label and label-feature dependencies jointly.
Our model can therefore be used to predict all labels in parallel while still including both label-label and label-feature dependencies.
arXiv Detail & Related papers (2020-12-03T05:41:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.