Good Enough: Is it Worth Improving your Label Quality?
- URL: http://arxiv.org/abs/2505.20928v1
- Date: Tue, 27 May 2025 09:18:24 GMT
- Title: Good Enough: Is it Worth Improving your Label Quality?
- Authors: Alexander Jaus, Zdravko Marinov, Constantin Seibold, Simon Reiß, Jens Kleesiek, Rainer Stiefelhagen
- Abstract summary: Higher-quality labels improve in-domain performance, but gains remain unclear when quality improvements fall below a small threshold. For pre-training, label quality has minimal impact, suggesting that models transfer general concepts rather than detailed annotations.
- Score: 66.74591380455261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improving label quality in medical image segmentation is costly, but its benefits remain unclear. We systematically evaluate its impact using multiple pseudo-labeled versions of CT datasets, generated by models such as nnU-Net, TotalSegmentator, and MedSAM. Our results show that while higher-quality labels improve in-domain performance, the gains remain unclear once quality improvements fall below a small threshold. For pre-training, label quality has minimal impact, suggesting that models transfer general concepts rather than detailed annotations. These findings provide guidance on when improving label quality is worth the effort.
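To make the experimental setup concrete, here is a minimal sketch of the kind of comparison the abstract describes: scoring several pseudo-label versions of the same volume against expert labels with the Dice coefficient. The tier names reuse the models mentioned above, but the masks and noise levels are synthetic stand-ins, not the paper's data.

```python
import numpy as np

def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice overlap between two binary masks."""
    inter = np.logical_and(pred, ref).sum()
    denom = pred.sum() + ref.sum()
    return float(2.0 * inter / denom) if denom > 0 else 1.0

rng = np.random.default_rng(0)
reference = rng.random((64, 64)) > 0.5                        # stand-in expert mask
label_versions = {
    "nnU-Net": reference ^ (rng.random((64, 64)) > 0.98),      # ~2% of pixels flipped
    "TotalSegmentator": reference ^ (rng.random((64, 64)) > 0.95),
    "MedSAM": reference ^ (rng.random((64, 64)) > 0.90),       # ~10% of pixels flipped
}

# The paper's question reduces to: how much does downstream performance move
# as the label quality (Dice against expert labels) changes?
for name, mask in label_versions.items():
    print(f"{name:>17s}: Dice vs. expert = {dice(mask, reference):.3f}")
```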
Related papers
- When VLMs Meet Image Classification: Test Sets Renovation via Missing Label Identification [11.49089004019603]
We present a comprehensive framework named REVEAL to address both noisy labels and missing labels in image classification test sets.
REVEAL detects potential noisy labels and omissions, aggregates predictions from various methods, and refines label accuracy through confidence-informed predictions and consensus-based filtering.
Our method effectively reveals missing labels from public datasets and provides soft-labeled results with likelihoods.
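As an illustration of consensus-based filtering, the snippet below averages several models' softmax outputs for one test image and flags it when the confident consensus disagrees with its recorded label. The aggregation rule and the 0.7 threshold are assumptions of this sketch, not REVEAL's exact procedure.

```python
import numpy as np

def consensus_check(model_probs: np.ndarray, given_label: int, thresh: float = 0.7):
    """model_probs: (n_models, n_classes) softmax outputs for one test image.

    Averages the models' predictions and flags the image when the ensemble
    confidently disagrees with the recorded label (possible noisy/missing label).
    """
    mean_probs = model_probs.mean(axis=0)        # confidence-informed aggregate
    consensus = int(mean_probs.argmax())
    confident = mean_probs[consensus] >= thresh
    flagged = confident and consensus != given_label
    return flagged, consensus, mean_probs        # soft labels with likelihoods

probs = np.array([[0.10, 0.85, 0.05],
                  [0.20, 0.70, 0.10],
                  [0.15, 0.80, 0.05]])
print(consensus_check(probs, given_label=0))     # flags class 1 as likely correct
```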
arXiv Detail & Related papers (2025-05-22T02:47:36Z)
- You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling [60.27812493442062]
We show the importance of investigating labeled data quality to improve any pseudo-labeling method.
Specifically, we introduce a novel data characterization and selection framework called DIPS to extend pseudo-labeling.
We demonstrate the applicability and impact of DIPS for various pseudo-labeling methods across an extensive range of real-world datasets.
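One way to picture such data characterization: score each labeled sample by its training dynamics, e.g. mean confidence in the observed label across checkpoints and its variability. The metrics and cutoffs below are illustrative assumptions of this sketch, not DIPS's exact definitions.

```python
import numpy as np

def characterize(checkpoint_probs: np.ndarray):
    """checkpoint_probs: (n_checkpoints, n_samples) probability assigned to
    each sample's *observed* label at successive training checkpoints."""
    confidence = checkpoint_probs.mean(axis=0)    # high -> label likely clean
    variability = checkpoint_probs.std(axis=0)    # high -> sample ambiguous
    return confidence, variability

rng = np.random.default_rng(1)
probs = np.clip(rng.normal(0.8, 0.15, size=(5, 8)), 0.0, 1.0)
conf, var = characterize(probs)
keep = (conf > 0.7) & (var < 0.2)                 # select "useful" labeled data
print("selected sample indices:", np.flatnonzero(keep))
```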
arXiv Detail & Related papers (2024-06-19T17:58:40Z)
- Quality Sentinel: Estimating Label Quality and Errors in Medical Segmentation Datasets [11.134987228105162]
We introduce a regression model, Quality Sentinel, to estimate label quality relative to manual annotations in medical segmentation datasets.
This regression model was trained on over 4 million image-label pairs created by us.
Our Quality Sentinel can predict the label quality of 142 body structures.
arXiv Detail & Related papers (2024-06-01T07:03:15Z)
- Label Smarter, Not Harder: CleverLabel for Faster Annotation of Ambiguous Image Classification with Higher Quality [0.6927055673104933]
We use proposal-guided annotations as one option, which leads to greater consistency between annotators.
We propose a new method CleverLabel for Cost-effective LabEling using validated proposal-guidEd annotations and Repaired LABELs.
CleverLabel can reduce labeling costs by up to 30.0%, while achieving a relative improvement in Kullback-Leibler divergence of up to 29.8%.
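Since the reported improvement is measured in Kullback-Leibler divergence, a minimal version of that metric between a reference and an estimated soft-label distribution may help; the epsilon smoothing is an implementation choice of this sketch.

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q) for two categorical label distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

reference = np.array([0.6, 0.3, 0.1])    # e.g. averaged human annotations
estimated = np.array([0.5, 0.35, 0.15])  # e.g. repaired proposal-guided labels
print(f"KL = {kl_divergence(reference, estimated):.4f}")
```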
arXiv Detail & Related papers (2023-05-22T08:12:25Z)
- Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification [85.76130799062379]
We study how false negative labels affect the model's explanation.
We propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels.
arXiv Detail & Related papers (2023-04-04T14:00:59Z)
- Label Noise-Resistant Mean Teaching for Weakly Supervised Fake News Detection [93.6222609806278]
We propose a novel label noise-resistant mean teaching approach (LNMT) for weakly supervised fake news detection.
LNMT leverages unlabeled news and feedback comments of users to enlarge the amount of training data.
LNMT establishes a mean teacher framework equipped with label propagation and label reliability estimation.
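To illustrate the mean-teacher machinery such an approach builds on, the sketch below shows the exponential-moving-average teacher update and a simple entropy-based reliability weight for propagated labels; the weighting scheme is an assumption of this sketch, not LNMT's estimator.

```python
import numpy as np

def ema_update(teacher_w: np.ndarray, student_w: np.ndarray, alpha: float = 0.99):
    """Teacher weights follow the student with momentum alpha."""
    return alpha * teacher_w + (1.0 - alpha) * student_w

def reliability_weight(teacher_probs: np.ndarray) -> float:
    """Trust a propagated label more when the teacher is low-entropy."""
    eps = 1e-12
    entropy = -np.sum(teacher_probs * np.log(teacher_probs + eps))
    return float(1.0 - entropy / np.log(len(teacher_probs)))  # 1.0 = fully certain

teacher, student = np.zeros(4), np.ones(4)
for _ in range(10):
    teacher = ema_update(teacher, student)
print("teacher after 10 steps:", np.round(teacher, 3))
print("weight for a confident label:", reliability_weight(np.array([0.9, 0.05, 0.05])))
```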
arXiv Detail & Related papers (2022-06-10T16:01:58Z)
- Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference from a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, making it attractive to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we suggest to incorporate semi-supervised and positive-unlabeled (PU) learning for exploiting unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To reduce the need for labeled data, self-training is widely used in both academia and industry, generating pseudo labels on readily available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
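The decoupling idea can be sketched with two toy linear heads on shared features: one head only generates pseudo labels, and only the other head is trained on them, so a single head's errors do not reinforce themselves. Shapes, the learning rate, and the single SGD step are illustrative assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
features = rng.normal(size=(16, 8))        # shared-backbone features
head_gen = rng.normal(size=(8, 3)) * 0.1   # pseudo-label *generation* head
head_use = rng.normal(size=(8, 3)) * 0.1   # pseudo-label *utilization* head

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# 1) Generate pseudo labels with the generation head (no gradient flows here).
pseudo = softmax(features @ head_gen).argmax(axis=1)

# 2) Train only the utilization head on those labels (one SGD step shown).
probs = softmax(features @ head_use)
grad = probs.copy()
grad[np.arange(len(pseudo)), pseudo] -= 1.0          # d(cross-entropy)/d(logits)
head_use -= 0.1 * (features.T @ grad) / len(pseudo)  # generation head untouched
print("utilization head updated; generation head unchanged")
```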
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Active label cleaning: Improving dataset quality under resource constraints [13.716577886649018]
Imperfections in data annotation, known as label noise, are detrimental to the training of machine learning models.
This work advocates for a data-driven approach to prioritising samples for re-annotation.
We rank instances according to estimated label correctness and labelling difficulty of each sample, and introduce a simulation framework to evaluate relabelling efficacy.
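A possible scoring rule in this spirit: rank samples by how strongly a model's posterior disagrees with the stored label, adjusted by predicted difficulty (entropy). Both proxies and the 0.1 trade-off weight below are assumptions of the sketch, not the paper's estimators.

```python
import numpy as np

def relabel_priority(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """probs: (n, c) model posteriors; labels: (n,) current labels.
    Returns indices sorted from most to least worth re-annotating."""
    eps = 1e-12
    suspicion = 1.0 - probs[np.arange(len(labels)), labels]    # likely mislabeled
    difficulty = -np.sum(probs * np.log(probs + eps), axis=1)  # entropy per sample
    # Prioritize samples that look mislabeled but are easy to fix.
    return np.argsort(-(suspicion - 0.1 * difficulty))

probs = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
labels = np.array([1, 0, 1])
print("relabel order:", relabel_priority(probs, labels))
```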
arXiv Detail & Related papers (2021-09-01T19:03:57Z)
- Learning Image Labels On-the-fly for Training Robust Classification Models [13.669654965671604]
We show how noisy annotations (e.g., from different algorithm-based labelers) can be combined to mutually benefit the learning of classification tasks.
A meta-training-based label-sampling module is designed to attend to the labels that benefit model learning the most, via additional back-propagation passes.
arXiv Detail & Related papers (2020-09-22T05:38:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.