Quality Sentinel: Estimating Label Quality and Errors in Medical Segmentation Datasets
- URL: http://arxiv.org/abs/2406.00327v1
- Date: Sat, 1 Jun 2024 07:03:15 GMT
- Title: Quality Sentinel: Estimating Label Quality and Errors in Medical Segmentation Datasets
- Authors: Yixiong Chen, Zongwei Zhou, Alan Yuille
- Abstract summary: We introduce a regression model, Quality Sentinel, to estimate label quality compared with manual annotations in medical segmentation datasets.
This regression model was trained on over 4 million image-label pairs created by us.
Our Quality Sentinel can predict the label quality of 142 body structures.
- Score: 11.134987228105162
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A growing number of public datasets have had a transformative impact on automated medical segmentation. However, their label quality varies widely, ranging from manual expert annotations to AI-generated pseudo-annotations, and there is no systematic, reliable, and automatic quality control (QC). To fill this gap, we introduce a regression model, Quality Sentinel, that estimates label quality relative to manual annotations in medical segmentation datasets. The model was trained on over 4 million image-label pairs created by us. Each pair carries a varying but quantified label quality derived from manual annotations, which enables us to predict the label quality of any image-label pair at inference time. Quality Sentinel can predict the label quality of 142 body structures. The predicted label quality, quantified by the Dice Similarity Coefficient (DSC), correlates strongly with ground-truth quality (r=0.902). Quality Sentinel supports multiple impactful use cases. (I) We evaluated label quality in publicly available datasets, finding that quality varies greatly across datasets; our analysis also uncovers that male and younger subjects exhibit significantly higher label quality. (II) We identified and corrected poorly annotated labels, achieving a one-third reduction in annotation costs with optimal budgeting on TotalSegmentator. (III) We improved AI training efficiency and performance by focusing on high-quality pseudo labels, yielding a 33%--88% performance boost over entropy-based methods at a cost of 31% of the time and 4.5% of the memory. The data and model are released.
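For reference, the DSC that Quality Sentinel regresses is defined as DSC(A, B) = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal Python sketch of the metric (the function name and toy masks are ours, not the paper's):

```python
import numpy as np

def dice_similarity_coefficient(pred: np.ndarray, ref: np.ndarray) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for binary masks A (pred) and B (ref)."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)

# Toy check with two overlapping 2D masks.
a = np.zeros((4, 4)); a[:2, :2] = 1
b = np.zeros((4, 4)); b[:3, :2] = 1
print(dice_similarity_coefficient(a, b))  # 0.8
```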
Related papers
- Balancing Label Quantity and Quality for Scalable Elicitation [2.2143065226946423]
We study the microeconomics of the quantity-quality tradeoff on binary NLP classification tasks.
We observe three regimes of eliciting classification knowledge from pretrained models using supervised finetuning.
We find that the accuracy of supervised fine-tuning can be improved by up to 5 percentage points at a fixed labeling budget.
arXiv Detail & Related papers (2024-10-17T04:39:58Z)
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
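The filtering step can be pictured as scoring each (review, pseudo-label) pair and keeping only pairs above a threshold; the scorer model itself is the paper's contribution and is not reproduced here. A schematic sketch with an illustrative stand-in scorer (names and threshold are ours):

```python
from typing import Callable, Iterable, List, Tuple

def filter_pseudo_labels(
    pairs: Iterable[Tuple[str, str]],
    scorer: Callable[[str, str], float],
    threshold: float = 0.8,
) -> List[Tuple[str, str]]:
    """Keep only (review, pseudo-label) pairs the scorer judges consistent."""
    return [(x, y) for x, y in pairs if scorer(x, y) >= threshold]

# Stand-in scorer; a real one would be a trained matching model.
demo_scorer = lambda review, quad: 0.9 if "great" in review else 0.3
pairs = [("great battery life", "(battery, quality, great, positive)"),
         ("meh", "(food, quality, meh, negative)")]
print(filter_pseudo_labels(pairs, demo_scorer))  # keeps only the first pair
```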
arXiv Detail & Related papers (2024-06-26T05:30:21Z)
- MarginMatch: Improving Semi-Supervised Learning with Pseudo-Margins [73.17295479535161]
MarginMatch is a new SSL approach combining consistency regularization and pseudo-labeling.
We analyze the model's behavior on pseudo-labeled examples as training progresses to ensure that low-quality predictions are masked out.
We improve on the state-of-the-art error rate by 3.25% on CIFAR-100 with only 25 labels per class and by 3.78% on STL-10 with as few as 4 labels per class.
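One way to picture the margin tracking: average the pseudo-label class's margin over its best competing class across epochs, and mask examples whose running margin stays low. An illustrative sketch (shapes, threshold, and names are ours, not MarginMatch's exact formulation):

```python
import numpy as np

def pseudo_margin(logits: np.ndarray, pseudo_label: int) -> float:
    """Margin of the pseudo-label class over the best competing class."""
    others = np.delete(logits, pseudo_label)
    return float(logits[pseudo_label] - others.max())

# logit_history[epoch][example] -> class logits for two examples, two epochs.
logit_history = [
    [np.array([2.0, 0.5, 0.1]), np.array([0.2, 0.4, 0.3])],
    [np.array([2.2, 0.4, 0.0]), np.array([0.1, 0.5, 0.6])],
]
pseudo_labels = [0, 1]
avg_margin = np.array([
    np.mean([pseudo_margin(epoch[i], pseudo_labels[i]) for epoch in logit_history])
    for i in range(len(pseudo_labels))
])
mask = avg_margin > 0.0  # keep only examples with a positive running margin
print(avg_margin, mask)  # example 0 is kept, example 1 is masked out
```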
arXiv Detail & Related papers (2023-08-17T15:19:04Z)
- Estimating label quality and errors in semantic segmentation data via any model [19.84626033109009]
We study methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled.
This helps prioritize what data to review in order to ensure a high-quality training/evaluation dataset.
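The simplest such score is the model's average confidence in each labeled pixel's class; the paper evaluates several scores, and this sketch shows only that self-confidence variant (naming and shapes are ours):

```python
import numpy as np

def label_quality_score(probs: np.ndarray, label: np.ndarray) -> float:
    """Mean model confidence in the given per-pixel label.

    probs: (C, H, W) softmax output; label: (H, W) integer mask.
    Images with the lowest scores are flagged for review first.
    """
    h, w = label.shape
    per_pixel = probs[label, np.arange(h)[:, None], np.arange(w)]
    return float(per_pixel.mean())

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=(4, 4)).transpose(2, 0, 1)  # (3, 4, 4)
label = rng.integers(0, 3, size=(4, 4))
print(label_quality_score(probs, label))
```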
arXiv Detail & Related papers (2023-07-11T07:29:09Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
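SoftMatch instantiates the unified weighting with a truncated Gaussian over prediction confidence: full weight above the mean confidence, smooth decay below it. A simplified sketch using batch statistics in place of the paper's running estimates:

```python
import numpy as np

def soft_weight(confidence: np.ndarray, mu: float, sigma: float) -> np.ndarray:
    """Truncated-Gaussian weight per pseudo-labeled sample: weight 1 at or
    above the mean confidence mu, Gaussian decay below it."""
    w = np.exp(-((confidence - mu) ** 2) / (2 * sigma ** 2))
    return np.where(confidence >= mu, 1.0, w)

conf = np.array([0.35, 0.60, 0.80, 0.97])
mu, sigma = conf.mean(), conf.std()  # SoftMatch tracks these as running estimates
print(soft_weight(conf, mu, sigma))  # low-confidence samples get small weights
```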
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference from a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, making it attractive to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we suggest incorporating semi-supervised and positive-unlabeled (PU) learning to exploit unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z)
- An Empirical Investigation of Learning from Biased Toxicity Labels [15.822714574671412]
We study how different training strategies can leverage a small dataset of human-annotated labels and a large but noisy dataset of synthetically generated labels.
We evaluate the accuracy and fairness properties of these approaches, and trade-offs between the two.
arXiv Detail & Related papers (2021-10-04T17:19:57Z)
- Statistical Learning to Operationalize a Domain Agnostic Data Quality Scoring [8.864453148536061]
The study provides an automated platform that takes an incoming dataset and its metadata and produces a DQ score, report, and label.
These results would be useful to data scientists, as the quality label instills confidence in the data before it is deployed for a practical application.
arXiv Detail & Related papers (2021-08-16T12:20:57Z)
- Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
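Whatever the credibility estimator, the generic recipe is to weight each sample's loss by how much its pseudo-label is trusted. A minimal sketch (not the paper's exact estimator or loss):

```python
import numpy as np

def credibility_weighted_loss(losses: np.ndarray, credibility: np.ndarray) -> float:
    """Down-weight samples whose pseudo-labels look unreliable.
    credibility in [0, 1]; 1 = fully trusted pseudo-label."""
    return float((credibility * losses).sum() / credibility.sum())

losses = np.array([0.4, 1.9, 0.7])   # per-sample losses
cred   = np.array([0.95, 0.2, 0.8])  # e.g., 1 - normalized uncertainty
print(credibility_weighted_loss(losses, cred))  # noisy sample contributes little
```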
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
- Improving Medical Annotation Quality to Decrease Labeling Burden Using Stratified Noisy Cross-Validation [3.690031561736533]
Variability in diagnosis of medical images is well established; variability in training and attention to task among medical labelers may exacerbate this issue.
Noisy Cross-Validation splits the training data into halves, and has been shown to identify low-quality labels in computer vision tasks.
In this work we introduce Stratified Noisy Cross-Validation (SNCV), an extension of Noisy Cross-Validation.
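The base Noisy Cross-Validation step can be sketched as training one model per half and flagging labels that the cross-half model disagrees with; SNCV adds a stratified split on top. A schematic sketch with caller-supplied train/predict functions (signatures are ours):

```python
import numpy as np

def noisy_cross_validation(X, y, train_fn, predict_fn, seed=0):
    """Flag likely-mislabeled samples: train on each half, check the other.

    train_fn(X, y) -> model; predict_fn(model, X) -> predicted labels.
    """
    idx = np.random.default_rng(seed).permutation(len(X))
    half_a, half_b = idx[: len(X) // 2], idx[len(X) // 2 :]
    flags = np.zeros(len(X), dtype=bool)
    for tr, te in [(half_a, half_b), (half_b, half_a)]:
        model = train_fn(X[tr], y[tr])
        flags[te] = predict_fn(model, X[te]) != y[te]  # disagreement = suspect
    return flags
```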
arXiv Detail & Related papers (2020-09-22T23:32:59Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
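The relation-driven part can be pictured as enforcing consistency between pairwise-similarity (relation) matrices computed from two perturbed views of a batch; a simplified numpy sketch (the paper's exact formulation may differ):

```python
import numpy as np

def relation_consistency_loss(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Penalize differences between relation (cosine-similarity) matrices
    of the same batch under two perturbations."""
    def relation(f):
        f = f / np.linalg.norm(f, axis=1, keepdims=True)
        return f @ f.T
    return float(np.mean((relation(feat_a) - relation(feat_b)) ** 2))

rng = np.random.default_rng(1)
feats = rng.normal(size=(8, 16))                               # one view's features
feats_perturbed = feats + 0.05 * rng.normal(size=feats.shape)  # perturbed view
print(relation_consistency_loss(feats, feats_perturbed))  # small for consistent views
```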
arXiv Detail & Related papers (2020-05-15T06:57:54Z)