Estimating label quality and errors in semantic segmentation data via
any model
- URL: http://arxiv.org/abs/2307.05080v1
- Date: Tue, 11 Jul 2023 07:29:09 GMT
- Title: Estimating label quality and errors in semantic segmentation data via
any model
- Authors: Vedang Lad, Jonas Mueller
- Abstract summary: We study methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled.
This helps prioritize what data to review in order to ensure a high-quality training/evaluation dataset.
- Score: 19.84626033109009
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The labor-intensive annotation process of semantic segmentation datasets is
often prone to errors, since humans struggle to label every pixel correctly. We
study algorithms to automatically detect such annotation errors, in particular
methods to score label quality, such that the images with the lowest scores are
least likely to be correctly labeled. This helps prioritize what data to review
in order to ensure a high-quality training/evaluation dataset, which is
critical in sensitive applications such as medical imaging and autonomous
vehicles. Widely applicable, our label quality scores rely on probabilistic
predictions from a trained segmentation model -- any model architecture and
training procedure can be utilized. Here we study 7 different label quality
scoring methods used in conjunction with a DeepLabV3+ or an FPN segmentation
model to detect annotation errors in a version of the SYNTHIA dataset.
Precision-recall evaluations reveal a score -- the soft-minimum of the
model-estimated likelihoods of each pixel's annotated class -- that is
particularly effective to identify images that are mislabeled, across multiple
types of annotation error.
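As a rough illustration of the soft-minimum score described in the abstract, the sketch below computes one label quality score per image from a model's per-pixel class probabilities. It assumes the soft-minimum is a softmax-weighted average of the annotated-class probabilities in which low-probability pixels receive the most weight. The function name `softmin_label_quality`, the array layout `(K, H, W)`, and the temperature value `0.1` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def softmin_label_quality(pred_probs, labels, temperature=0.1):
    """Soft-minimum of annotated-class probabilities for one image.

    pred_probs: float array of shape (K, H, W), class probabilities per pixel.
    labels:     int array of shape (H, W), the annotated class of each pixel.
    temperature is an assumed hyperparameter; smaller values approach the hard minimum.
    """
    # Probability the model assigns to each pixel's annotated class.
    p = np.take_along_axis(pred_probs, labels[None, ...], axis=0)[0].ravel()
    # Softmin weights: pixels with low annotated-class probability dominate.
    z = -p / temperature
    z -= z.max()  # shift for numerical stability; does not change the weights
    w = np.exp(z)
    w /= w.sum()
    # Weighted average lies between min(p) and mean(p).
    return float(np.dot(w, p))


# Toy 4x4 image with two classes; the model is confident every pixel is class 0.
probs = np.empty((2, 4, 4))
probs[0] = 0.9
probs[1] = 0.1

clean_labels = np.zeros((4, 4), dtype=int)
noisy_labels = clean_labels.copy()
noisy_labels[0, 0] = 1  # one pixel annotated with the class the model finds unlikely

clean_score = softmin_label_quality(probs, clean_labels)
noisy_score = softmin_label_quality(probs, noisy_labels)
print(clean_score, noisy_score)
```

Sorting images by this score in ascending order puts the most suspicious annotations first. A plain mean over pixels would barely move when only a few pixels are mislabeled, which is why a minimum-like aggregate is more sensitive to localized errors.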
Related papers
- Improving Label Error Detection and Elimination with Uncertainty Quantification [5.184615738004059]
We develop novel, model-agnostic algorithms for Uncertainty Quantification-Based Label Error Detection (UQ-LED).
Our UQ-LED algorithms outperform state-of-the-art confident learning in identifying label errors.
We propose a novel approach to generate realistic, class-dependent label errors synthetically.
arXiv Detail & Related papers (2024-05-15T15:17:52Z)
- Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction with Extremely Limited Labels [63.16824565919966]
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
Our intriguing findings highlight the usage of VC learning in dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z)
- AQuA: A Benchmarking Tool for Label Quality Assessment [16.83510474053401]
Recent studies have found datasets widely used to train and evaluate machine learning models to have pervasive labeling errors.
We propose a benchmarking environment AQuA to rigorously evaluate methods that enable machine learning in the presence of label noise.
arXiv Detail & Related papers (2023-06-15T19:42:11Z)
- Learning Confident Classifiers in the Presence of Label Noise [5.829762367794509]
This paper proposes a probabilistic model for noisy observations that allows us to build confident classification and segmentation models.
Our experiments show that our algorithm outperforms state-of-the-art solutions for the considered classification and segmentation problems.
arXiv Detail & Related papers (2023-01-02T04:27:25Z)
- Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification [5.279257531335345]
We present, for the first time, a method for detecting label errors in semantic segmentation datasets with pixel-wise labels.
Our approach is able to detect the vast majority of label errors while controlling the number of false label error detections.
arXiv Detail & Related papers (2022-07-13T10:25:23Z)
- Self-Supervised Learning as a Means To Reduce the Need for Labeled Data in Medical Image Analysis [64.4093648042484]
We use a dataset of chest X-ray images with bounding box labels for 13 different classes of anomalies.
We show that it is possible to achieve similar performance to a fully supervised model in terms of mean average precision and accuracy with only 60% of the labeled data.
arXiv Detail & Related papers (2022-06-01T09:20:30Z)
- Learning from Partially Overlapping Labels: Image Segmentation under Annotation Shift [68.6874404805223]
We propose several strategies for learning from partially overlapping labels in the context of abdominal organ segmentation.
We find that combining a semi-supervised approach with an adaptive cross entropy loss can successfully exploit heterogeneously annotated data.
arXiv Detail & Related papers (2021-07-13T09:22:24Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k image subset of the ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z)
- Deep Learning for Earth Image Segmentation based on Imperfect Polyline Labels with Annotation Errors [12.547819302858045]
This paper proposes a generic learning framework based on the EM algorithm to update deep learning model parameters and infer hidden true label locations simultaneously.
Evaluations on a real-world hydrological dataset in the streamline refinement application show that the proposed framework outperforms baseline methods in classification accuracy.
arXiv Detail & Related papers (2020-10-02T02:54:06Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)