Intrinsic Self-Supervision for Data Quality Audits
- URL: http://arxiv.org/abs/2305.17048v3
- Date: Mon, 28 Oct 2024 18:59:22 GMT
- Title: Intrinsic Self-Supervision for Data Quality Audits
- Authors: Fabian Gröger, Simone Lionetti, Philippe Gottfrois, Alvaro Gonzalez-Jimenez, Ludovic Amruthalingam, Labelling Consortium, Matthew Groh, Alexander A. Navarini, Marc Pouly
- Abstract summary: Benchmark datasets in computer vision often contain off-topic images, near duplicates, and label errors.
In this paper, we revisit the task of data cleaning and formalize it as either a ranking problem or a scoring problem.
We find that a specific combination of context-aware self-supervised representation learning and distance-based indicators is effective in finding issues without annotation biases.
- Score: 35.69673085324971
- Abstract: Benchmark datasets in computer vision often contain off-topic images, near duplicates, and label errors, leading to inaccurate estimates of model performance. In this paper, we revisit the task of data cleaning and formalize it as either a ranking problem, which significantly reduces human inspection effort, or a scoring problem, which allows for automated decisions based on score distributions. We find that a specific combination of context-aware self-supervised representation learning and distance-based indicators is effective in finding issues without annotation biases. This methodology, which we call SelfClean, surpasses state-of-the-art performance in detecting off-topic images, near duplicates, and label errors within widely-used image datasets, such as ImageNet-1k, Food-101N, and STL-10, both for synthetic issues and real contamination. We apply this method to multiple image benchmarks, identify up to 16% of issues, and confirm an improvement in evaluation reliability upon cleaning. The official implementation can be found at: https://github.com/Digital-Dermatology/SelfClean.
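As a rough illustration of the approach (a sketch, not the official SelfClean implementation linked above), the distance-based indicators can be computed over L2-normalized self-supervised embeddings: near duplicates surface via a sample's closest neighbor, off-topic images via isolation from all neighbors, and label errors via nearest-neighbor label disagreement. Function and parameter names here are illustrative.

```python
import numpy as np

def selfclean_style_scores(emb: np.ndarray, labels: np.ndarray, k: int = 10):
    """Illustrative distance-based issue indicators over self-supervised
    embeddings; a sketch, not the official SelfClean implementation.

    emb:    (n, d) L2-normalized embeddings from an SSL encoder.
    labels: (n,) integer class labels.
    Returns three per-sample scores where higher means more suspicious.
    """
    sim = emb @ emb.T                          # cosine similarity for unit vectors
    np.fill_diagonal(sim, -np.inf)             # exclude self-matches
    nn_idx = np.argsort(-sim, axis=1)[:, :k]   # indices of the k nearest neighbors

    # Near duplicates: an almost-identical nearest neighbor.
    near_dup = sim[np.arange(len(emb)), nn_idx[:, 0]]

    # Off-topic images: isolation, i.e. large mean distance to neighbors.
    knn_sim = np.take_along_axis(sim, nn_idx, axis=1)
    off_topic = 1.0 - knn_sim.mean(axis=1)

    # Label errors: disagreement with nearest-neighbor labels.
    label_err = (labels[nn_idx] != labels[:, None]).mean(axis=1)

    return near_dup, off_topic, label_err
```

Sorting each score in descending order yields the ranking formulation, while fitting a threshold to the empirical score distribution yields the scoring formulation for automated decisions.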
Related papers
- Automated Classification of Model Errors on ImageNet [7.455546102930913]
We propose an automated error classification framework to study how modeling choices affect error distributions.
We use our framework to comprehensively evaluate the error distribution of over 900 models.
In particular, we observe that the portion of severe errors drops significantly with increasing top-1 accuracy, indicating that, while top-1 accuracy underreports a model's true performance, it remains a valuable performance metric.
arXiv Detail & Related papers (2023-11-13T20:41:39Z)
- Estimating label quality and errors in semantic segmentation data via any model [19.84626033109009]
We study methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled.
This helps prioritize which data to review in order to ensure a high-quality training/evaluation dataset.
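A minimal sketch of one such model-based score is self-confidence: the probability the model assigns to the annotated class, averaged over pixels for segmentation. This particular choice and aggregation are assumptions; the paper evaluates several scores.

```python
import numpy as np

def label_quality_score(probs: np.ndarray, mask: np.ndarray) -> float:
    """Self-confidence label-quality score for one annotated image
    (an illustrative sketch; the paper compares several such scores).

    probs: (H, W, C) per-pixel softmax probabilities from any model.
    mask:  (H, W) integer class index annotated for each pixel.
    Low scores flag images whose labels are least likely correct.
    """
    h, w = mask.shape
    # Probability the model assigns to each pixel's annotated class.
    self_conf = probs[np.arange(h)[:, None], np.arange(w)[None, :], mask]
    return float(self_conf.mean())
```

Images are then reviewed in ascending score order, so a limited review budget goes to the most suspicious labels first.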
arXiv Detail & Related papers (2023-07-11T07:29:09Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology [11.398235052118608]
This benchmark is established to evaluate how deep neural networks perform on corrupted pathology images.
Two classification metrics and one ranking metric are designed to evaluate prediction and confidence performance under corruption.
arXiv Detail & Related papers (2022-06-30T01:53:46Z)
- Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference from a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, making it attractive to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we propose incorporating semi-supervised and positive-unlabeled (PU) learning to exploit unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z)
- Revisiting Consistency Regularization for Semi-supervised Change Detection in Remote Sensing Images [60.89777029184023]
We propose a semi-supervised CD model in which we formulate an unsupervised CD loss in addition to the supervised Cross-Entropy (CE) loss.
Experiments conducted on two publicly available CD datasets show that the proposed semi-supervised CD method approaches the performance of supervised CD.
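A minimal sketch of this loss composition, assuming a generic noise-based consistency term for the unsupervised part (the paper's actual perturbation scheme and weighting may differ):

```python
import torch
import torch.nn.functional as F

def semisup_cd_loss(model, labeled, unlabeled, lam: float = 1.0):
    """Supervised CE plus an unsupervised consistency term; a sketch of
    the loss composition (the paper's perturbation scheme may differ).

    labeled:   ((xa, xb), y) image pairs with a ground-truth change mask.
    unlabeled: (ua, ub) image pairs without labels.
    """
    (xa, xb), y = labeled
    ce = F.cross_entropy(model(xa, xb), y)            # supervised CE loss

    ua, ub = unlabeled
    with torch.no_grad():
        target = model(ua, ub).softmax(dim=1)         # clean prediction
    noisy = model(ua + 0.1 * torch.randn_like(ua), ub)
    consistency = F.mse_loss(noisy.softmax(dim=1), target)  # unsupervised CD loss

    return ce + lam * consistency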
arXiv Detail & Related papers (2022-04-18T17:59:01Z)
- Understanding out-of-distribution accuracies through quantifying difficulty of test samples [10.266928164137635]
Existing works show that although modern neural networks achieve remarkable generalization performance on in-distribution (ID) data, accuracy drops significantly on out-of-distribution (OOD) datasets.
We propose a new metric to quantify the difficulty of the test images (either ID or OOD) that depends on the interaction of the training dataset and the model.
arXiv Detail & Related papers (2022-03-28T21:13:41Z)
- A new baseline for retinal vessel segmentation: Numerical identification and correction of methodological inconsistencies affecting 100+ papers [0.0]
We performed a detailed numerical analysis of the coherence of the published performance scores.
We found inconsistencies in the reported scores related to the use of the field of view.
The highest accuracy score achieved to date is 0.9582 in the FoV region, which is 1% higher than that of human annotators.
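The field-of-view inconsistency can be made concrete with a small sketch: accuracy computed over the whole image counts the trivially non-vessel background outside the field of view, inflating the score relative to accuracy restricted to the FoV mask. Array names here are illustrative.

```python
import numpy as np

def vessel_accuracy(pred: np.ndarray, gt: np.ndarray, fov: np.ndarray):
    """Binary vessel-segmentation accuracy under both conventions.

    pred, gt: (H, W) boolean vessel maps; fov: (H, W) boolean FoV mask.
    Counting the easy all-background region outside the FoV inflates
    the whole-image score relative to the FoV-restricted one.
    """
    correct = pred == gt
    return correct.mean(), correct[fov].mean()  # (whole image, FoV only)
```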
arXiv Detail & Related papers (2021-11-06T11:09:11Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision points.
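A minimal sketch of the data-refinement idea, with scikit-learn's OneClassSVM standing in for the paper's deep one-class model: iteratively fit, score the training set, drop the most anomalous-looking samples, and refit.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def refine_and_fit(X: np.ndarray, rounds: int = 3, drop_frac: float = 0.05):
    """Iterative data refinement for one-class training on unlabeled,
    possibly contaminated data (a sketch; the paper uses a deep model).

    Each round fits a one-class model, scores the training samples, and
    drops the fraction that looks most anomalous before refitting.
    """
    for _ in range(rounds):
        clf = OneClassSVM(gamma="scale").fit(X)
        scores = clf.decision_function(X)        # lower = more anomalous
        X = X[scores >= np.quantile(scores, drop_frac)]
    return OneClassSVM(gamma="scale").fit(X)     # final fit on refined data
```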
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z)