Related papers: Towards Reliable Dermatology Evaluation Benchmarks

URL: http://arxiv.org/abs/2309.06961v2
Date: Sat, 16 Dec 2023 06:14:00 GMT
Title: Towards Reliable Dermatology Evaluation Benchmarks
Authors: Fabian Gröger, Simone Lionetti, Philippe Gottfrois, Alvaro Gonzalez-Jimenez, Matthew Groh, Roxana Daneshjou, Labelling Consortium, Alexander A. Navarini, Marc Pouly
Abstract summary: Benchmark datasets for digital dermatology unwittingly contain inaccuracies that reduce trust in model performance estimates. We propose a resource-efficient data-cleaning protocol to identify issues that escaped previous curation.
Score: 37.464923424849964
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Benchmark datasets for digital dermatology unwittingly contain inaccuracies that reduce trust in model performance estimates. We propose a resource-efficient data-cleaning protocol to identify issues that escaped previous curation. The protocol leverages an existing algorithmic cleaning strategy and is followed by a confirmation process terminated by an intuitive stopping criterion. Based on confirmation by multiple dermatologists, we remove irrelevant samples and near duplicates and estimate the percentage of label errors in six dermatology image datasets for model evaluation promoted by the International Skin Imaging Collaboration. Along with this paper, we publish revised file lists for each dataset which should be used for model evaluation. Our work paves the way for more trustworthy performance assessment in digital dermatology.

Related papers

GRASP-PsONet: Gradient-based Removal of Spurious Patterns for PsOriasis Severity Classification [0.0]
We propose a framework to automatically flag problematic training images that introduce spurious correlations. Removing 8.2% of flagged images improves model AUC-ROC by 5% (85% to 90%) on a held out test set. When applied to a subset of training data rated by two dermatologists, the method identifies over 90% of cases with inter-rater disagreement.
arXiv Detail & Related papers (2025-06-27T03:42:09Z)
Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval [2.9801426627439453]
This study benchmarks the robustness of four state-of-the-art contrastive learning models: CLIP, CXR-RePaiR, MedCLIP, and CXR-CLIP. Our findings reveal that all evaluated models are highly sensitive to out-of-distribution data. By addressing these limitations, we can develop more reliable cross-domain retrieval models for medical applications.
arXiv Detail & Related papers (2025-01-15T20:37:04Z)
An analysis of data variation and bias in image-based dermatological datasets for machine learning classification [2.039829968340841]
In clinical dermatology, classification models can detect malignant lesions on patients' skin using only RGB images as input. Most learning-based methods are trained on data from dermoscopic datasets, which are large and validated by a gold standard. This work aims to evaluate the gap between dermoscopic and clinical samples and understand how dataset variations impact training.
arXiv Detail & Related papers (2025-01-15T17:18:46Z)
LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content [62.816876067499415]
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers. LiveXiv accesses domain-specific manuscripts at any given timestamp and proposes to automatically generate visual question-answer pairs. We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
arXiv Detail & Related papers (2024-10-14T17:51:23Z)
Multi-task Explainable Skin Lesion Classification [54.76511683427566]
We propose a few-shot-based approach for skin lesions that generalizes well with few labelled data. The proposed approach comprises a fusion of a segmentation network that acts as an attention module and classification network.
arXiv Detail & Related papers (2023-10-11T05:49:47Z)
Estimating label quality and errors in semantic segmentation data via any model [19.84626033109009]
We study methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled. This helps prioritize what data to review in order to ensure a high-quality training/evaluation dataset.
arXiv Detail & Related papers (2023-07-11T07:29:09Z)
Intrinsic Self-Supervision for Data Quality Audits [35.69673085324971]
Benchmark datasets in computer vision often contain off-topic images, near duplicates, and label errors. In this paper, we revisit the task of data cleaning and formalize it as either a ranking problem, or a scoring problem. We find that a specific combination of context-aware self-supervised representation learning and distance-based indicators is effective in finding issues without annotation biases.
arXiv Detail & Related papers (2023-05-26T15:57:04Z)
Self-Supervised Learning as a Means To Reduce the Need for Labeled Data in Medical Image Analysis [64.4093648042484]
We use a dataset of chest X-ray images with bounding box labels for 13 different classes of anomalies. We show that it is possible to achieve similar performance to a fully supervised model in terms of mean average precision and accuracy with only 60% of the labeled data.
arXiv Detail & Related papers (2022-06-01T09:20:30Z)
Cascaded Robust Learning at Imperfect Labels for Chest X-ray Segmentation [61.09321488002978]
We present a novel cascaded robust learning framework for chest X-ray segmentation with imperfect annotation. Our model consists of three independent networks, which can effectively learn useful information from their peer networks. Our method achieves a significant improvement in segmentation accuracy compared to previous methods.
arXiv Detail & Related papers (2021-04-05T15:50:16Z)
Cancer image classification based on DenseNet model [3.3516258832067067]
We propose a novel metastatic cancer image classification model based on DenseNet Block. We evaluate the proposed approach on a slightly modified version of the PatchCamelyon (PCam) benchmark dataset.
arXiv Detail & Related papers (2020-11-23T03:05:42Z)
Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification. It exploits the unlabeled data by encouraging prediction consistency for a given input under perturbations. Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
An Extensive Study on Cross-Dataset Bias and Evaluation Metrics Interpretation for Machine Learning applied to Gastrointestinal Tract Abnormality Classification [2.985964157078619]
Automatic analysis of diseases in the GI tract is a hot topic in computer science and medical-related journals. A clear understanding of evaluation metrics and machine learning models with cross datasets is crucial to bring research in the field to a new quality level. We present comprehensive evaluations of five distinct machine learning models that can classify 16 different GI tract conditions.
arXiv Detail & Related papers (2020-05-08T08:59:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.