Test-Time Selection for Robust Skin Lesion Analysis
- URL: http://arxiv.org/abs/2308.05595v1
- Date: Thu, 10 Aug 2023 14:08:50 GMT
- Title: Test-Time Selection for Robust Skin Lesion Analysis
- Authors: Alceu Bissoto, Catarina Barata, Eduardo Valle, Sandra Avila
- Abstract summary: Skin lesion analysis models are biased by artifacts placed during image acquisition.
We propose TTS (Test-Time Selection), a human-in-the-loop method that leverages positive (e.g., lesion area) and negative (e.g., artifacts) keypoints in test samples.
Our solution is robust to varying annotation availability and to different levels of bias.
- Score: 20.792979998188848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Skin lesion analysis models are biased by artifacts placed during image
acquisition, which influence model predictions despite carrying no clinical
information. Solutions that address this problem by regularizing models to
prevent learning those spurious features achieve only partial success, and
existing test-time debiasing techniques are inappropriate for skin lesion
analysis because they either make unrealistic assumptions about the
distribution of test data or require laborious annotation from medical
practitioners. We
propose TTS (Test-Time Selection), a human-in-the-loop method that leverages
positive (e.g., lesion area) and negative (e.g., artifacts) keypoints in test
samples. TTS effectively steers models away from exploiting spurious
artifact-related correlations without retraining and with fewer annotation
requirements. Our solution is robust to varying annotation availability and to
different levels of bias. We showcase on the ISIC2019 dataset (for which we
release a subset of annotated images) how our model could be deployed in the
real world to mitigate bias.
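The keypoint-guided selection can be pictured with a short sketch. The snippet below is a minimal PyTorch illustration under assumed interfaces (a `backbone` returning a spatial feature map and a `classifier` head; `tts_forward` and its arguments are hypothetical names), not the authors' released implementation: it keeps only the feature channels that respond more strongly at positive (lesion) keypoints than at negative (artifact) keypoints before classification.

```python
import torch


def tts_forward(backbone, classifier, image, pos_keypoints, neg_keypoints):
    """Hypothetical test-time selection sketch (not the official TTS code).

    backbone(image)   -> feature map of shape (1, C, H, W)
    classifier(feats) -> logits, where feats has shape (1, C)
    pos_keypoints / neg_keypoints: lists of (row, col) indices in feature-map
    coordinates marking lesion and artifact locations, respectively.
    """
    fmap = backbone(image)                      # (1, C, H, W)
    _, C, H, W = fmap.shape

    def mean_activation(points):
        if not points:
            return torch.zeros(C)
        acts = [fmap[0, :, r, c] for r, c in points]
        return torch.stack(acts).mean(dim=0)    # (C,)

    pos_act = mean_activation(pos_keypoints)
    neg_act = mean_activation(neg_keypoints)

    # Keep channels that respond more to the lesion than to the artifacts.
    keep = (pos_act > neg_act).float()          # (C,)

    pooled = fmap.mean(dim=(2, 3))              # global average pooling -> (1, C)
    selected = pooled * keep                    # zero out artifact-driven channels
    return classifier(selected)
```

Because the selection happens purely at inference time, no parameters are updated; a handful of clicked keypoints per test image is enough to recompute the channel mask.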
Related papers
- A Simple Remedy for Dataset Bias via Self-Influence: A Mislabeled Sample Perspective [33.78421391776591]
In this paper, we propose a novel perspective of mislabeled sample detection.
We show that our new perspective can boost the precision of detection and rectify biased models effectively.
Our approach is complementary to existing methods, showing performance improvement even when applied to models that have already undergone recent debiasing techniques.
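Self-influence is often approximated with a TracIn-style quantity, the squared gradient norm of a sample's own loss; the sketch below uses that generic approximation (the function name is an illustrative assumption, not the paper's exact estimator).

```python
import torch
import torch.nn.functional as F


def self_influence(model, x, y):
    """Rough self-influence proxy: squared norm of the per-sample loss gradient.

    Samples with unusually high scores are candidates for being mislabeled or
    bias-conflicting (generic TracIn-style approximation, not the paper's method).
    """
    model.zero_grad()
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return sum(g.pow(2).sum() for g in grads).item()
```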
arXiv Detail & Related papers (2024-11-01T04:54:32Z)
- Model-based causal feature selection for general response types [8.228587135343071]
Invariant causal prediction (ICP) is a method for causal feature selection which requires data from heterogeneous settings.
We develop transformation-model (TRAM) based ICP, allowing for continuous, categorical, count-type, and uninformatively censored responses.
We provide an open-source R package 'tramicp' and evaluate our approach on simulated data and in a case study investigating causal features of survival in critically ill patients.
arXiv Detail & Related papers (2023-09-22T12:42:48Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
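One elementary way to picture data reweighting of this kind (a generic recipe, not the paper's optimization procedure) is to weight each example by p(y) / p(y | s) for a spurious attribute s, which makes the label independent of s under the weighted empirical distribution.

```python
from collections import Counter


def decorrelation_weights(labels, spurious):
    """Weights w_i = p(y_i) / p(y_i | s_i), making label y independent of the
    spurious attribute s under the reweighted data (generic sketch)."""
    n = len(labels)
    p_y = Counter(labels)
    p_ys = Counter(zip(labels, spurious))
    p_s = Counter(spurious)
    weights = []
    for y, s in zip(labels, spurious):
        p_y_i = p_y[y] / n
        p_y_given_s = p_ys[(y, s)] / p_s[s]
        weights.append(p_y_i / p_y_given_s)
    return weights


# Toy example: the spurious attribute co-occurs mostly with label 1,
# so that pairing gets down-weighted and the rare pairings get up-weighted.
labels   = [1, 1, 1, 0, 0, 1, 0, 0]
spurious = [1, 1, 1, 1, 0, 0, 0, 0]
print(decorrelation_weights(labels, spurious))
```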
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Prevention is better than cure: a case study of the abnormalities detection in the chest [4.000351859705655]
We show how a series of simple tests for data imbalance exposes faults in the data acquisition and annotation process.
Errors made at the data collection stage make it difficult to validate the model correctly.
We show how to monitor data and model balance (fairness) throughout the life cycle of a predictive model.
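A minimal version of such a balance check, assuming a pandas table with a label column and a subgroup column (e.g., acquisition site), is sketched below; the column names and data are illustrative.

```python
import pandas as pd


def balance_report(df, label_col, group_col):
    """Per-subgroup sample counts and positive rates; large gaps flag
    acquisition or annotation imbalance worth investigating (illustrative check)."""
    report = df.groupby(group_col)[label_col].agg(n="count", positive_rate="mean")
    report["gap_vs_overall"] = report["positive_rate"] - df[label_col].mean()
    return report


# Hypothetical example: abnormality labels grouped by acquisition site.
df = pd.DataFrame({
    "abnormal": [1, 1, 0, 1, 0, 0, 0, 1, 1, 1],
    "site":     ["A", "A", "A", "A", "B", "B", "B", "B", "B", "A"],
})
print(balance_report(df, "abnormal", "site"))
```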
arXiv Detail & Related papers (2023-05-18T13:28:00Z)
- Zero-shot Model Diagnosis [80.36063332820568]
A common approach to evaluating deep learning models is to build a labeled test set with attributes of interest and assess how well the model performs on it.
This paper argues the case that Zero-shot Model Diagnosis (ZOOM) is possible without the need for a test set nor labeling.
arXiv Detail & Related papers (2023-03-27T17:59:33Z)
- A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts [143.14128737978342]
Test-time adaptation, an emerging paradigm, has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions.
Recent progress in this paradigm highlights the significant benefits of utilizing unlabeled data for training self-adapted models prior to inference.
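The most common instantiation of this idea is entropy minimization on the unlabeled test batch while updating only normalization parameters; the sketch below follows the Tent-style recipe and is illustrative rather than tied to any single paper in the survey.

```python
import torch
import torch.nn as nn


def adapt_on_batch(model, x, steps=1, lr=1e-3):
    """Tent-style test-time adaptation sketch: minimize prediction entropy on an
    unlabeled test batch, updating only BatchNorm affine parameters."""
    params = []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.requires_grad_(True)
            params += [m.weight, m.bias]
    optimizer = torch.optim.SGD(params, lr=lr)

    model.train()  # use test-batch statistics in BatchNorm layers
    for _ in range(steps):
        probs = model(x).softmax(dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()
    return model(x)
```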
arXiv Detail & Related papers (2023-03-27T16:32:21Z)
- Feature-Level Debiased Natural Language Understanding [86.8751772146264]
Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets.
We propose debiasing contrastive learning (DCT) to mitigate biased latent features while accounting for the dynamic nature of bias.
DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
arXiv Detail & Related papers (2022-12-11T06:16:14Z)
- Artifact-Based Domain Generalization of Skin Lesion Models [20.792979998188848]
We propose a pipeline that relies on artifacts annotation to enable generalization evaluation and debiasing.
We create environments based on skin lesion artifacts to enable domain generalization methods.
Our results raise a concern that debiasing models towards a single aspect may not be enough for fair skin lesion analysis.
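Concretely, the environments are partitions of the training data defined by artifact annotations, so that the artifact-label correlation differs across partitions; a toy version of that grouping (hypothetical data layout and function name) looks like this.

```python
def make_environments(samples, artifact):
    """Split samples into environments by presence of a given artifact so that
    domain generalization methods can penalize artifact-dependent behaviour
    (toy grouping; the paper derives its environments from ISIC artifact annotations)."""
    envs = {"with_" + artifact: [], "without_" + artifact: []}
    for s in samples:
        key = ("with_" if s["artifacts"].get(artifact, False) else "without_") + artifact
        envs[key].append(s)
    return envs


# Hypothetical annotations: each sample lists which acquisition artifacts it contains.
samples = [
    {"image": "ISIC_0001.jpg", "label": 1, "artifacts": {"ruler": True,  "gel_bubble": False}},
    {"image": "ISIC_0002.jpg", "label": 0, "artifacts": {"ruler": False, "gel_bubble": True}},
    {"image": "ISIC_0003.jpg", "label": 1, "artifacts": {"ruler": False, "gel_bubble": False}},
]
envs = make_environments(samples, "ruler")
print({k: len(v) for k, v in envs.items()})
```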
arXiv Detail & Related papers (2022-08-20T22:25:09Z)
- Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
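Both ingredients can be sketched in a simplified form (assumed, not the paper's exact criteria): keep only low-entropy test samples for the adaptation loss, and add a Fisher-weighted penalty anchoring important parameters to their pre-trained values.

```python
import torch


def reliable_mask(probs, entropy_threshold):
    """Select reliable test samples via low prediction entropy (simplified;
    the paper additionally filters out redundant samples)."""
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    return entropy < entropy_threshold


def anti_forgetting_penalty(model, fisher, anchor, strength=1.0):
    """Fisher-weighted regularizer keeping important parameters close to their
    pre-trained values so adaptation does not erase source knowledge."""
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - anchor[name]).pow(2)).sum()
    return strength * penalty
```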
arXiv Detail & Related papers (2022-04-06T06:39:40Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model in both settings: with task-specific biased models built from prior knowledge, and with a self-ensemble biased model that requires no prior knowledge.
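The greedy, boosting-like flavour of the framework can be illustrated with a deliberately simplified ensemble loss (an assumption-laden sketch, not the official GGD objective): a biased model is fit and frozen first, and the base model is then trained through the summed logits so it concentrates on evidence the biased model cannot explain.

```python
import torch
import torch.nn.functional as F


def greedy_debias_step(base_model, biased_model, x, y, optimizer):
    """Simplified greedy de-bias step: the biased model is frozen and the base
    model is trained through the summed logits, pushing it toward evidence the
    biased model cannot explain (illustrative sketch, not the paper's exact loss)."""
    with torch.no_grad():
        biased_logits = biased_model(x)
    base_logits = base_model(x)
    loss = F.cross_entropy(base_logits + biased_logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```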
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
- Debiasing Skin Lesion Datasets and Models? Not So Fast [17.668005682385175]
Models learned from data risk learning biases from that same data.
When models learn spurious correlations not found in real-world situations, their deployment for critical tasks, such as medical decisions, can be catastrophic.
We find out that, despite interesting results that point to promising future research, current debiasing methods are not ready to solve the bias issue for skin-lesion models.
arXiv Detail & Related papers (2020-04-23T21:07:49Z)