Prevention is better than cure: a case study of the abnormalities
detection in the chest
- URL: http://arxiv.org/abs/2305.10961v1
- Date: Thu, 18 May 2023 13:28:00 GMT
- Title: Prevention is better than cure: a case study of the abnormalities
detection in the chest
- Authors: Weronika Hryniewska, Piotr Czarnecki, Jakub Wiśniewski,
Przemysław Bombiński, Przemysław Biecek
- Abstract summary: We show how a series of simple tests for data imbalance exposes faults in the data acquisition and annotation process.
Errors made at the data collection stage make it difficult to validate the model correctly.
We show how to monitor data and model balance (fairness) throughout the life cycle of a predictive model.
- Score: 4.000351859705655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prevention is better than cure. This old truth applies not only to the
prevention of diseases but also to the prevention of issues with AI models used
in medicine. The source of a predictive model's malfunction often lies not in
the training process but in the data acquisition or experiment design phase.
In this paper, we analyze in detail a single use case - a Kaggle competition
related to the detection of abnormalities in X-ray lung images. We demonstrate
how a series of simple tests for data imbalance exposes faults in the data
acquisition and annotation process. Complex models readily learn such
artifacts, and this bias is difficult to remove during or after training.
Errors made at the data collection stage make it difficult to validate the
model correctly.
Based on this use case, we show how to monitor data and model balance
(fairness) throughout the life cycle of a predictive model, from data
acquisition to parity analysis of model scores.
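The "simple tests for data imbalance" and the parity analysis of model scores mentioned above can be sketched as small checks. This is a minimal illustration, not the paper's method: the threshold, group names, and label values below are hypothetical.

```python
from collections import Counter

def class_balance(labels, threshold=0.2):
    """Return per-class shares of the dataset and flag classes whose
    share falls below `threshold` (a hypothetical cut-off)."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {cls: n / total for cls, n in counts.items()}
    flagged = [cls for cls, share in shares.items() if share < threshold]
    return shares, flagged

def score_parity(scores_by_group):
    """Compare mean model scores across groups (e.g. acquisition site
    or patient sex); a large gap hints at bias learned from artifacts."""
    means = {g: sum(s) / len(s) for g, s in scores_by_group.items()}
    gap = max(means.values()) - min(means.values())
    return means, gap

# Synthetic example: "normal" dominates, "abnormal" is rare.
labels = ["normal"] * 90 + ["abnormal"] * 10
shares, flagged = class_balance(labels)
# shares -> {'normal': 0.9, 'abnormal': 0.1}; flagged -> ['abnormal']
```

Running such checks at data-collection time, before any model is trained, is the "prevention" the abstract argues for: an imbalance surfaced here is far cheaper to fix than a bias discovered after deployment.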
Related papers
- One Shot GANs for Long Tail Problem in Skin Lesion Dataset using novel content space assessment metric [1.833650794546064]
Long tail problems frequently arise in the medical field due to the scarcity of medical data for rare conditions.
One Shot GANs was employed to augment the tail class of HAM10000 dataset by generating additional samples.
arXiv Detail & Related papers (2024-09-30T04:51:54Z) - Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data [49.73114504515852]
We show that replacing the original real data by each generation's synthetic data does indeed tend towards model collapse.
We demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse.
arXiv Detail & Related papers (2024-04-01T18:31:24Z) - LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised
Time Series Anomaly Detection [49.52429991848581]
We propose a Light and Anti-overfitting Retraining Approach (LARA) for deep variational auto-encoder (VAE) based time series anomaly detection methods.
This work makes three novel contributions: 1) the retraining process is formulated as a convex problem, which converges at a fast rate and prevents overfitting; 2) a ruminate block is designed that leverages historical data without the need to store it; and 3) it is mathematically proven that, when fine-tuning the latent vector and reconstructed data, linear formations achieve the least adjusting error between the ground truths and the fine-tuned values.
arXiv Detail & Related papers (2023-10-09T12:36:16Z) - Test-Time Selection for Robust Skin Lesion Analysis [20.792979998188848]
Skin lesion analysis models are biased by artifacts placed during image acquisition.
We propose TTS (Test-Time Selection), a human-in-the-loop method that leverages positive (e.g., lesion area) and negative (e.g., artifacts) keypoints in test samples.
Our solution is robust to a varying availability of annotations, and different levels of bias.
arXiv Detail & Related papers (2023-08-10T14:08:50Z) - Pseudo Bias-Balanced Learning for Debiased Chest X-ray Classification [57.53567756716656]
We study the problem of developing debiased chest X-ray diagnosis models without knowing exactly the bias labels.
We propose a novel algorithm, pseudo bias-balanced learning, which first captures and predicts per-sample bias labels.
Our proposed method achieved consistent improvements over other state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-18T11:02:18Z) - Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z) - Medical data wrangling with sequential variational autoencoders [5.9207487081080705]
This paper proposes to model medical data records with heterogeneous data types and bursty missing data using sequential variational autoencoders (VAEs).
We show that the proposed Shi-VAE achieves the best performance on both metrics, with lower computational complexity than the GP-VAE model.
arXiv Detail & Related papers (2021-03-12T10:59:26Z) - Learning from others' mistakes: Avoiding dataset biases without modeling
them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z) - Debugging Tests for Model Explanations [18.073554618753395]
Methods tested are able to diagnose a spurious background bug, but not conclusively identify mislabeled training examples.
We complement our analysis with a human subject study, and find that subjects fail to identify defective models using attributions, but instead rely, primarily, on model predictions.
arXiv Detail & Related papers (2020-11-10T22:23:25Z) - Bayesian Sampling Bias Correction: Training with the Right Loss Function [0.0]
We derive a family of loss functions to train models in the presence of sampling bias.
Examples are when the prevalence of a pathology differs from its sampling rate in the training dataset, or when a machine learning practitioner rebalances their training dataset.
arXiv Detail & Related papers (2020-06-24T15:10:43Z) - Self-Training with Improved Regularization for Sample-Efficient Chest
X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.