Generative models improve fairness of medical classifiers under
distribution shifts
- URL: http://arxiv.org/abs/2304.09218v1
- Date: Tue, 18 Apr 2023 18:15:38 GMT
- Title: Generative models improve fairness of medical classifiers under
distribution shifts
- Authors: Ira Ktena, Olivia Wiles, Isabela Albuquerque, Sylvestre-Alvise
Rebuffi, Ryutaro Tanno, Abhijit Guha Roy, Shekoofeh Azizi, Danielle Belgrave,
Pushmeet Kohli, Alan Karthikesalingam, Taylan Cemgil, Sven Gowal
- Abstract summary: We show that learning realistic augmentations automatically from data is possible in a label-efficient manner using generative models.
We demonstrate that these learned augmentations can surpass ones by making models more robust and statistically fair in- and out-of-distribution.
- Score: 49.10233060774818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A ubiquitous challenge in machine learning is the problem of domain
generalisation. This can exacerbate bias against groups or labels that are
underrepresented in the datasets used for model development. Model bias can
lead to unintended harms, especially in safety-critical applications like
healthcare. Furthermore, the challenge is compounded by the difficulty of
obtaining labelled data due to high cost or lack of readily available domain
expertise. In our work, we show that learning realistic augmentations
automatically from data is possible in a label-efficient manner using
generative models. In particular, we leverage the higher abundance of
unlabelled data to capture the underlying data distribution of different
conditions and subgroups for an imaging modality. By conditioning generative
models on appropriate labels, we can steer the distribution of synthetic
examples according to specific requirements. We demonstrate that these learned
augmentations can surpass heuristic ones by making models more robust and
statistically fair in- and out-of-distribution. To evaluate the generality of
our approach, we study 3 distinct medical imaging contexts of varying
difficulty: (i) histopathology images from a publicly available generalisation
benchmark, (ii) chest X-rays from publicly available clinical datasets, and
(iii) dermatology images characterised by complex shifts and imaging
conditions. Complementing real training samples with synthetic ones improves
the robustness of models in all three medical tasks and increases fairness by
improving the accuracy of diagnosis within underrepresented groups. This
approach leads to stark improvements OOD across modalities: 7.7% prediction
accuracy improvement in histopathology, 5.2% in chest radiology with 44.6%
lower fairness gap and a striking 63.5% improvement in high-risk sensitivity
for dermatology with a 7.5x reduction in fairness gap.
Related papers
- PRECISe : Prototype-Reservation for Explainable Classification under Imbalanced and Scarce-Data Settings [0.0]
PRECISe is an explainable-by-design model meticulously constructed to address all three challenges.
PreCISe outperforms the current state-of-the-art methods on data efficient generalization to minority classes.
Case study is presented to highlight the model's ability to produce easily interpretable predictions.
arXiv Detail & Related papers (2024-08-11T12:05:32Z) - The Limits of Fair Medical Imaging AI In The Wild [43.97266228706059]
We investigate the extent to which medical AI utilizes demographic encodings.
We confirm that medical imaging AI leverages demographic shortcuts in disease classification.
We find that models with less encoding of demographic attributes are often most "globally optimal"
arXiv Detail & Related papers (2023-12-11T18:59:50Z) - MCRAGE: Synthetic Healthcare Data for Fairness [3.0089659534785853]
We propose Minority Class Rebalancing through Augmentation by Generative modeling (MCRAGE) to augment imbalanced datasets.
MCRAGE involves training a Denoising Diffusion Probabilistic Model (CDDPM) capable of generating high-quality synthetic EHR samples from underrepresented classes.
We use this synthetic data to augment the existing imbalanced dataset, resulting in a more balanced distribution across all classes.
arXiv Detail & Related papers (2023-10-27T19:02:22Z) - SSL-CPCD: Self-supervised learning with composite pretext-class
discrimination for improved generalisability in endoscopic image analysis [3.1542695050861544]
Deep learning-based supervised methods are widely popular in medical image analysis.
They require a large amount of training data and face issues in generalisability to unseen datasets.
We propose to explore patch-level instance-group discrimination and penalisation of inter-class variation using additive angular margin.
arXiv Detail & Related papers (2023-05-31T21:28:08Z) - Federated Learning Enables Big Data for Rare Cancer Boundary Detection [98.5549882883963]
We present findings from the largest Federated ML study to-date, involving data from 71 healthcare institutions across 6 continents.
We generate an automatic tumor boundary detector for the rare disease of glioblastoma.
We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent.
arXiv Detail & Related papers (2022-04-22T17:27:00Z) - SCALP -- Supervised Contrastive Learning for Cardiopulmonary Disease
Classification and Localization in Chest X-rays using Patient Metadata [10.269187107011934]
We introduce an end-to-end framework, SCALP, which extends the self-supervised contrastive approach to a supervised setting.
SCALP pulls together chest X-rays from the same patient (positive keys) and pushes apart chest X-rays from different patients (negative keys)
Our experiments demonstrate that SCALP outperforms existing baselines with significant margins in both classification and localization tasks.
arXiv Detail & Related papers (2021-10-27T21:38:12Z) - Variational Knowledge Distillation for Disease Classification in Chest
X-Rays [102.04931207504173]
We propose itvariational knowledge distillation (VKD), which is a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - Semi-supervised Medical Image Classification with Relation-driven
Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z) - Self-Training with Improved Regularization for Sample-Efficient Chest
X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that using 85% lesser labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.