Predict Patient Self-reported Race from Skin Histological Images
- URL: http://arxiv.org/abs/2507.21912v2
- Date: Wed, 30 Jul 2025 02:14:39 GMT
- Title: Predict Patient Self-reported Race from Skin Histological Images
- Authors: Shengjia Chen, Ruchika Verma, Kevin Clare, Jannes Jegminat, Eugenia Alleva, Kuan-lin Huang, Brandon Veremis, Thomas Fuchs, Gabriele Campanella
- Abstract summary: Deep learning models can predict self-reported race from digitized dermatopathology slides. White and Black demographic groups retained high prediction performance. Attention analysis revealed the epidermis as a key predictive feature.
- Score: 1.5356047676189333
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Artificial Intelligence (AI) has demonstrated success in computational pathology (CPath) for disease detection, biomarker classification, and prognosis prediction. However, its potential to learn unintended demographic biases, particularly those related to social determinants of health, remains understudied. This study investigates whether deep learning models can predict self-reported race from digitized dermatopathology slides and identifies potential morphological shortcuts. Using a multisite dataset with a racially diverse population, we apply an attention-based mechanism to uncover race-associated morphological features. After evaluating three dataset curation strategies to control for confounding factors, the final experiment showed that White and Black demographic groups retained high prediction performance (AUC: 0.799, 0.762), while overall performance dropped to 0.663. Attention analysis revealed the epidermis as a key predictive feature, with significant performance declines when these regions were removed. These findings highlight the need for careful data curation and bias mitigation to ensure equitable AI deployment in pathology. Code available at: https://github.com/sinai-computational-pathology/CPath_SAIF.
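The abstract reports per-group AUCs alongside a lower overall figure. As a rough illustration of how such numbers can diverge, the following is a minimal sketch of one-vs-rest AUC per class plus an unweighted (macro) average. It is not taken from the authors' repository: the function names, the class labels, and the choice of macro-averaging are all assumptions, and the abstract does not state how the overall value was aggregated.

```python
from typing import Dict, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    classes: Sequence[str],
) -> Dict[str, float]:
    """Score each class against all others (hypothetical helper).

    probs[i][k] is the predicted probability that sample i belongs to classes[k].
    """
    result = {}
    for k, name in enumerate(classes):
        class_scores = [row[k] for row in probs]
        class_labels = [1 if y == name else 0 for y in labels]
        result[name] = binary_auc(class_scores, class_labels)
    return result


def macro_auc(per_class: Dict[str, float]) -> float:
    """Unweighted mean of per-class AUCs (an assumed aggregation)."""
    return sum(per_class.values()) / len(per_class)


if __name__ == "__main__":
    # Toy data with placeholder class names; not drawn from the study.
    classes = ["A", "B", "C"]
    probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
    labels = ["A", "B", "C", "A"]
    per_class = one_vs_rest_auc(probs, labels, classes)
    print(per_class, macro_auc(per_class))
```

Under macro-averaging, a few well-separated groups can score high while weaker groups pull the average down, which is one way a pattern like the one in the abstract could arise.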
Related papers
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - Fairness Evolution in Continual Learning for Medical Imaging [47.52603262576663]
This study examines how bias evolves across tasks using domain-specific fairness metrics and how different CL strategies impact this evolution. Our results show that Learning without Forgetting and Pseudo-Label achieve optimal classification performance, but Pseudo-Label is less biased.
arXiv Detail & Related papers (2024-04-10T09:48:52Z) - Multi-task Explainable Skin Lesion Classification [54.76511683427566]
We propose a few-shot-based approach for skin lesions that generalizes well with little labelled data.
The proposed approach fuses a segmentation network, which acts as an attention module, with a classification network.
arXiv Detail & Related papers (2023-10-11T05:49:47Z) - Deep Learning Discovery of Demographic Biomarkers in Echocardiography [0.3957768262206625]
We test whether it is possible to predict age, race, and sex from cardiac ultrasound images using deep learning algorithms.
We trained video-based convolutional neural networks to predict age, sex, and race.
We found that deep learning models were able to identify age and sex, while unable to reliably predict race.
arXiv Detail & Related papers (2022-07-13T16:48:49Z) - Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z) - Algorithmic encoding of protected characteristics and its implications on disparities across subgroups [17.415882865534638]
Machine learning models may pick up undesirable correlations between a patient's racial identity and clinical outcome.
Very little is known about how these biases are encoded and how one may reduce or even remove disparate performance.
arXiv Detail & Related papers (2021-10-27T20:30:57Z) - Reading Race: AI Recognises Patient's Racial Identity In Medical Images [9.287449389763413]
There is no known correlation for race on medical imaging that would be obvious to the human expert interpreting the images.
Standard deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities.
arXiv Detail & Related papers (2021-07-21T21:10:16Z) - Patient-independent Epileptic Seizure Prediction using Deep Learning Models [39.19336481493405]
The purpose of a seizure prediction system is to successfully identify the pre-ictal brain stage, which occurs before a seizure event.
Patient-independent seizure prediction models are designed to offer accurate performance across multiple subjects within a dataset.
We propose two patient-independent deep learning architectures with different learning strategies that can learn a global function utilizing data from multiple subjects.
arXiv Detail & Related papers (2020-11-18T23:13:48Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on the application of elastic principal graphs, which can simultaneously address the tasks of dimensionality reduction, data visualization, clustering, feature selection, and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z) - Risk of Training Diagnostic Algorithms on Data with Demographic Bias [0.5599792629509227]
We conduct a survey of the MICCAI 2018 proceedings to investigate the common practice in medical image analysis applications.
Surprisingly, we found that papers focusing on diagnosis rarely describe the demographics of the datasets used.
We show that it is possible to learn unbiased features by explicitly using demographic variables in an adversarial training setup.
arXiv Detail & Related papers (2020-05-20T13:51:01Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential services breaking down and health care structures collapsing.
This work describes a machine learning model derived from hemogram exams performed in symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity, and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.