Demographic Predictability in 3D CT Foundation Embeddings
- URL: http://arxiv.org/abs/2412.00110v1
- Date: Thu, 28 Nov 2024 04:26:39 GMT
- Title: Demographic Predictability in 3D CT Foundation Embeddings
- Authors: Guangyao Zheng, Michael A. Jacobs, Vishwa S. Parekh
- Abstract summary: Self-supervised foundation models have been successfully extended to encode 3D computed tomography (CT) images. We evaluate whether these embeddings capture demographic information, such as age, sex, or race.
- Abstract: Self-supervised foundation models have recently been successfully extended to encode three-dimensional (3D) computed tomography (CT) images, with excellent performance across several downstream tasks, such as intracranial hemorrhage detection and lung cancer risk forecasting. However, as self-supervised models learn from complex data distributions, questions arise concerning whether these embeddings capture demographic information, such as age, sex, or race. Using the National Lung Screening Trial (NLST) dataset, which contains 3D CT images and demographic data, we evaluated a range of classifiers (softmax regression, linear regression, linear support vector machine, random forest, and decision tree) to predict the sex, race, and age of the patients in the images. Our results indicate that the embeddings effectively encoded age and sex information, with a linear regression model achieving a root mean square error (RMSE) of 3.8 years for age prediction and a softmax regression model attaining an AUC of 0.998 for sex classification. Race prediction was less effective, with an AUC of 0.878. These findings suggest that a detailed exploration of the information encoded in self-supervised learning frameworks is needed to help ensure fair, responsible, and patient-privacy-protecting healthcare AI.
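As a rough illustration of the probing setup described in the abstract, the sketch below trains simple linear probes on frozen embeddings and reports AUC for sex and RMSE for age. The data are synthetic stand-ins for the NLST embeddings and labels, and the 1408-dimensional embedding size is an assumption borrowed from the related work listed below.

```python
# Minimal probing sketch: train simple classifiers/regressors on frozen CT
# embeddings to test whether demographic attributes are linearly recoverable.
# Synthetic data stand in for the NLST embeddings and labels (assumption).
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, mean_squared_error

rng = np.random.default_rng(0)
n, d = 2000, 1408                      # embedding dimension is an assumption
X = rng.normal(size=(n, d))            # placeholder for foundation-model embeddings
sex = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)   # synthetic label
age = 60 + 5 * X[:, 1] + rng.normal(size=n)                  # synthetic label

X_tr, X_te, s_tr, s_te, a_tr, a_te = train_test_split(
    X, sex, age, test_size=0.2, random_state=0)

# Softmax/logistic regression probe for sex (reported via AUC in the paper).
clf = LogisticRegression(max_iter=1000).fit(X_tr, s_tr)
auc = roc_auc_score(s_te, clf.predict_proba(X_te)[:, 1])

# Linear regression probe for age (reported via RMSE in the paper).
reg = LinearRegression().fit(X_tr, a_tr)
rmse = np.sqrt(mean_squared_error(a_te, reg.predict(X_te)))

print(f"sex AUC: {auc:.3f}   age RMSE: {rmse:.2f} years")
```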
Related papers
- Towards Fair Medical AI: Adversarial Debiasing of 3D CT Foundation Embeddings [13.985136866888379]
Self-supervised learning has revolutionized medical imaging by enabling efficient and generalizable feature extraction from large-scale unlabeled datasets.
Recently, self-supervised foundation models have been extended to three-dimensional (3D) computed tomography (CT) data, generating compact, information-rich embeddings with 1408 features.
These embeddings have been shown to encode demographic information, such as age, sex, and race, which poses a significant risk to the fairness of clinical applications.
We propose a Variational Autoencoder (VAE)-based adversarial debiasing framework to transform these embeddings into a new latent space where demographic...
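The entry above names a VAE-based adversarial debiasing framework; the sketch below illustrates that general idea (a VAE that reconstructs the embedding while an adversary is penalized for recovering a demographic attribute from the latent code). The layer sizes, loss weights, and training loop are illustrative assumptions, not the cited paper's implementation.

```python
# Illustrative sketch (not the cited paper's implementation): a VAE over 1408-d
# embeddings whose latent code reconstructs the input while an adversary that
# tries to predict a demographic label from the latent is opposed.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB, LAT = 1408, 128   # dimensions are assumptions

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(EMB, 512), nn.ReLU())
        self.mu, self.logvar = nn.Linear(512, LAT), nn.Linear(512, LAT)
        self.dec = nn.Sequential(nn.Linear(LAT, 512), nn.ReLU(), nn.Linear(512, EMB))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar, z

vae = VAE()
adv = nn.Sequential(nn.Linear(LAT, 64), nn.ReLU(), nn.Linear(64, 2))  # sex adversary
opt_vae = torch.optim.Adam(vae.parameters(), lr=1e-4)
opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-4)

x = torch.randn(32, EMB)            # placeholder embeddings
sex = torch.randint(0, 2, (32,))    # placeholder demographic labels

for step in range(100):
    # 1) Update the adversary to predict the attribute from the latent code.
    with torch.no_grad():
        _, mu, _, _ = vae(x)
    opt_adv.zero_grad()
    F.cross_entropy(adv(mu), sex).backward()
    opt_adv.step()

    # 2) Update the VAE: reconstruction + KL, minus the adversary's success,
    #    so the latent space becomes uninformative about the attribute.
    opt_vae.zero_grad()
    recon, mu, logvar, z = vae(x)
    rec = F.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    adv_loss = F.cross_entropy(adv(z), sex)
    (rec + 1e-3 * kl - 1e-1 * adv_loss).backward()
    opt_vae.step()
```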
arXiv Detail & Related papers (2025-02-05T20:32:42Z) - Brain Tumor Classification on MRI in Light of Molecular Markers [61.77272414423481]
Co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas.
This study aims to utilize a specially designed MRI-based convolutional neural network for brain cancer detection.
arXiv Detail & Related papers (2024-09-29T07:04:26Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
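As a generic illustration of diffusion-based synthetic-data generation of the kind this entry describes, the sketch below shows a standard DDPM-style training step on placeholder patient feature vectors; MedDiffusion's actual architecture and its step-wise attention mechanism are not reproduced here.

```python
# Generic denoising-diffusion training step on synthetic "patient vectors";
# an illustration of diffusion-based data augmentation, not MedDiffusion itself.
import torch
import torch.nn as nn

T, DIM = 1000, 64                              # steps and feature size: assumptions
betas = torch.linspace(1e-4, 2e-2, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product of (1 - beta_t)

# Tiny noise-prediction network conditioned on the timestep (as a scalar).
eps_model = nn.Sequential(nn.Linear(DIM + 1, 128), nn.ReLU(), nn.Linear(128, DIM))
opt = torch.optim.Adam(eps_model.parameters(), lr=1e-3)

x0 = torch.randn(32, DIM)                      # placeholder patient feature vectors
t = torch.randint(0, T, (32,))
eps = torch.randn_like(x0)

# Forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps
ab = alpha_bar[t].unsqueeze(1)
xt = ab.sqrt() * x0 + (1 - ab).sqrt() * eps

# Train the network to predict the added noise (standard DDPM objective).
pred = eps_model(torch.cat([xt, t.float().unsqueeze(1) / T], dim=1))
loss = nn.functional.mse_loss(pred, eps)
opt.zero_grad(); loss.backward(); opt.step()
```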
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
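A schematic of the 3 x 5 experimental grid implied above (three augmentation strategies crossed with five 3D CNN architectures) could look like the following; the augmentations, architectures, and data are toy stand-ins, not those of the cited study.

```python
# Schematic 3 x 5 experimental grid: augmentation strategies x 3D CNN depths.
# Everything here (augmentations, architectures, data) is a toy stand-in.
import itertools
import torch
import torch.nn as nn

def make_3dcnn(depth):
    """A minimal 3D CNN whose depth stands in for 'distinct architectures'."""
    layers, ch = [], 1
    for _ in range(depth):
        layers += [nn.Conv3d(ch, 8, kernel_size=3, padding=1), nn.ReLU()]
        ch = 8
    layers += [nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 2)]
    return nn.Sequential(*layers)

augmentations = {
    "none": lambda x: x,
    "flip": lambda x: torch.flip(x, dims=[-1]),
    "noise": lambda x: x + 0.05 * torch.randn_like(x),
}
architectures = {f"cnn_depth{d}": d for d in (1, 2, 3, 4, 5)}

x = torch.randn(4, 1, 32, 32, 32)   # placeholder mini-batch of 3D volumes

for (aug_name, aug), (arch_name, depth) in itertools.product(
        augmentations.items(), architectures.items()):
    model = make_3dcnn(depth)
    logits = model(aug(x))           # full training/evaluation loop omitted
    print(aug_name, arch_name, tuple(logits.shape))
```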
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - Deep Learning for Predicting Progression of Patellofemoral Osteoarthritis Based on Lateral Knee Radiographs, Demographic Data and Symptomatic Assessments [1.1549572298362785]
This study included subjects (1832 subjects, 3276 knees) from the baseline of the MOST study.
PF joint regions-of-interest were identified using an automated landmark detection tool (BoneFinder) on lateral knee X-rays.
Risk factors included age, sex, BMI and WOMAC score, and the radiographic osteoarthritis stage of the tibiofemoral joint (KL score).
arXiv Detail & Related papers (2023-05-10T06:43:33Z) - Feature robustness and sex differences in medical imaging: a case study in MRI-based Alzheimer's disease detection [1.7616042687330637]
We compare two classification schemes on the ADNI MRI dataset.
We do not find a strong dependence of model performance for male and female test subjects on the sex composition of the training dataset.
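The sketch below illustrates the kind of sex-stratified evaluation this entry describes: models trained on datasets with varying male/female composition and scored separately on male and female test subjects. Data and model are synthetic placeholders, not the ADNI setup.

```python
# Sketch: vary the sex composition of the training set and report test
# performance separately for male and female subjects (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def synth(n):
    X = rng.normal(size=(n, 50))
    sex = rng.integers(0, 2, size=n)                            # 0 = female, 1 = male
    y = (X[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(int)    # diagnosis label
    return X, y, sex

X_te, y_te, sex_te = synth(1000)

for male_frac in (0.0, 0.25, 0.5, 0.75, 1.0):
    X, y, sex = synth(4000)
    keep = rng.random(len(y)) < np.where(sex == 1, male_frac, 1 - male_frac)
    clf = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    p = clf.predict_proba(X_te)[:, 1]
    auc_m = roc_auc_score(y_te[sex_te == 1], p[sex_te == 1])
    auc_f = roc_auc_score(y_te[sex_te == 0], p[sex_te == 0])
    print(f"male_frac={male_frac:.2f}  AUC male={auc_m:.3f}  AUC female={auc_f:.3f}")
```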
arXiv Detail & Related papers (2022-04-04T17:37:54Z) - A Deep Learning Technique using a Sequence of Follow Up X-Rays for Disease classification [3.3345134768053635]
Predicting lung- and heart-related diseases with deep learning techniques is a central goal for many researchers.
We hypothesize that including the follow-up history of a patient's three most recent chest X-ray images improves disease classification.
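A toy sketch of this idea, encoding a patient's three most recent chest X-rays with a shared CNN and pooling the per-image features before classification, is given below; the architecture is an illustrative assumption rather than the cited paper's model.

```python
# Toy sketch: encode three follow-up chest X-rays with a shared CNN and pool
# the per-image features before classification (illustrative assumption only).
import torch
import torch.nn as nn

class SequenceXrayClassifier(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())     # -> 32-d feature per image
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):                     # x: (batch, 3 images, 1, H, W)
        b, s = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1)).view(b, s, -1)
        return self.head(feats.mean(dim=1))   # average over the 3 follow-ups

model = SequenceXrayClassifier()
xrays = torch.randn(4, 3, 1, 224, 224)   # placeholder: 4 patients x 3 images
print(model(xrays).shape)                 # -> torch.Size([4, 2])
```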
arXiv Detail & Related papers (2022-03-28T19:58:47Z) - An Interpretable Web-based Glioblastoma Multiforme Prognosis Prediction Tool using Random Forest Model [1.1024591739346292]
We propose predictive models that estimate GBM patients' health status one year after treatment.
We used a total of 467 GBM patients' clinical profiles, each consisting of 13 features and two follow-up dates.
Our machine learning models suggest that the top three prognostic factors for GBM patient survival were MGMT gene promoter, the extent of resection, and age.
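A compact sketch of a random-forest prognosis model with an impurity-based feature-importance ranking, the general approach this entry describes, might look like the following; the 13 features here are synthetic stand-ins and only the first three names echo the factors mentioned above.

```python
# Sketch: random forest on synthetic stand-ins for 13 clinical features, with
# impurity-based feature importances as a rough proxy for prognostic factors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 467                                   # cohort size taken from the entry above
feature_names = ["MGMT_promoter", "extent_of_resection", "age"] + \
                [f"feature_{i}" for i in range(10)]   # names are placeholders
X = rng.normal(size=(n, 13))
y = (X[:, 0] + X[:, 1] - 0.2 * X[:, 2] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("accuracy:", accuracy_score(y_te, rf.predict(X_te)))
for name, imp in sorted(zip(feature_names, rf.feature_importances_),
                        key=lambda t: -t[1])[:3]:
    print(f"{name}: {imp:.3f}")
```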
arXiv Detail & Related papers (2021-08-30T07:56:34Z) - Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries.
We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split.
The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
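A schematic of this cross-dataset generalizability check, training on combinations of datasets with a 72/8/20 split and evaluating on each dataset's held-out test split, is sketched below with synthetic placeholder datasets.

```python
# Schematic cross-dataset generalizability check with a 72/8/20 split per
# dataset; the three "country" datasets here are synthetic placeholders.
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_dataset(shift):
    X = rng.normal(loc=shift, size=(500, 64))
    y = (X[:, 0] - shift + 0.3 * rng.normal(size=500) > 0).astype(int)
    n_tr, n_va = int(0.72 * 500), int(0.08 * 500)   # validation split unused here
    idx = rng.permutation(500)
    return {"train": (X[idx[:n_tr]], y[idx[:n_tr]]),
            "test": (X[idx[n_tr + n_va:]], y[idx[n_tr + n_va:]])}

datasets = {name: make_dataset(s) for name, s in [("A", 0.0), ("B", 0.5), ("C", 1.0)]}

# Train on every non-empty combination of datasets, evaluate on each test split.
for k in (1, 2, 3):
    for combo in combinations(datasets, k):
        X_tr = np.vstack([datasets[d]["train"][0] for d in combo])
        y_tr = np.concatenate([datasets[d]["train"][1] for d in combo])
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        for d, ds in datasets.items():
            auc = roc_auc_score(ds["test"][1], clf.predict_proba(ds["test"][0])[:, 1])
            print(f"train={'+'.join(combo):5s} test={d}  AUC={auc:.3f}")
```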
arXiv Detail & Related papers (2021-02-18T21:14:52Z) - Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans [72.04652116817238]
We propose a differentiable neural architecture search (DNAS) framework to automatically search for 3D DL models for 3D chest CT scan classification.
We also apply the Class Activation Mapping (CAM) technique to our models to provide interpretability of the results.
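Below is a generic sketch of class activation mapping (CAM) on a toy 3D CNN with global average pooling; it illustrates the technique named above but is not the cited paper's models or search framework.

```python
# Generic CAM for a toy 3D CNN: weight the last convolutional feature maps by
# the classifier weights of the predicted class (network is a stand-in).
import torch
import torch.nn as nn
import torch.nn.functional as F

features = nn.Sequential(
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 32, 3, padding=1), nn.ReLU())
classifier = nn.Linear(32, 2)

x = torch.randn(1, 1, 32, 64, 64)               # placeholder chest CT volume
fmap = features(x)                              # (1, 32, D, H, W)
logits = classifier(fmap.mean(dim=(2, 3, 4)))   # global average pooling + linear
cls = logits.argmax(dim=1).item()

# CAM: class-specific weighted sum of the feature maps, upsampled to input size.
w = classifier.weight[cls].view(1, -1, 1, 1, 1)        # (1, 32, 1, 1, 1)
cam = F.relu((w * fmap).sum(dim=1, keepdim=True))      # (1, 1, D, H, W)
cam = F.interpolate(cam, size=x.shape[2:], mode="trilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
print(cam.shape)   # same spatial size as the input volume
```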
arXiv Detail & Related papers (2021-01-14T03:45:01Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
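As a generic illustration of uncertainty-aware risk prediction of the kind the UNITE entry describes, the sketch below uses Monte Carlo dropout to obtain a predictive mean and a spread per patient; this is a common stand-in technique, not the UNITE model.

```python
# Generic uncertainty sketch: Monte Carlo dropout gives a predictive mean and a
# spread per patient (a common stand-in, not the UNITE model itself).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(128, 1))                       # risk score head

x = torch.randn(8, 64)                       # placeholder multi-sourced features

model.train()                                # keep dropout active at test time
with torch.no_grad():
    samples = torch.stack([torch.sigmoid(model(x)) for _ in range(50)])  # (50, 8, 1)

risk_mean = samples.mean(dim=0).squeeze(1)   # predicted disease risk
risk_std = samples.std(dim=0).squeeze(1)     # uncertainty estimate
for i, (m, s) in enumerate(zip(risk_mean.tolist(), risk_std.tolist())):
    print(f"patient {i}: risk={m:.2f} ± {s:.2f}")
```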