Related papers: Fairness Evaluation of Risk Estimation Models for Lung Cancer Screening

URL: http://arxiv.org/abs/2512.22242v1
Date: Tue, 23 Dec 2025 19:57:21 GMT
Title: Fairness Evaluation of Risk Estimation Models for Lung Cancer Screening
Authors: Shaurya Gaur, Michel Vitale, Alessa Hering, Johan Kwisthout, Colin Jacobs, Lena Philipp, Fennie van der Graaf
Abstract summary: We evaluate potential performance disparities and fairness in two deep learning risk estimation models for lung cancer screening. Models were trained on data from the US-based National Lung Screening Trial (NLST). We observed a statistically significant AUROC difference in Sybil's performance between women (0.88, 95% CI: 0.86, 0.90) and men (0.81, 95% CI: 0.78, 0.84, p < .001). At 90% specificity, Venkadesh21 showed lower sensitivity for Black (0.39, 95% CI: 0.23, 0.59) than White participants (0.69, 95% CI: 0.65, 0.73).
Score: 0.6974609493696966
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Lung cancer is the leading cause of cancer-related mortality in adults worldwide. Screening high-risk individuals with annual low-dose CT (LDCT) can support earlier detection and reduce deaths, but widespread implementation may strain the already limited radiology workforce. AI models have shown potential in estimating lung cancer risk from LDCT scans. However, high-risk populations for lung cancer are diverse, and these models' performance across demographic groups remains an open question. In this study, we drew on the considerations on confounding factors and ethically significant biases outlined in the JustEFAB framework to evaluate potential performance disparities and fairness in two deep learning risk estimation models for lung cancer screening: the Sybil lung cancer risk model and the Venkadesh21 nodule risk estimator. We also examined disparities in the PanCan2b logistic regression model recommended in the British Thoracic Society nodule management guideline. Both deep learning models were trained on data from the US-based National Lung Screening Trial (NLST), and assessed on a held-out NLST validation set. We evaluated AUROC, sensitivity, and specificity across demographic subgroups, and explored potential confounding from clinical risk factors. We observed a statistically significant AUROC difference in Sybil's performance between women (0.88, 95% CI: 0.86, 0.90) and men (0.81, 95% CI: 0.78, 0.84, p < .001). At 90% specificity, Venkadesh21 showed lower sensitivity for Black (0.39, 95% CI: 0.23, 0.59) than White participants (0.69, 95% CI: 0.65, 0.73). These differences were not explained by available clinical confounders and thus may be classified as unfair biases according to JustEFAB. Our findings highlight the importance of improving and monitoring model performance across underrepresented subgroups, and further research on algorithmic fairness, in lung cancer screening.
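The subgroup comparison described in the abstract (AUROC per group, and sensitivity per group at a fixed 90% specificity) can be illustrated with a minimal sketch. This is not the authors' code. The toy `records`, the function names, the choice to set the operating threshold on the whole cohort's negative cases, and the stratified percentile bootstrap for the confidence interval are all assumptions made for illustration; the abstract does not state how the threshold or the intervals were computed.

```python
import math
import random


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_for_specificity(scores, labels, target=0.90):
    """Smallest negative-case score t such that at least `target` of negatives score <= t."""
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not neg:
        raise ValueError("need at least one negative case")
    k = max(math.ceil(target * len(neg) - 1e-9) - 1, 0)
    return neg[k]


def sensitivity(scores, labels, threshold):
    """Fraction of positive cases with score strictly above the threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not pos:
        raise ValueError("need at least one positive case")
    return sum(s > threshold for s in pos) / len(pos)


def bootstrap_auroc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI from a stratified bootstrap (assumed method, not from the paper)."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    resampled_labels = [1] * len(pos) + [0] * len(neg)
    stats = sorted(
        auroc(rng.choices(pos, k=len(pos)) + rng.choices(neg, k=len(neg)), resampled_labels)
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(int((1 - alpha / 2) * n_boot), n_boot - 1)]
    return lo, hi


def subgroup_report(records, target=0.90):
    """records: (score, label, group) tuples. One cohort-wide threshold, metrics per group."""
    scores = [r[0] for r in records]
    labels = [r[1] for r in records]
    thr = threshold_for_specificity(scores, labels, target)
    report = {}
    for group in sorted({r[2] for r in records}):
        s = [r[0] for r in records if r[2] == group]
        y = [r[1] for r in records if r[2] == group]
        report[group] = {
            "auroc": auroc(s, y),
            "auroc_ci": bootstrap_auroc_ci(s, y),
            "sensitivity": sensitivity(s, y, thr),
        }
    return thr, report


# Hypothetical toy data: (risk score, cancer label, subgroup). Not from NLST.
records = [
    (0.10, 0, "A"), (0.20, 0, "A"), (0.30, 0, "A"), (0.40, 0, "A"), (0.50, 0, "A"),
    (0.90, 1, "A"), (0.95, 1, "A"),
    (0.15, 0, "B"), (0.25, 0, "B"), (0.35, 0, "B"), (0.45, 0, "B"), (0.60, 0, "B"),
    (0.42, 1, "B"), (0.97, 1, "B"),
]

threshold, report = subgroup_report(records)
for group, metrics in report.items():
    print(group, metrics)
```

Fixing one threshold for the whole cohort mirrors how a screening program would deploy a single operating point; a gap in per-group sensitivity at that point is the kind of disparity the abstract reports. With real data the subgroups would be far larger, and a formal test of the AUROC difference would be needed before calling a gap statistically significant.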

Related papers

Externally Validated Longitudinal GRU Model for Visit-Level 180-Day Mortality Risk in Metastatic Castration-Resistant Prostate Cancer [0.5361389213879222]
Metastatic castration-resistant prostate cancer (mCRPC) is a highly aggressive disease with poor prognosis and heterogeneous treatment response. We developed and validated a visit-level 180-day mortality risk model using longitudinal data from two Phase III cohorts.
arXiv Detail & Related papers (2026-01-27T20:48:53Z)
Explainable Admission-Level Predictive Modeling for Prolonged Hospital Stay in Elderly Populations: Challenges in Low- and Middle-Income Countries [65.4286079244589]
Prolonged length of stay (pLoS) is a significant factor associated with the risk of adverse in-hospital events. We develop and explain a predictive model for pLoS using admission-level patient and hospital administrative data.
arXiv Detail & Related papers (2026-01-07T23:35:24Z)
Subgroup Performance of a Commercial Digital Breast Tomosynthesis Model for Breast Cancer Detection [5.089670339445636]
This study presents a granular evaluation of the Lunit INSIGHT model on a large retrospective cohort of 163,449 screening mammography exams. Performance was found to be robust across demographics, but cases with non-invasive cancers were associated with significantly lower performance.
arXiv Detail & Related papers (2025-03-17T17:17:36Z)
A Transformer-based survival model for prediction of all-cause mortality in heart failure patients: a multi-cohort study [5.831730826863567]
We developed and validated TRisk, a Transformer-based AI model predicting 36-month mortality in heart failure patients. Our study included 403,534 heart failure patients (ages 40-90) from 1,418 English general practices.
arXiv Detail & Related papers (2025-03-16T01:53:50Z)
Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals. Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
Pulmonologists-Level lung cancer detection based on standard blood test results and smoking status using an explainable machine learning approach [2.545682175108217]
Lung cancer (LC) remains the primary cause of cancer-related mortality, largely due to late-stage diagnoses. In recent years, machine learning has demonstrated considerable potential in healthcare by facilitating the detection of various diseases. We developed an ML model based on dynamic ensemble selection (DES) for LC detection.
arXiv Detail & Related papers (2024-02-14T22:00:57Z)
Prediction of Breast Cancer Recurrence Risk Using a Multi-Model Approach Integrating Whole Slide Imaging and Clinicopathologic Features [0.6679306163028237]
The aim of this study was to develop a multi-model approach integrating the analysis of whole slide images and clinicopathologic data to predict associated breast cancer recurrence risks. The proposed novel methodology uses convolutional neural networks for feature extraction and vision transformers for contextual aggregation.
arXiv Detail & Related papers (2024-01-28T23:33:56Z)
Development and external validation of a lung cancer risk estimation tool using gradient-boosting [3.200615329024819]
Lung cancer is a significant cause of mortality worldwide, emphasizing the importance of early detection for improved survival rates. We propose a machine learning (ML) tool trained on data from the PLCO Cancer Screening Trial and validated on the NLST. The developed ML tool provides a freely available web application for estimating the likelihood of developing lung cancer within five years.
arXiv Detail & Related papers (2023-08-23T15:25:17Z)
Penalized Deep Partially Linear Cox Models with Application to CT Scans of Lung Cancer Patients [42.09584755334577]
Lung cancer is a leading cause of cancer mortality globally, highlighting the importance of understanding its mortality risks to design effective therapies. The National Lung Screening Trial (NLST) employed computed tomography texture analysis to quantify the mortality risks of lung cancer patients. We propose a novel Penalized Deep Partially Linear Cox Model (Penalized DPLC), which incorporates the SCAD penalty to select important texture features and employs a deep neural network to estimate the nonparametric component of the model.
arXiv Detail & Related papers (2023-03-09T15:38:16Z)
Deep-Learning Tool for Early Identifying Non-Traumatic Intracranial Hemorrhage Etiology based on CT Scan [40.51754649947294]
The deep learning model was developed with 1868 eligible NCCT scans with non-traumatic ICH collected between January 2011 and April 2018. The model's diagnostic performance was compared with clinicians' performance. The clinicians achieved significant improvements in the sensitivity, specificity, and accuracy of diagnoses of certain hemorrhage etiologies when augmented by the proposed system.
arXiv Detail & Related papers (2023-02-02T08:45:17Z)
Automated Quantification of CT Patterns Associated with COVID-19 from Chest CT [48.785596536318884]
The proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions. The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and presence of high opacities. Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions from Canada, Europe and the United States.
arXiv Detail & Related papers (2020-04-02T21:49:14Z)
Large-Scale Screening of COVID-19 from Community Acquired Pneumonia using Infection Size-Aware Classification [41.85283468679224]
A total of 1658 patients with COVID-19 and 1027 patients with CAP underwent thin-section CT. All images were preprocessed to obtain the segmentations of both infections and lung fields. An infection Size Aware Random Forest method (iSARF) was proposed, in which subjects were automatically categorized into groups with different ranges of infected lesion sizes. Experimental results show that the proposed method yielded sensitivity of 0.907, specificity of 0.833, and accuracy of 0.879 under five-fold cross-validation.
arXiv Detail & Related papers (2020-03-22T11:12:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
