Development and external validation of a lung cancer risk estimation
tool using gradient-boosting
- URL: http://arxiv.org/abs/2308.12188v1
- Date: Wed, 23 Aug 2023 15:25:17 GMT
- Title: Development and external validation of a lung cancer risk estimation
tool using gradient-boosting
- Authors: Pierre-Louis Benveniste, Julie Alberge, Lei Xing, Jean-Emmanuel
Bibault
- Abstract summary: Lung cancer is a significant cause of mortality worldwide, emphasizing the importance of early detection for improved survival rates.
We propose a machine learning (ML) tool trained on data from the PLCO Cancer Screening Trial and validated on the NLST.
The developed ML tool provides a freely available web application for estimating the likelihood of developing lung cancer within five years.
- Score: 3.200615329024819
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Lung cancer is a significant cause of mortality worldwide, emphasizing the
importance of early detection for improved survival rates. In this study, we
propose a machine learning (ML) tool trained on data from the PLCO Cancer
Screening Trial and validated on the NLST to estimate the likelihood of lung
cancer occurrence within five years. The study utilized two datasets, the PLCO
(n=55,161) and NLST (n=48,595), consisting of comprehensive information on risk
factors, clinical measurements, and outcomes related to lung cancer. Data
preprocessing involved removing patients who were not current or former smokers
and those who had died of causes unrelated to lung cancer. Additionally, a
focus was placed on mitigating bias caused by censored data. Feature selection,
hyper-parameter optimization, and model calibration were performed using
XGBoost, an ensemble learning algorithm that combines gradient boosting and
decision trees. The ML model was trained on the pre-processed PLCO dataset and
tested on the NLST dataset. The model incorporated features such as age,
gender, smoking history, medical diagnoses, and family history of lung cancer.
The model was well-calibrated (Brier score=0.044). ROC-AUC was 82% on the PLCO
dataset and 70% on the NLST dataset. PR-AUC was 29% and 11% respectively. When
compared to the USPSTF guidelines for lung cancer screening, our model provided
the same recall with a precision of 13.1% vs. 9.3% on the PLCO dataset and 3.2%
vs. 3.1% on the NLST dataset. The developed ML tool provides a freely available
web application for estimating the likelihood of developing lung cancer within
five years. By utilizing risk factors and clinical data, individuals can assess
their risk and make informed decisions regarding lung cancer screening. This
research contributes to the efforts in early detection and prevention
strategies, aiming to reduce lung cancer-related mortality rates.
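The abstract reports a Brier score and compares the model's precision with the USPSTF criteria at the same recall. The following is a minimal sketch of how such a comparison can be computed. It is not the authors' code: the function names are hypothetical, and the threshold-matching rule (take the highest score threshold whose recall reaches the baseline's recall) is an assumption about how "the same recall" might be obtained.

```python
from typing import Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def precision_recall(y_true: Sequence[int],
                     flags: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall of a binary decision (True = flagged for screening)."""
    tp = sum(1 for y, f in zip(y_true, flags) if y == 1 and f)
    fp = sum(1 for y, f in zip(y_true, flags) if y == 0 and f)
    fn = sum(1 for y, f in zip(y_true, flags) if y == 1 and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def precision_at_matched_recall(y_true: Sequence[int],
                                y_prob: Sequence[float],
                                target_recall: float) -> Tuple[float, float, float]:
    """Return (threshold, precision, recall) for the highest threshold
    whose recall is at least target_recall (assumed matching rule)."""
    for threshold in sorted(set(y_prob), reverse=True):
        flags = [p >= threshold for p in y_prob]
        precision, recall = precision_recall(y_true, flags)
        if recall >= target_recall:
            return threshold, precision, recall
    raise ValueError("target recall cannot be reached")


if __name__ == "__main__":
    # Toy data, invented for illustration only (not from PLCO or NLST).
    y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
    y_prob = [0.05, 0.10, 0.80, 0.30, 0.60, 0.20, 0.15, 0.40, 0.35, 0.02]
    # Hypothetical rule-based eligibility flags standing in for a guideline.
    rule_flags = [False, True, True, True, True, False, True, False, True, False]

    rule_precision, rule_recall = precision_recall(y_true, rule_flags)
    threshold, model_precision, model_recall = precision_at_matched_recall(
        y_true, y_prob, rule_recall
    )
    print(f"Brier score: {brier_score(y_true, y_prob):.4f}")
    print(f"rule:  precision={rule_precision:.3f} recall={rule_recall:.3f}")
    print(f"model: precision={model_precision:.3f} recall={model_recall:.3f} "
          f"(threshold={threshold})")
```

With low-prevalence outcomes such as lung cancer, precision at a fixed recall is often more informative than ROC-AUC alone, which may be why the abstract reports both.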
Related papers
- Improving Lung Cancer Diagnosis and Survival Prediction with Deep Learning and CT Imaging [12.276877277186284]
Lung cancer is a major cause of cancer-related deaths, and early diagnosis and treatment are crucial for improving patients' survival outcomes.
We propose to employ convolutional neural networks to model the relationship between the risk of lung cancer and the lungs as seen in CT images.
Results demonstrate the effectiveness of both the mini-batched loss and binary cross-entropy to predict both lung cancer and the risk of the occurrence.
arXiv Detail & Related papers (2024-08-18T05:45:08Z)
- Lung-CADex: Fully automatic Zero-Shot Detection and Classification of Lung Nodules in Thoracic CT Images [45.29301790646322]
Computer-aided diagnosis can help with early lung nodule detection and facilitate subsequent nodule characterization.
We propose CADe, for segmenting lung nodules in a zero-shot manner using a variant of the Segment Anything Model called MedSAM.
We also propose CADx, a method for characterizing nodules as benign or malignant by building a gallery of radiomic features and aligning image-feature pairs through contrastive learning.
arXiv Detail & Related papers (2024-07-02T19:30:25Z) - Pulmonologists-Level lung cancer detection based on standard blood test
results and smoking status using an explainable machine learning approach [2.545682175108217]
Lung cancer (LC) remains the primary cause of cancer-related mortality, largely due to late-stage diagnoses.
In recent years, machine learning has demonstrated considerable potential in healthcare by facilitating the detection of various diseases.
We developed an ML model based on dynamic ensemble selection (DES) for LC detection.
arXiv Detail & Related papers (2024-02-14T22:00:57Z) - Survival Prediction from Imbalance colorectal cancer dataset using
hybrid sampling methods and tree-based classifiers [0.0]
This paper focuses on developing algorithms to predict 1-, 3-, and 5-year survival of colorectal cancer patients.
We propose a method that builds a pipeline of standard balancing techniques to increase the true positive rate.
Our proposed method significantly improves mortality prediction for the minority class of colorectal cancer patients.
arXiv Detail & Related papers (2023-09-04T19:48:56Z) - Cancer-Net BCa-S: Breast Cancer Grade Prediction using Volumetric Deep
Radiomic Features from Synthetic Correlated Diffusion Imaging [82.74877848011798]
The prevalence of breast cancer continues to grow, affecting about 300,000 females in the United States in 2023.
The gold-standard Scarff-Bloom-Richardson (SBR) grade has been shown to consistently indicate a patient's response to chemotherapy.
In this paper, we study the efficacy of deep learning for breast cancer grading based on synthetic correlated diffusion imaging (CDI$^s$).
arXiv Detail & Related papers (2023-04-12T15:08:34Z) - Artificial intelligence based prediction on lung cancer risk factors
using deep learning [0.0]
Capturing and defining symptoms at an early stage is one of the most difficult phases for patients.
We developed a model that can detect lung cancer with a remarkably high level of accuracy using the deep learning approach.
We found that our model achieved an accuracy of 94% and a minimum loss of 0.1%.
arXiv Detail & Related papers (2023-04-11T08:57:15Z) - The pitfalls of using open data to develop deep learning solutions for
COVID-19 detection in chest X-rays [64.02097860085202]
Deep learning models have been developed to identify COVID-19 from chest X-rays.
Results have been exceptional when training and testing on open-source data.
Data analysis and model evaluations show that the popular open-source dataset COVIDx is not representative of the real clinical problem.
arXiv Detail & Related papers (2021-09-14T10:59:11Z) - 3D Neural Network for Lung Cancer Risk Prediction on CT Volumes [0.6810862244331126]
Lung cancer is the most common cause of cancer death in the United States.
Lung cancer CT screening has been shown to reduce mortality by up to 40% and is now included in US screening guidelines.
Despite the use of standards for radiological diagnosis, persistent inter-grader variability and incomplete characterization of comprehensive imaging findings remain as limitations of current methods.
In this report, we reproduce a state-of-the-art deep learning algorithm for lung cancer risk prediction.
arXiv Detail & Related papers (2020-07-25T10:01:22Z) - Integrative Analysis for COVID-19 Patient Outcome Prediction [53.11258640541513]
We combine radiomics of lung opacities and non-imaging features from demographic data, vital signs, and laboratory findings to predict need for intensive care unit admission.
Our methods may also be applied to other lung diseases including but not limited to community acquired pneumonia.
arXiv Detail & Related papers (2020-07-20T19:08:50Z) - Predicting COVID-19 Pneumonia Severity on Chest X-ray with Deep Learning [57.00601760750389]
We present a severity score prediction model for COVID-19 pneumonia for frontal chest X-ray images.
Such a tool can gauge severity of COVID-19 lung infections that can be used for escalation or de-escalation of care.
arXiv Detail & Related papers (2020-05-24T23:13:16Z) - Automated Quantification of CT Patterns Associated with COVID-19 from
Chest CT [48.785596536318884]
The proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions.
The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and presence of high opacities.
Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions from Canada, Europe and the United States.
arXiv Detail & Related papers (2020-04-02T21:49:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.