Machine Learning Models for Predicting Smoking-Related Health Decline and Disease Risk
- URL: http://arxiv.org/abs/2511.14682v1
- Date: Tue, 18 Nov 2025 17:21:32 GMT
- Title: Machine Learning Models for Predicting Smoking-Related Health Decline and Disease Risk
- Authors: Vaskar Chakma, MD Jaheid Hasan Nerab, Abdur Rouf, Abu Sayed, Hossem MD Saim, Md. Nournabi Khan
- Abstract summary: Smoking continues to be a major preventable cause of death worldwide. Current medical screening methods often miss the early warning signs of smoking-related health problems.
- Score: 1.1288535170985818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Smoking continues to be a major preventable cause of death worldwide, affecting millions through damage to the heart, metabolism, liver, and kidneys. However, current medical screening methods often miss the early warning signs of smoking-related health problems, leading to late-stage diagnoses when treatment options become limited. This study presents a systematic comparative evaluation of machine learning approaches for smoking-related health risk assessment, emphasizing clinical interpretability and practical deployment over algorithmic innovation. We analyzed health screening data from 55,691 individuals, examining various health indicators, including body measurements, blood tests, and demographic information. We tested three advanced prediction algorithms - Random Forest, XGBoost, and LightGBM - to determine which could most accurately identify people at high risk. This study employed a cross-sectional design to classify current smoking status based on health screening biomarkers, not to predict future disease development. Our Random Forest model performed best, achieving an Area Under the Curve (AUC) of 0.926, meaning it could reliably distinguish between high-risk and lower-risk individuals. Using SHAP (SHapley Additive exPlanations) analysis to understand what the model was detecting, we found that key health markers played crucial roles in prediction: blood pressure levels, triglyceride concentrations, liver enzyme readings, and kidney function indicators (serum creatinine) were the strongest signals of declining health in smokers.
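To make the reported AUC figure concrete, here is a minimal sketch of how an Area Under the Curve value can be computed and read. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical, chosen only for illustration. The sketch uses the pairwise-ranking interpretation of AUC, namely the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half.

```python
from itertools import product


def roc_auc(labels, scores):
    """Return ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption): 1 = current smoker, 0 = non-smoker.
labels = [1, 1, 1, 0, 0, 0, 0]
# Hypothetical model scores (assumption): higher means "more likely a smoker".
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 11 of 12 pairs are ordered correctly
```

Under this reading, an AUC of 0.926 means that a randomly chosen positive case is scored above a randomly chosen negative case about 92.6% of the time. The pairwise loop is quadratic in the number of examples, so a rank-based implementation would be preferable on a dataset of the size described in the abstract.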
Related papers
- When Curiosity Signals Danger: Predicting Health Crises Through Online Medication Inquiries [40.12543056558646]
This study introduces a novel annotated dataset of medication-related questions extracted from online forums. Each entry is manually labelled for criticality based on clinical risk factors. Results highlight the potential of classical and modern methods to support real-time triage and alert systems.
arXiv Detail & Related papers (2025-09-15T11:31:25Z)
- Cardiovascular Disease Prediction using Machine Learning: A Comparative Analysis [0.0]
This study involves a cardiovascular disease (CVD) dataset comprising 68,119 records. We have performed statistical analyses, including t-tests, Chi-square tests, and ANOVA, to identify strong associations between CVD and elderly people. A logistic regression model highlights age, blood pressure, and cholesterol as primary risk factors, with unexpected negative associations for smoking and alcohol.
arXiv Detail & Related papers (2025-07-29T15:07:32Z)
- Fuzzy Rule based Intelligent Cardiovascular Disease Prediction using Complex Event Processing [0.8668211481067458]
Cardiovascular diseases (CVDs) are a rapidly rising global concern due to unhealthy diets, lack of physical activity, and other factors.
Recent research has focused on accurate and timely disease prediction to reduce risk and fatalities.
We propose a fuzzy rule-based system for monitoring clinical data to provide real-time decision support.
arXiv Detail & Related papers (2024-09-19T16:36:24Z)
- A data balancing approach towards design of an expert system for Heart Disease Prediction [0.9895793818721335]
Heart disease is a serious global health issue that claims millions of lives every year.
We employed five machine learning methods in this paper: Decision Tree (DT), Random Forest (RF), Linear Discriminant Analysis, Extra TreeBoost, and AdaBoost.
The accuracy of the Random Forest and Decision Tree models was 99.83%.
arXiv Detail & Related papers (2024-07-26T08:56:13Z)
- Explainable Machine Learning System for Predicting Chronic Kidney Disease in High-Risk Cardiovascular Patients [0.0]
This research developed an explainable machine learning system for predicting Chronic Kidney Disease (CKD) in patients with cardiovascular risks.
The Random Forest model achieved the highest sensitivity of 88.2%.
arXiv Detail & Related papers (2024-04-17T07:59:33Z)
- Improving Cardiovascular Disease Prediction Through Comparative Analysis of Machine Learning Models: A Case Study on Myocardial Infarction [0.0]
Cardiovascular disease remains a leading cause of mortality in the contemporary world.
Accurate predictions are pivotal for refining healthcare strategies.
XGBoost emerges as the top-performing model.
arXiv Detail & Related papers (2023-11-01T13:41:44Z)
- Interpretable Survival Analysis for Heart Failure Risk Prediction [50.64739292687567]
We propose a novel survival analysis pipeline that is both interpretable and competitive with state-of-the-art survival models.
Our pipeline achieves state-of-the-art performance and provides interesting and novel insights about risk factors for heart failure.
arXiv Detail & Related papers (2023-10-24T02:56:05Z)
- Penalized Deep Partially Linear Cox Models with Application to CT Scans of Lung Cancer Patients [42.09584755334577]
Lung cancer is a leading cause of cancer mortality globally, highlighting the importance of understanding its mortality risks to design effective therapies.
The National Lung Screening Trial (NLST) employed computed tomography texture analysis to quantify the mortality risks of lung cancer patients.
We propose a novel Penalized Deep Partially Linear Cox Model (Penalized DPLC), which incorporates the SCAD penalty to select important texture features and employs a deep neural network to estimate the nonparametric component of the model.
arXiv Detail & Related papers (2023-03-09T15:38:16Z)
- Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose variational knowledge distillation (VKD), which is a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms the best of various state-of-the-art baselines by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Robustness to Spurious Correlations via Human Annotations [100.63051542531171]
We present a framework for making models robust to spurious correlations by leveraging humans' common sense knowledge of causality.
Specifically, we use human annotation to augment each training example with a potential unmeasured variable.
We then introduce a new distributionally robust optimization objective over unmeasured variables (UV-DRO) to control the worst-case loss over possible test-time shifts.
arXiv Detail & Related papers (2020-07-13T20:05:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.