Related papers: Predicting Metabolic Dysfunction-Associated Steatotic Liver Disease using Machine Learning Methods


URL: http://arxiv.org/abs/2510.22293v1
Date: Sat, 25 Oct 2025 13:36:18 GMT
Title: Predicting Metabolic Dysfunction-Associated Steatotic Liver Disease using Machine Learning Methods
Authors: Mary E. An, Paul Griffin, Jonathan G. Stine, Ramakrishna Balakrishnan, Ram Sriram, Soundar Kumara
Abstract summary: We developed a fair, rigorous, and reproducible MASLD prediction model. MASLD affects 33% of U.S. adults and is the most common chronic liver disease.
Score: 0.8642326601683298
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Background: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) affects ~33% of U.S. adults and is the most common chronic liver disease. Although often asymptomatic, progression can lead to cirrhosis. Early detection is important, as lifestyle interventions can prevent disease progression. We developed a fair, rigorous, and reproducible MASLD prediction model and compared it to prior methods using a large electronic health record database. Methods: We evaluated LASSO logistic regression, random forest, XGBoost, and a neural network for MASLD prediction using clinical feature subsets, including the top 10 SHAP-ranked features. To reduce disparities in true positive rates across racial and ethnic subgroups, we applied an equal opportunity postprocessing method. Results: This study included 59,492 patients in the training data, 24,198 in the validating data, and 25,188 in the testing data. The LASSO logistic regression model with the top 10 features was selected for its interpretability and comparable performance. Before fairness adjustment, the model achieved AUROC of 0.84, accuracy of 78%, sensitivity of 72%, specificity of 79%, and F1-score of 0.617. After equal opportunity postprocessing, accuracy modestly increased to 81% and specificity to 94%, while sensitivity decreased to 41% and F1-score to 0.515, reflecting the fairness trade-off. Conclusions: We developed the MASER prediction model (MASLD Static EHR Risk Prediction), a LASSO logistic regression model which achieved competitive performance for MASLD prediction (AUROC 0.836, accuracy 77.6%), comparable to previously reported ensemble and tree-based models. Overall, this approach demonstrates that interpretable models can achieve a balance of predictive performance and fairness in diverse patient populations.
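The abstract does not say which equal opportunity postprocessing algorithm was used. One common way to equalize true positive rates across subgroups is to choose a separate decision threshold per group; the sketch below illustrates that general idea only and should not be read as the authors' method. The function names, the target rate, and the toy scores, labels, and group codes are all hypothetical.

```python
import math
from collections import defaultdict


def true_positive_rate(scores, labels, threshold):
    """Fraction of positive cases whose score is at or above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return float("nan")
    return sum(s >= threshold for s in positives) / len(positives)


def equal_opportunity_thresholds(scores, labels, groups, target_tpr):
    """For each group, return the highest observed positive-case score
    that still flags at least target_tpr of that group's positives."""
    positives_by_group = defaultdict(list)
    for s, y, g in zip(scores, labels, groups):
        if y == 1:
            positives_by_group[g].append(s)

    thresholds = {}
    for g, pos_scores in positives_by_group.items():
        ranked = sorted(pos_scores, reverse=True)
        # Number of positives that must be flagged; the small tolerance
        # guards against floating-point error in the product.
        k = max(1, math.ceil(target_tpr * len(ranked) - 1e-9))
        thresholds[g] = ranked[k - 1]
    return thresholds


# Toy data (made up): predicted risk scores, true labels, subgroup codes.
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.5, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

thresholds = equal_opportunity_thresholds(scores, labels, groups, target_tpr=0.6)

# Per-group true positive rate under the group-specific thresholds.
tprs = {}
for g, t in thresholds.items():
    g_scores = [s for s, gg in zip(scores, groups) if gg == g]
    g_labels = [y for y, gg in zip(labels, groups) if gg == g]
    tprs[g] = true_positive_rate(g_scores, g_labels, t)

print(thresholds)
print(tprs)
```

In this toy data, a single shared threshold of 0.6 would flag two of three positives in group A but only one of three in group B; the group-specific thresholds bring both groups to the same true positive rate. Moving thresholds in this way also changes overall sensitivity and specificity, which is the kind of trade-off the abstract reports after fairness adjustment.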

Related papers

Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z)
Explainable Admission-Level Predictive Modeling for Prolonged Hospital Stay in Elderly Populations: Challenges in Low- and Middle-Income Countries [65.4286079244589]
Prolonged length of stay (pLoS) is a significant factor associated with the risk of adverse in-hospital events. We develop and explain a predictive model for pLoS using admission-level patient and hospital administrative data.
arXiv Detail & Related papers (2026-01-07T23:35:24Z)
An Explainable and Fair AI Tool for PCOS Risk Assessment: Calibration, Subgroup Equity, and Interactive Clinical Deployment [0.10026496861838446]
This paper presents a fairness-audited and interpretable machine learning framework for predicting polycystic ovary syndrome (PCOS). The framework integrated SHAP-based feature attributions with demographic audits to connect predictive explanations with observed disparities for actionable insights. A Streamlit-based web interface enables real-time PCOS risk assessment, Rotterdam criteria evaluation, and interactive 'what-if' analysis.
arXiv Detail & Related papers (2025-11-08T16:14:56Z)
An Explainable AI-Enhanced Machine Learning Approach for Cardiovascular Disease Detection and Risk Assessment [0.0]
Heart disease remains a major global health concern. Traditional diagnostic methods fail to accurately identify and manage heart disease risks. Machine learning has the potential to significantly enhance the accuracy, efficiency, and speed of heart disease diagnosis.
arXiv Detail & Related papers (2025-07-15T10:38:38Z)
A SHAP-based explainable multi-level stacking ensemble learning method for predicting the length of stay in acute stroke [3.2906073576204955]
Existing machine learning models have shown suboptimal predictive performance, limited generalisability, and have overlooked system-level factors. We developed an interpretable multi-level stacking ensemble model for ischaemic and haemorrhagic stroke. An explainable ensemble model effectively predicted the prolonged LOS in ischaemic stroke. Further validation is needed for haemorrhagic stroke.
arXiv Detail & Related papers (2025-05-30T01:08:26Z)
CRTRE: Causal Rule Generation with Target Trial Emulation Framework [47.2836994469923]
We introduce a novel method called causal rule generation with target trial emulation framework (CRTRE). CRTRE applies randomized trial design principles to estimate the causal effect of association rules. We then incorporate such association rules into downstream applications such as prediction of disease onsets.
arXiv Detail & Related papers (2024-11-10T02:40:06Z)
Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options. The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z)
Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals. Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
Identifying and mitigating bias in algorithms used to manage patients in a pandemic [4.756860520861679]
Logistic regression models were created to predict COVID-19 mortality, ventilator status and inpatient status using a real-world dataset. Models showed a 57% decrease in the number of biased trials. After calibration, the average sensitivity of the predictive models increased from 0.527 to 0.955.
arXiv Detail & Related papers (2021-10-30T21:10:56Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
Joint Prediction and Time Estimation of COVID-19 Developing Severe Symptoms using Chest CT Scan [49.209225484926634]
We propose a joint classification and regression method to determine whether a patient will later develop severe symptoms. To do this, the proposed method takes into account 1) the weight for each sample to reduce the outliers' influence and explore the problem of imbalance classification. Our proposed method yields an accuracy of 76.97% for predicting severe cases, a correlation coefficient of 0.524, and a 0.55-day difference for the converted time.
arXiv Detail & Related papers (2020-05-07T12:16:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.