Interpretable Machine Learning for Cognitive Aging: Handling Missing Data and Uncovering Social Determinant
- URL: http://arxiv.org/abs/2510.10952v1
- Date: Mon, 13 Oct 2025 03:04:10 GMT
- Title: Interpretable Machine Learning for Cognitive Aging: Handling Missing Data and Uncovering Social Determinant
- Authors: Xi Mao, Zhendong Wang, Jingyu Li, Lingchao Mao, Utibe Essien, Hairong Wang, Xuelei Sherry Ni
- Abstract summary: Early detection of Alzheimer's disease (AD) is crucial because its neurodegenerative effects are irreversible. We predict cognitive performance from social determinants of health using the NIH NIA-supported PREPARE Challenge Phase 2 dataset.
- Score: 28.20784930277189
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Early detection of Alzheimer's disease (AD) is crucial because its neurodegenerative effects are irreversible, and neuropathologic and social-behavioral risk factors accumulate years before diagnosis. Identifying higher-risk individuals earlier enables prevention, timely care, and equitable resource allocation. We predict cognitive performance from social determinants of health (SDOH) using the NIH NIA-supported PREPARE Challenge Phase 2 dataset derived from the nationally representative Mex-Cog cohort of the 2003 and 2012 Mexican Health and Aging Study (MHAS). Data: The target is a validated composite cognitive score across seven domains (orientation, memory, attention, language, constructional praxis, and executive function) derived from the 2016 and 2021 MHAS waves. Predictors span demographic, socioeconomic, health, lifestyle, psychosocial, and healthcare access factors. Methodology: Missingness was addressed with a singular value decomposition (SVD)-based imputation pipeline treating continuous and categorical variables separately. This approach leverages latent feature correlations to recover missing values while balancing reliability and scalability. After evaluating multiple methods, XGBoost was chosen for its superior predictive performance. Results and Discussion: The framework outperformed existing methods and the data challenge leaderboard, demonstrating high accuracy, robustness, and interpretability. SHAP-based post hoc analysis identified top contributing SDOH factors and age-specific feature patterns. Notably, flooring material emerged as a strong predictor, reflecting socioeconomic and environmental disparities. Other influential factors, including age, SES, lifestyle, social interaction, sleep, stress, and BMI, underscore the multifactorial nature of cognitive aging and the value of interpretable, data-driven SDOH modeling.
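The abstract names SVD-based imputation but gives no algorithmic detail. The sketch below is therefore not the authors' pipeline. It shows one common generic form of the idea for continuous variables only: fill the gaps with column means, then repeatedly replace the missing entries with a truncated-SVD reconstruction. The function name `svd_impute`, the `rank`, `n_iter`, and `tol` parameters, and the synthetic data are all assumptions made for illustration. The separate handling of categorical variables that the abstract mentions is not covered.

```python
import numpy as np


def svd_impute(X, rank=2, n_iter=200, tol=1e-6):
    """Fill NaNs in a 2-D array by iterative truncated-SVD reconstruction.

    Hypothetical helper for illustration; assumes every column has at
    least one observed value so that the column means are defined.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: column means computed from the observed entries.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Rank-`rank` approximation of the currently filled matrix.
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Observed entries stay fixed; only missing entries are updated.
        updated = np.where(missing, low_rank, X)
        change = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:
            break
    return filled


# Synthetic demonstration: an exactly rank-2 matrix with about 10% of entries hidden.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 6))
mask = rng.random(X_true.shape) < 0.1
X_obs = np.where(mask, np.nan, X_true)

X_hat = svd_impute(X_obs, rank=2)
X_mean = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)

# Error on the hidden entries, compared against plain mean imputation.
rmse_svd = np.sqrt(np.mean((X_hat[mask] - X_true[mask]) ** 2))
rmse_mean = np.sqrt(np.mean((X_mean[mask] - X_true[mask]) ** 2))
print(f"RMSE on hidden entries: SVD={rmse_svd:.4f}, column mean={rmse_mean:.4f}")
```

On real survey data the rank would need to be tuned, for example by hiding a subset of observed entries and checking reconstruction error, and the imputed matrix would then be passed to a downstream model such as the XGBoost regressor named in the abstract.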
Related papers
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z) - Beyond Traditional Diagnostics: Transforming Patient-Side Information into Predictive Insights with Knowledge Graphs and Prototypes [55.310195121276074]
We propose a Knowledge graph-enhanced, Prototype-aware, and Interpretable (KPI) framework to predict diseases. It integrates structured and trusted medical knowledge into a unified disease knowledge graph, constructs clinically meaningful disease prototypes, and employs contrastive learning to enhance predictive accuracy. It provides clinically valid explanations that closely align with patient narratives, highlighting its practical value for patient-centered healthcare delivery.
arXiv Detail & Related papers (2025-12-09T05:37:54Z) - Integrating Genomics into Multimodal EHR Foundation Models [56.31910745104141]
This paper introduces an innovative EHR foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality. The framework aims to learn complex relationships between clinical data and genetic predispositions. This approach is pivotal for unlocking new insights into disease prediction, proactive health management, risk stratification, and personalized treatment strategies.
arXiv Detail & Related papers (2025-10-24T15:56:40Z) - Interpretable Machine Learning for Life Expectancy Prediction: A Comparative Study of Linear Regression, Decision Tree, and Random Forest [0.0]
This study evaluates three machine learning models: Linear Regression (LR), Regression Decision Tree (RDT), and Random Forest (RF). RF achieves the highest predictive accuracy ($R^2 = 0.9423$), significantly outperforming LR and RDT. These insights underscore the synergy between ensemble methods and transparency in addressing public-health challenges.
arXiv Detail & Related papers (2025-10-01T06:02:31Z) - A Comprehensive Review of Datasets for Clinical Mental Health AI Systems [55.67299586253951]
We present the first comprehensive survey of clinical mental health datasets relevant to the training and development of AI-powered clinical assistants. Our survey identifies critical gaps such as a lack of longitudinal data, limited cultural and linguistic representation, inconsistent collection and annotation standards, and a lack of modalities in synthetic data.
arXiv Detail & Related papers (2025-08-13T13:42:35Z) - Naturalistic Language-related Movie-Watching fMRI Task for Detecting Neurocognitive Decline and Disorder [60.84344168388442]
Language-related functional magnetic resonance imaging (fMRI) may be a promising approach for detecting cognitive decline and early NCD. We examined the effectiveness of this task among 97 non-demented Chinese older adults from Hong Kong. The study demonstrated the potential of the naturalistic language-related fMRI task for early detection of aging-related cognitive decline and NCD.
arXiv Detail & Related papers (2025-06-10T16:58:47Z) - Comparative Analysis of Stroke Prediction Models Using Machine Learning [0.0]
Stroke remains one of the most critical global health challenges, ranking as the second leading cause of death and the third leading cause of disability worldwide. This study explores the effectiveness of machine learning algorithms in predicting stroke risk using demographic, clinical, and lifestyle data from the Stroke Prediction dataset.
arXiv Detail & Related papers (2025-05-14T21:27:19Z) - Early Prediction of Alzheimer's and Related Dementias: A Machine Learning Approach Utilizing Social Determinants of Health Data [1.4140700984013321]
Alzheimer's disease and related dementias (AD/ADRD) represent a growing healthcare crisis affecting over 6 million Americans. Social determinants of health (SDOH) significantly influence both the risk and progression of cognitive functioning. This report examines how these social, environmental, and structural factors impact cognitive health trajectories.
arXiv Detail & Related papers (2025-03-20T03:16:02Z) - Targeted Data Fusion for Causal Survival Analysis Under Distribution Shift [46.84912148188679]
Causal inference across multiple data sources offers a promising avenue to enhance the generalizability and replicability of scientific findings. Existing approaches fail to address the unique challenges of survival analysis, such as censoring and the integration of discrete and continuous time. We propose two novel methods for estimating target site-specific causal effects in multi-source settings.
arXiv Detail & Related papers (2025-01-30T23:21:25Z) - Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z) - Bias Reducing Multitask Learning on Mental Health Prediction [18.32551434711739]
There has been an increase in research in developing machine learning models for mental health detection or prediction.
In this work, we aim to perform a fairness analysis and implement a multi-task learning based bias mitigation method on anxiety prediction models.
Our analysis showed that our anxiety prediction base model introduced some bias with regards to age, income, ethnicity, and whether a participant is born in the U.S. or not.
arXiv Detail & Related papers (2022-08-07T02:28:32Z) - Improving Prediction of Cognitive Performance using Deep Neural Networks
in Sparse Data [2.867517731896504]
We used data from an observational, cohort study, Midlife in the United States (MIDUS) to model executive function and episodic memory measures.
Deep neural network (DNN) models consistently ranked highest in all of the cognitive performance prediction tasks.
arXiv Detail & Related papers (2021-12-28T22:23:08Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)