Subject-Adaptive Sparse Linear Models for Interpretable Personalized Health Prediction from Multimodal Lifelog Data
- URL: http://arxiv.org/abs/2510.02835v1
- Date: Fri, 03 Oct 2025 09:17:57 GMT
- Title: Subject-Adaptive Sparse Linear Models for Interpretable Personalized Health Prediction from Multimodal Lifelog Data
- Authors: Dohyun Bu, Jisoo Han, Soohwa Kwon, Yulim So, Jong-Seok Lee,
- Abstract summary: SASL is an interpretable modeling approach explicitly designed for personalized health prediction.<n>We develop a regression-then-thresholding approach specifically designed to maximize macro-averaged F1 scores for ordinal targets.<n>For intrinsically challenging predictions, SASL selectively incorporates outputs from compact LightGBM models through confidence-based gating.
- Score: 18.017666750186336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Improved prediction of personalized health outcomes -- such as sleep quality and stress -- from multimodal lifelog data could have meaningful clinical and practical implications. However, state-of-the-art models, primarily deep neural networks and gradient-boosted ensembles, sacrifice interpretability and fail to adequately address the significant inter-individual variability inherent in lifelog data. To overcome these challenges, we propose the Subject-Adaptive Sparse Linear (SASL) framework, an interpretable modeling approach explicitly designed for personalized health prediction. SASL integrates ordinary least squares regression with subject-specific interactions, systematically distinguishing global from individual-level effects. We employ an iterative backward feature elimination method based on nested $F$-tests to construct a sparse and statistically robust model. Additionally, recognizing that health outcomes often represent discretized versions of continuous processes, we develop a regression-then-thresholding approach specifically designed to maximize macro-averaged F1 scores for ordinal targets. For intrinsically challenging predictions, SASL selectively incorporates outputs from compact LightGBM models through confidence-based gating, enhancing accuracy without compromising interpretability. Evaluations conducted on the CH-2025 dataset -- which comprises roughly 450 daily observations from ten subjects -- demonstrate that the hybrid SASL-LightGBM framework achieves predictive performance comparable to that of sophisticated black-box methods, but with significantly fewer parameters and substantially greater transparency, thus providing clear and actionable insights for clinicians and practitioners.
Related papers
- Bootstrapping-based Regularisation for Reducing Individual Prediction Instability in Clinical Risk Prediction Models [2.1127261244588156]
We propose a novel bootstrapping-based regularisation framework that embeds the bootstrapping process directly into the training of deep neural networks.<n>This approach constrains prediction variability across resampled datasets, producing a single model with inherent stability properties.<n>We evaluated models constructed using the proposed regularisation approach against conventional and ensemble models.
arXiv Detail & Related papers (2026-02-11T20:47:30Z) - Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks.<n>We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models.<n>Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z) - Quantum Machine Learning for Predicting Anastomotic Leak: A Clinical Study [0.16777183511743468]
Anastomotic leak (AL) is a life-threatening complication following colorectal surgery.<n>This study explores the potential of Quantum Neural Networks (QNNs) for AL prediction.
arXiv Detail & Related papers (2025-06-02T14:13:10Z) - Semi-supervised Clustering Through Representation Learning of Large-scale EHR Data [5.591260685112265]
SCORE is a semi-supervised representation learning framework that captures multi-domain disease profiles through patient embeddings.<n>To handle the computational challenges of large-scale data, it introduces a hybrid Expectation-Maximization (EM) and Gaussian Variational Approximation (GVA) algorithm.<n>Our analysis shows that incorporating unlabeled data enhances accuracy and reduces sensitivity to label scarcity.
arXiv Detail & Related papers (2025-05-27T05:20:17Z) - Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning [77.120955854093]
We show that data diversity can be a strong predictor of generalization in language models.<n>We introduce G-Vendi, a metric that quantifies diversity via the entropy of model-induced gradients.<n>We present Prismatic Synthesis, a framework for generating diverse synthetic data.
arXiv Detail & Related papers (2025-05-26T16:05:10Z) - Uncovering Population PK Covariates from VAE-Generated Latent Spaces [0.24578723416255746]
We propose a data-driven, model-free framework that integrates Variational Autoencoders (VAEs) deep learning model and LASSO regression.<n>VAE compresses high-dimensional PK signals into a structured latent space, achieving accurate reconstruction with a mean absolute percentage error (MAPE) of 2.26%.
arXiv Detail & Related papers (2025-05-05T09:47:39Z) - Conformal uncertainty quantification to evaluate predictive fairness of foundation AI model for skin lesion classes across patient demographics [8.692647930497936]
We use conformal analysis to quantify the predictive uncertainty of a vision transformer based foundation model.<n>We show how this can be used as a fairness metric to evaluate the robustness of the feature embeddings of the foundation model.
arXiv Detail & Related papers (2025-03-31T08:06:00Z) - Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic
Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z) - Causal Inference via Nonlinear Variable Decorrelation for Healthcare
Applications [60.26261850082012]
We introduce a novel method with a variable decorrelation regularizer to handle both linear and nonlinear confounding.
We employ association rules as new representations using association rule mining based on the original features to increase model interpretability.
arXiv Detail & Related papers (2022-09-29T17:44:14Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD)
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.