A Machine Learning Pipeline for Multiple Sclerosis Biomarker Discovery: Comparing explainable AI and Traditional Statistical Approaches
- URL: http://arxiv.org/abs/2509.22484v1
- Date: Fri, 26 Sep 2025 15:31:34 GMT
- Title: A Machine Learning Pipeline for Multiple Sclerosis Biomarker Discovery: Comparing explainable AI and Traditional Statistical Approaches
- Authors: Samuele Punzo, Silvia Giulia Galfrè, Francesco Massafra, Alessandro Maglione, Corrado Priami, Alina Sîrbu
- Abstract summary: We present a machine learning pipeline for biomarker discovery in Multiple Sclerosis (MS). After robust preprocessing, we trained an XGBoost classifier optimized via Bayesian search. Our comparison revealed both overlapping and unique biomarkers between SHAP and DEA, suggesting complementary strengths. This study highlights the value of combining explainable AI (xAI) with traditional statistical methods to gain deeper insights into disease mechanisms.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a machine learning pipeline for biomarker discovery in Multiple Sclerosis (MS), integrating eight publicly available microarray datasets from Peripheral Blood Mononuclear Cells (PBMC). After robust preprocessing, we trained an XGBoost classifier optimized via Bayesian search. SHapley Additive exPlanations (SHAP) were used to identify the features most important for model prediction, thus indicating possible biomarkers. These were compared with genes identified through classical Differential Expression Analysis (DEA). Our comparison revealed both overlapping and unique biomarkers between SHAP and DEA, suggesting complementary strengths. Enrichment analysis confirmed the biological relevance of SHAP-selected genes, linking them to pathways such as sphingolipid signaling, Th1/Th2/Th17 cell differentiation, and Epstein-Barr virus infection, all of which are known to be associated with MS. This study highlights the value of combining explainable AI (xAI) with traditional statistical methods to gain deeper insights into disease mechanisms.
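The abstract does not say exactly how the SHAP and DEA gene lists were compared. As a rough illustration only, the Python sketch below assumes that global SHAP importance is the mean absolute SHAP value of each gene across samples (a common convention), that the SHAP gene set is the top-k genes by that score, that the DEA gene set is the genes whose Benjamini-Hochberg adjusted p-value falls below a threshold, and that the significance of the overlap is assessed with a one-sided hypergeometric test. The SHAP matrix would come from an explainer applied to the trained XGBoost model and the DEA p-values from a per-gene test; both are taken as given inputs here. All function names, thresholds, and the toy data are hypothetical and are not the authors' implementation.

```python
"""Hypothetical sketch: comparing a SHAP-based gene ranking with a DEA gene list.

Assumptions (not from the paper): mean |SHAP| as global importance, top-k
selection for SHAP, BH-adjusted p < alpha for DEA, hypergeometric overlap test.
"""
from math import comb


def mean_abs_shap(shap_values, genes):
    """Rank genes by mean |SHAP| across samples (rows = samples, cols = genes)."""
    n = len(shap_values)
    scores = {
        g: sum(abs(row[j]) for row in shap_values) / n
        for j, g in enumerate(genes)
    }
    return sorted(genes, key=lambda g: scores[g], reverse=True)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping a running minimum
    # so that the adjusted values are monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def overlap_pvalue(k_overlap, n_a, n_b, n_universe):
    """P(overlap >= k_overlap) for two random gene sets (hypergeometric null)."""
    total = comb(n_universe, n_b)
    hits = sum(
        comb(n_a, x) * comb(n_universe - n_a, n_b - x)
        for x in range(k_overlap, min(n_a, n_b) + 1)
    )
    return hits / total


def compare_gene_sets(shap_values, genes, dea_pvalues, top_k=3, alpha=0.05):
    """Split genes into shared, SHAP-only and DEA-only sets (hypothetical helper)."""
    shap_set = set(mean_abs_shap(shap_values, genes)[:top_k])
    adjusted = benjamini_hochberg(dea_pvalues)
    dea_set = {g for g, q in zip(genes, adjusted) if q < alpha}
    shared = shap_set & dea_set
    return {
        "shared": sorted(shared),
        "shap_only": sorted(shap_set - dea_set),
        "dea_only": sorted(dea_set - shap_set),
        "overlap_p": overlap_pvalue(len(shared), len(shap_set), len(dea_set), len(genes)),
    }


if __name__ == "__main__":
    # Toy data invented for illustration: 2 samples x 6 genes.
    genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
    shap_values = [
        [0.9, -0.1, 0.4, 0.0, -0.6, 0.05],
        [-0.7, 0.2, 0.5, 0.1, 0.5, 0.0],
    ]
    dea_pvalues = [0.001, 0.002, 0.2, 0.5, 0.004, 0.8]
    print(compare_gene_sets(shap_values, genes, dea_pvalues, top_k=3, alpha=0.05))
    # → {'shared': ['G1', 'G5'], 'shap_only': ['G3'], 'dea_only': ['G2'], 'overlap_p': 0.5}
```

With these toy inputs, G1 and G5 are picked by both methods while G3 and G2 are unique to SHAP and DEA respectively, which is the kind of "overlapping and unique" split the abstract reports. Separating the ranking, the FDR correction, and the overlap test keeps each assumption easy to swap out.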
Related papers
- R-GenIMA: Integrating Neuroimaging and Genetics with Interpretable Multimodal AI for Alzheimer's Disease Progression (2025-12-22)
Early detection of Alzheimer's disease requires models capable of integrating macro-scale neuroanatomical alterations with micro-scale genetic susceptibility. We introduce R-GenIMA, an interpretable multimodal large language model that couples a novel ROI-wise vision transformer with genetic prompting. R-GenIMA achieves state-of-the-art performance in four-way classification across normal cognition, subjective memory concerns, mild cognitive impairment, and AD.
- An Interpretable Ensemble Framework for Multi-Omics Dementia Biomarker Discovery Under HDLSS Conditions (2025-09-04)
We propose a novel ensemble approach combining Graph Attention Networks (GAT), the MultiOmics Variational AutoEncoder (MOVE), Elastic-net sparse regression, and Storey's False Discovery Rate (FDR). We evaluate performance using both simulated multi-omics data and the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Our method demonstrates superior predictive accuracy, feature selection precision, and biological relevance.
- FoundBioNet: A Foundation-Based Model for IDH Genotyping of Glioma from Multi-Parametric MRI (2025-08-09)
We propose a Foundation-based Biomarker Network (FoundBioNet) to noninvasively predict IDH mutation status from multi-parametric MRI. Our model was trained and validated on a diverse, multi-center cohort of 1705 glioma patients from six public datasets. Our model achieved AUCs of 90.58%, 88.08%, 65.41%, and 80.31% on independent test sets from EGD, TCGA, Ivy GAP, RHUH, and UPenn.
- Interpretable Graph Kolmogorov-Arnold Networks for Multi-Cancer Classification and Biomarker Identification using Multi-Omics Data (2025-03-29)
The Multi-Omics Graph Kolmogorov-Arnold Network (MOGKAN) is a deep learning framework that uses messenger RNA, microRNA sequences, and DNA methylation samples. By integrating multi-omics data with graph-based deep learning, our proposed approach demonstrates robust predictive performance and interpretability.
- Causal Representation Learning from Multimodal Biomedical Observations (2024-11-10)
We develop flexible identification conditions for multimodal data and principled methods to facilitate the understanding of biomedical datasets. A key theoretical contribution is the structural sparsity of causal connections between modalities. Results on a real-world human phenotype dataset are consistent with established biomedical research.
- Prompting Whole Slide Image Based Genetic Biomarker Prediction (2024-06-26)
We propose a whole slide image (WSI) based genetic biomarker prediction method via prompting techniques. We leverage large language models to generate medical prompts that serve as prior knowledge in extracting instances associated with genetic biomarkers. We adopt a coarse-to-fine approach to mine biomarker information within the tumor microenvironment.
- scBeacon: single-cell biomarker extraction via identifying paired cell clusters across biological conditions with contrastive siamese networks (2023-11-05)
scBeacon is a framework built upon a deep contrastive siamese network. scBeacon identifies matched cell populations across varied conditions. Comprehensive evaluations show that scBeacon outperforms existing single-cell differential gene analysis tools.
- Tertiary Lymphoid Structures Generation through Graph-based Diffusion (2023-10-10)
In this work, we leverage state-of-the-art graph-based diffusion models to generate biologically meaningful cell-graphs. We show that the adopted graph diffusion model is able to accurately learn the distribution of cells in terms of their tertiary lymphoid structures (TLS) content.
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology (2023-01-07)
We propose a new framework for gene discovery called Unsupervised Phenotype Ensembles. It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner. These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification (2021-01-27)
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers. We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
This list is automatically generated from the titles and abstracts of the papers on this site.