Identifying Critical Phases for Disease Onset with Sparse Haematological Biomarkers
- URL: http://arxiv.org/abs/2503.14561v1
- Date: Tue, 18 Mar 2025 07:29:45 GMT
- Title: Identifying Critical Phases for Disease Onset with Sparse Haematological Biomarkers
- Authors: Andrea Zerio, Maya Bechler-Speicher, Tine Jess, Aleksejs Sazonovs
- Abstract summary: Clinical blood tests are an emerging molecular data source for large-scale biomedical research. Traditional imputation approaches distort learning signals and bias predictions while lacking biological interpretability. We propose a novel methodology using Graph Neural Additive Networks (GNAN) to model delta biomarker trajectories.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Routinely collected clinical blood tests are an emerging molecular data source for large-scale biomedical research but inherently feature irregular sampling and informative observation. Traditional approaches rely on imputation, which can distort learning signals and bias predictions while lacking biological interpretability. We propose a novel methodology using Graph Neural Additive Networks (GNAN) to model biomarker trajectories as time-weighted directed graphs, where nodes represent sampling events and edges encode the time delta between events. GNAN's additive structure enables the explicit decomposition of feature and temporal contributions, allowing the detection of critical disease-associated time points. Unlike conventional imputation-based approaches, our method preserves the temporal structure of sparse data without introducing artificial biases and provides inherently interpretable predictions by decomposing contributions from each biomarker and time interval. This makes our model clinically applicable, as well as allowing it to discover biologically meaningful disease signatures.
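The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the event format, the choice to connect every earlier event to every later one, and the fixed `shape_fns` and `time_weight` functions are all assumptions made for illustration (in GNAN the corresponding functions would be learned). It only shows how unimputed, irregularly sampled visits can become nodes, how time deltas can become edge weights, and how a score can be split into per-event, per-biomarker terms.

```python
from typing import Callable, Dict, List, Tuple

# One sampling event: (time in days, {biomarker name: measured value}).
# Biomarkers not measured at a visit are simply absent; nothing is imputed.
Event = Tuple[float, Dict[str, float]]
Edges = Dict[Tuple[int, int], float]


def build_time_graph(events: List[Event]) -> Tuple[List[Event], Edges]:
    """Return time-sorted nodes and directed edges {(earlier, later): time delta}."""
    nodes = sorted(events, key=lambda e: e[0])
    edges: Edges = {}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            # Assumption: every earlier event points to every later event.
            edges[(i, j)] = nodes[j][0] - nodes[i][0]
    return nodes, edges


def additive_contributions(
    nodes: List[Event],
    edges: Edges,
    shape_fns: Dict[str, Callable[[float], float]],
    time_weight: Callable[[float], float],
) -> Dict[Tuple[int, str], float]:
    """Split a score at the last event into (event index, biomarker) terms."""
    target = len(nodes) - 1
    out: Dict[Tuple[int, str], float] = {}
    for i, (_, values) in enumerate(nodes):
        # Time delta from event i to the most recent event.
        delta = 0.0 if i == target else edges[(i, target)]
        for name, value in values.items():
            out[(i, name)] = time_weight(delta) * shape_fns[name](value)
    return out


# Hypothetical toy data: three visits, each with a different subset of tests.
events = [
    (30.0, {"crp": 12.0}),
    (0.0, {"crp": 4.0, "hb": 13.5}),
    (90.0, {"hb": 11.0}),
]
nodes, edges = build_time_graph(events)

# Hypothetical fixed functions standing in for learned ones.
shape_fns = {"crp": lambda v: v - 5.0, "hb": lambda v: 13.0 - v}
time_weight = lambda d: 1.0 / (1.0 + d / 30.0)

contribs = additive_contributions(nodes, edges, shape_fns, time_weight)
score = sum(contribs.values())
```

Because the score is a plain sum, each entry of `contribs` can be inspected on its own, which is the property the abstract relies on when it speaks of decomposing contributions from each biomarker and time interval.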
Related papers
- Improving Diseases Predictions Utilizing External Bio-Banks [1.9336815376402723]
We demonstrate how machine learning can be leveraged to enhance explainability and uncover biologically meaningful associations.
We train LightGBM models from scratch on our dataset (10K) to impute metabolomics features.
The imputed metabolomics features are then used in survival analysis to assess their impact on disease-related risk factors.
arXiv Detail & Related papers (2025-03-30T13:05:20Z)
- Bayesian Cox model with graph-structured variable selection priors for multi-omics biomarker identification [0.0]
We propose a penalized semiparametric Bayesian Cox model with graph-structured selection priors for sparse identification of multi-omics features. We show that the proposed model results in more trustworthy and stable variable selection and non-inferior survival prediction. The proposed model is applied to data from patients with primary invasive breast cancer in The Cancer Genome Atlas project.
arXiv Detail & Related papers (2025-03-17T11:33:21Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Tertiary Lymphoid Structures Generation through Graph-based Diffusion [54.37503714313661]
In this work, we leverage state-of-the-art graph-based diffusion models to generate biologically meaningful cell-graphs.
We show that the adopted graph diffusion model is able to accurately learn the distribution of cells in terms of their tertiary lymphoid structures (TLS) content.
arXiv Detail & Related papers (2023-10-10T14:37:17Z)
- T-Phenotype: Discovering Phenotypes of Predictive Temporal Patterns in Disease Progression [82.85825388788567]
We develop a novel temporal clustering method, T-Phenotype, to discover phenotypes of predictive temporal patterns from labeled time-series data.
We show that T-Phenotype achieves the best phenotype discovery performance over all the evaluated baselines.
arXiv Detail & Related papers (2023-02-24T13:30:35Z)
- Spatial Temporal Graph Convolution with Graph Structure Self-learning for Early MCI Detection [9.11430195887347]
We propose a spatial temporal graph convolutional network with a novel graph structure self-learning mechanism for EMCI detection.
Results on the Alzheimer's Disease Neuroimaging Initiative database show that our method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-11T12:29:00Z)
- Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction [102.23901690661916]
Low-dimensional embedding spaces can be derived from the UK Biobank population dataset to enhance data-scarce prediction of health indicators, lifestyle and demographic characteristics.
Performance gains from semi-supervised approaches will probably become an important ingredient for various medical data science applications.
arXiv Detail & Related papers (2021-10-12T16:25:50Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)