A Probabilistic Framework for Imputing Genetic Distances in Spatiotemporal Pathogen Models
- URL: http://arxiv.org/abs/2506.09076v3
- Date: Tue, 09 Sep 2025 11:09:17 GMT
- Title: A Probabilistic Framework for Imputing Genetic Distances in Spatiotemporal Pathogen Models
- Authors: Haley Stone, Jing Du, Hao Xue, Matthew Scotch, David Heslop, Andreas Züfle, Chandini Raina MacIntyre, Flora Salim
- Abstract summary: We propose a framework for inferring genetic distances between unsequenced cases and known sequences within defined transmission chains. This approach is applied to highly pathogenic avian influenza A/H5 cases in wild birds in the United States.
- Score: 3.366423334813302
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pathogen genome data offers valuable structure for spatial models, but its utility is limited by incomplete sequencing coverage. We propose a probabilistic framework for inferring genetic distances between unsequenced cases and known sequences within defined transmission chains, using time-aware evolutionary distance modeling. The method estimates pairwise divergence from collection dates and observed genetic distances, enabling biologically plausible imputation grounded in observed divergence patterns, without requiring sequence alignment or known transmission chains. Applied to highly pathogenic avian influenza A/H5 cases in wild birds in the United States, this approach supports scalable, uncertainty-aware augmentation of genomic datasets and enhances the integration of evolutionary information into spatiotemporal modeling workflows.
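The abstract describes estimating pairwise divergence from collection dates and observed genetic distances. A minimal sketch of that general idea follows. It assumes a strict linear molecular clock through the origin and Gaussian noise, which are illustrative assumptions made here and not the paper's stated model. The function names `fit_clock` and `impute_distance` and the toy numbers are hypothetical.

```python
import random
from statistics import mean


def fit_clock(pairs):
    """Least-squares fit of distance = rate * days_apart (no intercept).

    pairs: list of (days_apart, observed_distance) for sequenced case pairs.
    Returns (rate, residual_sd). Assumption: a strict linear clock.
    """
    sxx = sum(dt * dt for dt, _ in pairs)
    if sxx == 0:
        raise ValueError("need at least one pair with a nonzero time gap")
    sxy = sum(dt * d for dt, d in pairs)
    rate = sxy / sxx
    residuals = [d - rate * dt for dt, d in pairs]
    # One fitted parameter, so use n - 1 degrees of freedom (at least 1).
    dof = max(len(pairs) - 1, 1)
    sd = (sum(r * r for r in residuals) / dof) ** 0.5
    return rate, sd


def impute_distance(days_apart, rate, sd, n_draws=1000, seed=0):
    """Draw plausible distances for an unsequenced case.

    Samples from Normal(rate * days_apart, sd), clipped at zero because a
    genetic distance cannot be negative. Returns the mean of the draws and
    an approximate 95% interval taken from the sorted draws.
    """
    rng = random.Random(seed)
    draws = sorted(
        max(0.0, rng.gauss(rate * days_apart, sd)) for _ in range(n_draws)
    )
    return {
        "mean": mean(draws),
        "lo": draws[int(0.025 * n_draws)],
        "hi": draws[int(0.975 * n_draws) - 1],
    }


# Toy data (hypothetical): time gaps in days and pairwise distances.
observed = [(10, 0.0021), (20, 0.0039), (30, 0.0062), (45, 0.0088)]
rate, sd = fit_clock(observed)
estimate = impute_distance(days_apart=25, rate=rate, sd=sd)
```

The interval in `estimate` is what makes the imputation uncertainty-aware: a downstream spatiotemporal model can consume the range or the draws themselves instead of a single point value.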
Related papers
- Conditionally Site-Independent Neural Evolution of Antibody Sequences [5.267260830624825]
We introduce CoSiNE, a continuous-time Markov chain parameterized by a deep neural network. We prove that CoSiNE provides a first-order approximation to the intractable sequential point mutation process. Empirically, CoSiNE outperforms state-of-the-art language models in zero-shot variant effect prediction.
arXiv Detail & Related papers (2026-02-21T23:23:30Z)
- PRISM: A 3D Probabilistic Neural Representation for Interpretable Shape Modeling [9.456135223836181]
PRISM is a novel framework that bridges implicit neural representations with uncertainty-aware statistical shape analysis. A key theoretical contribution is a closed-form Fisher Information metric that enables efficient, analytically tractable local temporal uncertainty quantification.
arXiv Detail & Related papers (2026-02-12T00:55:31Z)
- Overlap-weighted orthogonal meta-learner for treatment effect estimation over time [90.46786193198744]
We introduce a novel overlap-weighted meta-learner for estimating heterogeneous treatment effects (HTEs). Our WO-learner has the favorable property of Neyman-orthogonality, meaning that it is robust against misspecification in the nuisance functions. We show that our WO-learner is fully model-agnostic and can be applied to any machine learning model.
arXiv Detail & Related papers (2025-10-22T14:47:57Z)
- GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
- ProGen: Revisiting Probabilistic Spatial-Temporal Time Series Forecasting from a Continuous Generative Perspective Using Stochastic Differential Equations [18.64802090861607]
ProGen provides a robust solution that effectively captures dependencies while managing uncertainty.
Our experiments on four benchmark traffic datasets demonstrate that ProGen outperforms state-of-the-art deterministic and probabilistic models.
arXiv Detail & Related papers (2024-11-02T14:37:30Z)
- Semantically Rich Local Dataset Generation for Explainable AI in Genomics [0.716879432974126]
Black box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms.
We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity.
arXiv Detail & Related papers (2024-07-03T10:31:30Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- A Poisson-Gamma Dynamic Factor Model with Time-Varying Transition Dynamics [51.147876395589925]
A non-stationary PGDS is proposed to allow the underlying transition matrices to evolve over time.
A fully-conjugate and efficient Gibbs sampler is developed to perform posterior simulation.
Experiments show that, in comparison with related models, the proposed non-stationary PGDS achieves improved predictive performance.
arXiv Detail & Related papers (2024-02-26T04:39:01Z)
- Generalising sequence models for epigenome predictions with tissue and assay embeddings [1.9999259391104391]
We show that strong correlation can be achieved across a large range of experimental conditions by integrating tissue and assay embeddings into a Contextualised Genomic Network (CGN).
We exhibit the efficacy of our approach across a broad set of epigenetic profiles and provide the first insights into the effect of genetic variants on epigenetic sequence model training.
arXiv Detail & Related papers (2023-08-22T10:34:19Z)
- Nonlinear Permuted Granger Causality [0.6526824510982799]
Granger causal inference is a contentious but widespread method used in fields ranging from economics to neuroscience.
To allow for out-of-sample comparison, a measure of functional connectivity is explicitly defined using permutations of the covariate set.
Performance of the permutation method is compared to penalized variable selection, naive replacement, and omission techniques via simulation.
arXiv Detail & Related papers (2023-08-11T16:44:16Z)
- Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity [25.488181126364186]
This paper presents a novel approach that leverages domain variability to learn representations that are conditionally invariant to unwanted variability or distractors.
We apply our method to grand biological challenges, such as data integration in single-cell genomics.
Specifically, the proposed approach helps to disentangle biological signals from data biases that are unrelated to the target task or the causal explanation of interest.
arXiv Detail & Related papers (2023-07-02T12:52:41Z)
- Unbalanced Diffusion Schrödinger Bridge [71.31485908125435]
We introduce unbalanced DSBs which model the temporal evolution of marginals with arbitrary finite mass.
This is achieved by deriving the time reversal of differential equations with killing and birth terms.
We present two novel algorithmic schemes that comprise a scalable objective function for training unbalanced DSBs.
arXiv Detail & Related papers (2023-06-15T12:51:56Z)
- T-Phenotype: Discovering Phenotypes of Predictive Temporal Patterns in Disease Progression [82.85825388788567]
We develop a novel temporal clustering method, T-Phenotype, to discover phenotypes of predictive temporal patterns from labeled time-series data.
We show that T-Phenotype achieves the best phenotype discovery performance over all the evaluated baselines.
arXiv Detail & Related papers (2023-02-24T13:30:35Z)
- STELAR: Spatio-temporal Tensor Factorization with Latent Epidemiological Regularization [76.57716281104938]
We develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously.
STELAR enables long-term prediction by incorporating latent temporal regularization through a system of discrete-time difference equations.
We conduct experiments using both county- and state-level COVID-19 data and show that our model can identify interesting latent patterns of the epidemic.
arXiv Detail & Related papers (2020-12-08T21:21:47Z)