Related papers: PhyloGFN: Phylogenetic inference with generative flow networks

URL: http://arxiv.org/abs/2310.08774v2
Date: Mon, 25 Mar 2024 00:18:35 GMT
Title: PhyloGFN: Phylogenetic inference with generative flow networks
Authors: Mingyang Zhou, Zichao Yan, Elliot Layne, Nikolay Malkin, Dinghuai Zhang, Moksh Jain, Mathieu Blanchette, Yoshua Bengio
Abstract summary: We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies. We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
Score: 57.104166650526416
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Phylogenetics is a branch of computational biology that studies the evolutionary relationships among biological entities. Its long history and numerous applications notwithstanding, inference of phylogenetic trees from sequence data remains challenging: the high complexity of tree space poses a significant obstacle for the current combinatorial and probabilistic techniques. In this paper, we adopt the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling complex combinatorial structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies and evolutionary distances. We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets. PhyloGFN is competitive with prior works in marginal likelihood estimation and achieves a closer fit to the target distribution than state-of-the-art variational inference methods. Our code is available at https://github.com/zmy1116/phylogfn.

Related papers

Flow Matching Meets Biology and Life Science: A Survey [65.2146737141455]
Flow matching has emerged as a powerful and efficient alternative to diffusion-based generative modeling. This paper presents the first comprehensive survey of recent developments in flow matching and its applications in biological domains.
arXiv Detail & Related papers (2025-07-23T17:44:29Z)
Variational phylogenetic inference with products over bipartitions [48.2982114295171]
We present a novel variational family based on coalescent times of a single-linkage clustering and derive a closed-form density of the resulting distribution over trees. Our method performs inference over all of tree space, it does not require any Markov chain Monte Carlo subroutines, and our variational family is differentiable.
arXiv Detail & Related papers (2025-02-21T00:06:57Z)
PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders [5.505257238864315]
PhyloVAE is an unsupervised learning framework designed for representation learning and generative modeling of tree topologies. We develop a deep latent-variable generative model that facilitates fast, parallelized topology generation. Experiments demonstrate PhyloVAE's robust representation learning capabilities and fast generation of phylogenetic tree topologies.
arXiv Detail & Related papers (2025-02-07T07:58:47Z)
PhyloGen: Language Model-Enhanced Phylogenetic Inference via Graph Structure Generation [50.80441546742053]
Phylogenetic trees elucidate evolutionary relationships among species. Traditional Markov Chain Monte Carlo methods face slow convergence and computational burdens. We propose PhyloGen, a novel method leveraging a pre-trained genomic language model.
arXiv Detail & Related papers (2024-12-25T08:33:05Z)
Causal Representation Learning from Multimodal Biological Observations [57.00712157758845]
We aim to develop flexible identification conditions for multimodal data. We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work. Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
Variational Bayesian Phylogenetic Inference with Semi-implicit Branch Length Distributions [6.553961278427792]
We propose a more flexible family of branch length variational posteriors based on semi-implicit hierarchical distributions using graph neural networks. We show that this construction emits straightforward permutation equivariant distributions, and therefore can handle the non-Euclidean branch length space across different tree topologies with ease.
arXiv Detail & Related papers (2024-08-09T13:29:08Z)
ARTree: A Deep Autoregressive Model for Phylogenetic Inference [6.935130578959931]
We propose a deep autoregressive model for phylogenetic inference based on graph neural networks (GNNs). We demonstrate the effectiveness and efficiency of our method on a benchmark of challenging real data tree topology density estimation and variational phylogenetic inference problems.
arXiv Detail & Related papers (2023-10-14T10:26:03Z)
Improved Variational Bayesian Phylogenetic Inference using Mixtures [4.551386476350572]
VBPI-Mixtures is an algorithm designed to enhance the accuracy of phylogenetic posterior distributions. VBPI-Mixtures is capable of capturing distributions over tree-topologies that VBPI fails to model.
arXiv Detail & Related papers (2023-10-02T07:18:48Z)
GeoPhy: Differentiable Phylogenetic Inference via Geometric Gradients of Tree Topologies [0.3263412255491401]
We introduce a novel, fully differentiable formulation of phylogenetic inference that leverages a unique representation of topological distributions in continuous geometric spaces. In experiments using real benchmark datasets, GeoPhy significantly outperformed other approximate Bayesian methods that considered whole topologies.
arXiv Detail & Related papers (2023-07-07T15:45:05Z)
DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets [81.75973217676986]
Gene regulatory networks (GRN) describe interactions between genes and their products that control gene expression and cellular function. Existing methods either focus on challenge (1), identifying cyclic structure from dynamics, or on challenge (2), learning complex Bayesian posteriors over DAGs, but not both. In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges.
arXiv Detail & Related papers (2023-02-08T16:36:40Z)
Bayesian Structure Learning with Generative Flow Networks [85.84396514570373]
In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) from data. Recently, a class of probabilistic models, called Generative Flow Networks (GFlowNets), have been introduced as a general framework for generative modeling. We show that our approach, called DAG-GFlowNet, provides an accurate approximation of the posterior over DAGs.
arXiv Detail & Related papers (2022-02-28T15:53:10Z)
Epigenetic evolution of deep convolutional models [81.21462458089142]
We build upon a previously proposed neuroevolution framework to evolve deep convolutional models. We propose a convolutional layer layout which allows kernels of different shapes and sizes to coexist within the same layer. The proposed layout enables the size and shape of individual kernels within a convolutional layer to be evolved with a corresponding new mutation operator.
arXiv Detail & Related papers (2021-04-12T12:45:16Z)
Mycorrhiza: Genotype Assignment using Phylogenetic Networks [2.286041284499166]
We introduce Mycorrhiza, a machine learning approach for the genotype assignment problem. Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples. Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium.
arXiv Detail & Related papers (2020-10-14T02:36:27Z)
Complexity-based speciation and genotype representation for neuroevolution [81.21462458089142]
This paper introduces a speciation principle for neuroevolution where evolving networks are grouped into species based on the number of hidden neurons. The proposed speciation principle is employed in several techniques designed to promote and preserve diversity within species and in the ecosystem as a whole.
arXiv Detail & Related papers (2020-10-11T06:26:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.