PhyloGFN: Phylogenetic inference with generative flow networks
- URL: http://arxiv.org/abs/2310.08774v2
- Date: Mon, 25 Mar 2024 00:18:35 GMT
- Title: PhyloGFN: Phylogenetic inference with generative flow networks
- Authors: Mingyang Zhou, Zichao Yan, Elliot Layne, Nikolay Malkin, Dinghuai Zhang, Moksh Jain, Mathieu Blanchette, Yoshua Bengio,
- Abstract summary: We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and phylogenetic inference.
Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies.
We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
- Score: 57.104166650526416
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Phylogenetics is a branch of computational biology that studies the evolutionary relationships among biological entities. Its long history and numerous applications notwithstanding, inference of phylogenetic trees from sequence data remains challenging: the high complexity of tree space poses a significant obstacle for the current combinatorial and probabilistic techniques. In this paper, we adopt the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling complex combinatorial structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies and evolutionary distances. We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets. PhyloGFN is competitive with prior works in marginal likelihood estimation and achieves a closer fit to the target distribution than state-of-the-art variational inference methods. Our code is available at https://github.com/zmy1116/phylogfn.
Related papers
- Causal Representation Learning from Multimodal Biological Observations [57.00712157758845]
We aim to develop flexible identification conditions for multimodal data.
We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work.
Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
arXiv Detail & Related papers (2024-11-10T16:40:27Z) - Variational Bayesian Phylogenetic Inference with Semi-implicit Branch Length Distributions [6.553961278427792]
We propose a more flexible family of branch length variational posteriors based on semi-implicit hierarchical distributions using graph neural networks.
We show that this construction emits straightforward permutation equivariant distributions, and therefore can handle the non-Euclidean branch length space across different tree topologies with ease.
arXiv Detail & Related papers (2024-08-09T13:29:08Z) - ARTree: A Deep Autoregressive Model for Phylogenetic Inference [6.935130578959931]
We propose a deep autoregressive model for phylogenetic inference based on graph neural networks (GNNs)
We demonstrate the effectiveness and efficiency of our method on a benchmark of challenging real data tree topology density estimation and variational phylogenetic inference problems.
arXiv Detail & Related papers (2023-10-14T10:26:03Z) - Improved Variational Bayesian Phylogenetic Inference using Mixtures [4.551386476350572]
VBPI-Mixtures is an algorithm designed to enhance the accuracy of phylogenetic posterior distributions.
VBPI-Mixtures is capable of capturing distributions over tree-topologies that VBPI fails to model.
arXiv Detail & Related papers (2023-10-02T07:18:48Z) - GeoPhy: Differentiable Phylogenetic Inference via Geometric Gradients of
Tree Topologies [0.3263412255491401]
We introduce a novel, fully differentiable formulation of phylogenetic inference that leverages a unique representation of topological distributions in continuous geometric spaces.
In experiments using real benchmark datasets, GeoPhy significantly outperformed other approximate Bayesian methods that considered whole topologies.
arXiv Detail & Related papers (2023-07-07T15:45:05Z) - DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with
GFlowNets [81.75973217676986]
Gene regulatory networks (GRN) describe interactions between genes and their products that control gene expression and cellular function.
Existing methods either focus on challenge (1), identifying cyclic structure from dynamics, or on challenge (2) learning complex Bayesian posteriors over DAGs, but not both.
In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges.
arXiv Detail & Related papers (2023-02-08T16:36:40Z) - Bayesian Structure Learning with Generative Flow Networks [85.84396514570373]
In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) from data.
Recently, a class of probabilistic models, called Generative Flow Networks (GFlowNets), have been introduced as a general framework for generative modeling.
We show that our approach, called DAG-GFlowNet, provides an accurate approximation of the posterior over DAGs.
arXiv Detail & Related papers (2022-02-28T15:53:10Z) - Epigenetic evolution of deep convolutional models [81.21462458089142]
We build upon a previously proposed neuroevolution framework to evolve deep convolutional models.
We propose a convolutional layer layout which allows kernels of different shapes and sizes to coexist within the same layer.
The proposed layout enables the size and shape of individual kernels within a convolutional layer to be evolved with a corresponding new mutation operator.
arXiv Detail & Related papers (2021-04-12T12:45:16Z) - Mycorrhiza: Genotype Assignment usingPhylogenetic Networks [2.286041284499166]
We introduce Mycorrhiza, a machine learning approach for the genotype assignment problem.
Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples.
Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium.
arXiv Detail & Related papers (2020-10-14T02:36:27Z) - Complexity-based speciation and genotype representation for
neuroevolution [81.21462458089142]
This paper introduces a speciation principle for neuroevolution where evolving networks are grouped into species based on the number of hidden neurons.
The proposed speciation principle is employed in several techniques designed to promote and preserve diversity within species and in the ecosystem as a whole.
arXiv Detail & Related papers (2020-10-11T06:26:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.