VaiPhy: a Variational Inference Based Algorithm for Phylogeny
- URL: http://arxiv.org/abs/2203.01121v1
- Date: Tue, 1 Mar 2022 13:55:56 GMT
- Title: VaiPhy: a Variational Inference Based Algorithm for Phylogeny
- Authors: Hazal Koptagel, Oskar Kviman, Harald Melin, Negar Safinianaini, Jens
Lagergren
- Abstract summary: We propose VaiPhy, a remarkably fast VI-based algorithm for approximate posterior inference in an augmented tree space.
VaiPhy produces marginal log-likelihood estimates on par with the state-of-the-art methods on real data, and is considerably faster since it does not require auto-differentiation.
- Score: 2.2499166814992435
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Phylogenetics is a classical methodology in computational biology that today
has become highly relevant for medical investigation of single-cell data, e.g.,
in the context of cancer development. The exponential size of the tree space
is unfortunately a formidable obstacle for current Bayesian phylogenetic
inference using Markov chain Monte Carlo based methods since these rely on
local operations. And although more recent variational inference (VI) based
methods offer speed improvements, they rely on expensive auto-differentiation
operations for learning the variational parameters. We propose VaiPhy, a
remarkably fast VI-based algorithm for approximate posterior inference in an
augmented tree space. VaiPhy produces marginal log-likelihood estimates on par
with the state-of-the-art methods on real data, and is considerably faster
since it does not require auto-differentiation. Instead, VaiPhy combines
coordinate ascent update equations with two novel sampling schemes: (i)
SLANTIS, a proposal distribution for tree topologies in the augmented tree
space, and (ii) the JC sampler, to the best of our knowledge the first
scheme for sampling branch lengths directly from the popular Jukes-Cantor
model (both components are sketched below). We compare VaiPhy against the
state-of-the-art baselines in terms of density estimation and runtime.
Additionally, we evaluate the reproducibility of the baselines. We provide our
code on GitHub: https://github.com/Lagergren-Lab/VaiPhy.
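The two sampling schemes above are the paper's own contributions; the sketches below are generic stand-ins, not VaiPhy's implementations. First, SLANTIS proposes tree topologies in the augmented space. As a hedged illustration of the underlying primitive (drawing a spanning tree with probability proportional to the product of its edge weights), here is Wilson's loop-erased random-walk algorithm, a standard method used here purely as a substitute for SLANTIS:

```python
import random

def wilson_tree(nodes, weight):
    """Sample a spanning tree of the complete graph on `nodes` with
    probability proportional to the product of its edge weights,
    via Wilson's loop-erased random-walk algorithm."""
    root = nodes[0]
    in_tree = {root}
    parent = {}
    for start in nodes[1:]:
        nxt = {}
        u = start
        while u not in in_tree:            # walk until the tree is hit
            nbrs = [v for v in nodes if v != u]
            wts = [weight(u, v) for v in nbrs]
            nxt[u] = random.choices(nbrs, weights=wts)[0]
            u = nxt[u]                     # overwriting nxt[u] erases loops
        u = start
        while u not in in_tree:            # add the loop-erased path
            parent[u] = nxt[u]
            in_tree.add(u)
            u = nxt[u]
    return parent                          # child -> parent map

# Illustrative use: 4 taxa with a made-up similarity matrix.
W = [[0, 3, 1, 1], [3, 0, 2, 1], [1, 2, 0, 3], [1, 1, 3, 0]]
print(wilson_tree(list(range(4)), lambda u, v: W[u][v]))
```

Second, the JC sampler draws branch lengths directly from the Jukes-Cantor model; the paper's closed-form scheme is not reproduced here. The sketch below instead samples from the same JC likelihood numerically, via inverse-CDF sampling on a grid, given counts of matching and mismatching sites; the counts, grid resolution, and truncation `t_max` are illustrative assumptions:

```python
import numpy as np

def sample_jc_branch_length(n_same, n_diff, rng, t_max=5.0, grid=4000):
    """Draw t with unnormalized density equal to the Jukes-Cantor
    likelihood of n_same matching and n_diff mismatching sites:
    P(same) = 1/4 + 3/4 exp(-4t/3), P(diff) = 3/4 - 3/4 exp(-4t/3)."""
    t = np.linspace(1e-6, t_max, grid)
    e = np.exp(-4.0 * t / 3.0)
    log_dens = n_same * np.log(0.25 + 0.75 * e) \
             + n_diff * np.log(0.75 - 0.75 * e)
    dens = np.exp(log_dens - log_dens.max())  # stabilize before normalizing
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    return t[np.searchsorted(cdf, rng.uniform())]

rng = np.random.default_rng(0)
print(sample_jc_branch_length(n_same=90, n_diff=10, rng=rng))
```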
Related papers
- PhyloGFN: Phylogenetic inference with generative flow networks [57.104166650526416]
We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference.
Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies.
We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
arXiv Detail & Related papers (2023-10-12T23:46:08Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP) for optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require the generation of additional negative samples.
Each block can be trained independently, so the method is easily deployed on parallel acceleration systems (a toy block-wise training sketch follows this entry).
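A minimal sketch of the block-wise idea described above, assuming PyTorch. This is a generic local-training loop consistent with the summary, not CaFo's actual update rule; the layer sizes, heads, and losses are invented for illustration. Each block owns a label head and a local loss, and `.detach()` keeps gradients from crossing block boundaries:

```python
import torch
import torch.nn as nn

# Two cascaded blocks, each with its own label head, local loss, and
# optimizer; no gradient ever flows between blocks.
blocks = nn.ModuleList([
    nn.Sequential(nn.Linear(20, 32), nn.ReLU()),
    nn.Sequential(nn.Linear(32, 32), nn.ReLU()),
])
heads = nn.ModuleList([nn.Linear(32, 3) for _ in blocks])
opts = [torch.optim.Adam(list(b.parameters()) + list(h.parameters()))
        for b, h in zip(blocks, heads)]
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(64, 20), torch.randint(0, 3, (64,))  # toy batch
h = x
for block, head, opt in zip(blocks, heads, opts):
    z = block(h.detach())        # detach: no cross-block gradient flow
    loss = loss_fn(head(z), y)   # each block outputs a label distribution
    opt.zero_grad()
    loss.backward()
    opt.step()
    h = z
```

Because each (block, head) pair optimizes only its own loss, the blocks can in principle be trained on separate devices in parallel.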
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Resource saving taxonomy classification with k-mer distributions and machine learning [2.0196229393131726]
We propose to use $k$-mer distributions obtained from DNA sequences as features for classifying their taxonomic origin.
We show that our approach improves classification at the genus level and achieves comparable results at the superkingdom and phylum levels (a minimal feature-extraction sketch follows this entry).
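A minimal sketch of the k-mer feature extraction this entry relies on; the choice of k=3 and the normalization are illustrative assumptions, and the resulting vectors can be fed to any off-the-shelf classifier:

```python
from collections import Counter
from itertools import product

def kmer_profile(seq, k=3):
    """Normalized frequency vector over all 4^k DNA k-mers."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # ignores non-ACGT k-mers
    return [counts[m] / total for m in kmers]

profile = kmer_profile("ACGTACGTGGCATTAGC", k=3)
print(len(profile), sum(profile))  # 64 features summing to 1
```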
arXiv Detail & Related papers (2023-03-10T08:01:08Z)
- Bayesian Decision Trees via Tractable Priors and Probabilistic Context-Free Grammars [7.259767735431625]
We propose a new criterion for training Bayesian Decision Trees.
BCART-PCFG can efficiently sample decision trees from a posterior distribution across trees given the data.
We find that trees sampled via BCART-PCFG perform comparably to, or better than, greedily constructed Decision Trees.
arXiv Detail & Related papers (2023-02-15T00:17:41Z)
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show that simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions are placed on the parameters through plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z)
- On multivariate randomized classification trees: $l_0$-based sparsity, VC dimension and decomposition methods [0.9346127431927981]
We investigate the nonlinear continuous optimization formulation proposed in Blanquero et al.
We first consider alternative methods to sparsify such trees based on concave approximations of the $l_0$ norm.
We then propose a general decomposition scheme and an efficient version of it.
Experiments on larger datasets show that the proposed decomposition method significantly reduces training times without compromising accuracy.
arXiv Detail & Related papers (2021-12-09T22:49:08Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference (a toy randomized forward pass follows this entry).
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
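A toy illustration of the randomized-DP idea, assuming an HMM forward pass: at each step, the exact sum over K latent states is replaced by a uniformly subsampled, rescaled sum over m states, which keeps the likelihood estimate unbiased. This is a sketch of the general principle, not the paper's RDP algorithms; the model sizes and subsample size m are assumptions:

```python
import numpy as np

def randomized_forward(pi, A, B, obs, m, rng):
    """Randomized estimate of the HMM likelihood p(obs): each forward
    step sums over a uniform subsample of m of the K states, rescaled
    by K/m so the estimate stays unbiased."""
    K = len(pi)
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        S = rng.choice(K, size=m, replace=False)   # subsampled states
        alpha = (K / m) * (alpha[S] @ A[S, :]) * B[:, o]
    return alpha.sum()

rng = np.random.default_rng(1)
K = 8
pi = np.full(K, 1.0 / K)
A = rng.dirichlet(np.ones(K), size=K)  # K x K transitions (rows sum to 1)
B = rng.dirichlet(np.ones(4), size=K)  # K x 4 emission probabilities
obs = [0, 2, 1, 3, 0]
print(randomized_forward(pi, A, B, obs, m=4, rng=rng))
```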
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Stochastic tree ensembles for regularized nonlinear regression [0.913755431537592]
This paper develops a novel tree ensemble method for nonlinear regression, which we refer to as XBART.
By combining regularization and search strategies from Bayesian modeling with computationally efficient techniques, the new method attains state-of-the-art performance.
arXiv Detail & Related papers (2020-02-09T14:37:02Z)