Spectral neighbor joining for reconstruction of latent tree models
- URL: http://arxiv.org/abs/2002.12547v3
- Date: Wed, 23 Sep 2020 02:15:30 GMT
- Title: Spectral neighbor joining for reconstruction of latent tree models
- Authors: Ariel Jaffe, Noah Amsel, Yariv Aizenbud, Boaz Nadler, Joseph T. Chang,
Yuval Kluger
- Abstract summary: We develop Spectral Neighbor Joining, a novel method to recover the structure of latent tree graphical models.
We prove that SNJ is consistent, and derive a sufficient condition for correct tree recovery from an estimated similarity matrix.
We illustrate via extensive simulations that in comparison to several other reconstruction methods, SNJ requires fewer samples to accurately recover trees with a large number of leaves or long edges.
- Score: 5.229354894035374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common assumption in multiple scientific applications is that the
distribution of observed data can be modeled by a latent tree graphical model.
An important example is phylogenetics, where the tree models the evolutionary
lineages of a set of observed organisms. Given a set of independent
realizations of the random variables at the leaves of the tree, a key challenge
is to infer the underlying tree topology. In this work we develop Spectral
Neighbor Joining (SNJ), a novel method to recover the structure of latent tree
graphical models. Given a matrix that contains a measure of similarity between
all pairs of observed variables, SNJ computes a spectral measure of cohesion
between groups of observed variables. We prove that SNJ is consistent, and
derive a sufficient condition for correct tree recovery from an estimated
similarity matrix. Combining this condition with a concentration of measure
result on the similarity matrix, we bound the number of samples required to
recover the tree with high probability. We illustrate via extensive simulations
that in comparison to several other reconstruction methods, SNJ requires fewer
samples to accurately recover trees with a large number of leaves or long
edges.
Related papers
- Terminating Differentiable Tree Experts [77.2443883991608]
We propose a neuro-symbolic Differentiable Tree Machine that learns tree operations using a combination of transformers and Representation Products.
We first remove a series of different transformer layers that are used in every step by introducing a mixture of experts.
We additionally propose a new termination algorithm to provide the model the power to choose how many steps to make automatically.
arXiv Detail & Related papers (2024-07-02T08:45:38Z) - PhyloGFN: Phylogenetic inference with generative flow networks [57.104166650526416]
We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and phylogenetic inference.
Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies.
We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
arXiv Detail & Related papers (2023-10-12T23:46:08Z) - Tree Variational Autoencoders [5.992683455757179]
We propose a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables.
TreeVAE hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data.
arXiv Detail & Related papers (2023-06-15T09:25:04Z) - Sharded Bayesian Additive Regression Trees [1.4213973379473654]
We introduce a randomization auxiliary variable and a sharding tree to decide partitioning of data.
By observing that the optimal design of a sharding tree can determine optimal sharding for sub-models on a product space, we introduce an intersection tree structure to completely specify both the sharding and modeling using only tree structures.
arXiv Detail & Related papers (2023-06-01T05:41:31Z) - Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
arXiv Detail & Related papers (2023-05-24T11:05:12Z) - Learning Linear Polytree Structural Equation Models [4.833417613564028]
We are interested in the problem of learning the directed acyclic graph (DAG) when data are generated from a linear structural equation model (SEM)
We study sufficient conditions on the sample sizes for the well-known Chow-Liu algorithm to exactly recover both the skeleton and the equivalence class of the polytree.
We also consider an extension of group linear polytree models, in which each node represents a group of variables.
arXiv Detail & Related papers (2021-07-22T23:22:20Z) - Robustifying Algorithms of Learning Latent Trees with Vector Variables [92.18777020401484]
We present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG)
We robustify RG, CLRG, Neighbor Joining (NJ) and Spectral NJ (SNJ) by using the truncated inner product.
We derive the first known instance-dependent impossibility result for structure learning of latent trees.
arXiv Detail & Related papers (2021-06-02T01:37:52Z) - Spectral Top-Down Recovery of Latent Tree Models [13.681975313065477]
Spectral Top-Down Recovery (STDR) is a divide-and-conquer approach for inference of large latent tree models.
STDR's partitioning step is non-random. Instead, it is based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes.
We prove that STDR is statistically consistent, and bound the number of samples required to accurately recover the tree with high probability.
arXiv Detail & Related papers (2021-02-26T02:47:42Z) - Visualizing hierarchies in scRNA-seq data using a density tree-biased
autoencoder [50.591267188664666]
We propose an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data.
We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space.
arXiv Detail & Related papers (2021-02-11T08:48:48Z) - SGA: A Robust Algorithm for Partial Recovery of Tree-Structured
Graphical Models with Noisy Samples [75.32013242448151]
We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise.
Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure.
We propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery.
arXiv Detail & Related papers (2021-01-22T01:57:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.