Mixtures of All Trees
- URL: http://arxiv.org/abs/2302.14202v2
- Date: Wed, 29 Mar 2023 07:27:28 GMT
- Title: Mixtures of All Trees
- Authors: Nikil Roashan Selvam, Honghua Zhang, Guy Van den Broeck
- Abstract summary: We propose a novel class of generative models called mixtures of all trees: that is, a mixture over all possible ($n^{n-2}$) tree-shaped graphical models over $n$ variables.
We show that it is possible to parameterize this Mixture of All Trees (MoAT) model compactly in a way that allows for tractable likelihood computation and optimization via stochastic gradient descent.
- Score: 28.972995038976745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tree-shaped graphical models are widely used for their tractability. However,
they unfortunately lack expressive power as they require committing to a
particular sparse dependency structure. We propose a novel class of generative
models called mixtures of all trees: that is, a mixture over all possible
($n^{n-2}$) tree-shaped graphical models over $n$ variables. We show that it is
possible to parameterize this Mixture of All Trees (MoAT) model compactly
(using a polynomial-size representation) in a way that allows for tractable
likelihood computation and optimization via stochastic gradient descent.
Furthermore, by leveraging the tractability of tree-shaped models, we devise
fast-converging conditional sampling algorithms for approximate inference, even
though our theoretical analysis suggests that exact computation of marginals in
the MoAT model is NP-hard. Empirically, MoAT achieves state-of-the-art
performance on density estimation benchmarks when compared against powerful
probabilistic models including hidden Chow-Liu Trees.
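The compactness claim can be made concrete with the weighted matrix-tree theorem: the sum over all $n^{n-2}$ spanning trees of the product of their edge weights equals a single $(n-1)\times(n-1)$ determinant. Below is a minimal numpy sketch, assuming the MoAT construction in which each tree's mixture weight is proportional to the product of its edge weights and all trees share the same univariate and pairwise marginals; function names and data layout are illustrative, not the paper's code.

```python
import numpy as np

def tree_sum(A):
    """Weighted matrix-tree theorem: sum over all spanning trees T of
    prod_{(u,v) in T} A[u, v], computed as a principal minor of the
    graph Laplacian. Costs O(n^3) instead of enumerating n^(n-2) trees."""
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.det(L[1:, 1:])

def moat_likelihood(x, w, p_uni, p_pair):
    """Sketch of a MoAT-style likelihood for one discrete sample x.
    w[u, v]            : nonnegative symmetric edge weights (parameters)
    p_uni[v][a]        : shared univariate marginal P(X_v = a)
    p_pair[u][v][a][b] : shared pairwise marginal P(X_u = a, X_v = b)
    Each tree's density factorizes as prod_v P(x_v) times, over its
    edges, P(x_u, x_v) / (P(x_u) P(x_v)), so the mixture over all
    trees collapses into a single determinant."""
    n = len(x)
    base = np.prod([p_uni[v][x[v]] for v in range(n)])
    A = np.zeros((n, n))
    for u in range(n):
        for v in range(u + 1, n):
            ratio = p_pair[u][v][x[u]][x[v]] / (p_uni[u][x[u]] * p_uni[v][x[v]])
            A[u, v] = A[v, u] = w[u, v] * ratio
    return base * tree_sum(A) / tree_sum(w)  # tree_sum(w) normalizes the mixture
```

The two determinants cost $O(n^3)$ each, which is what makes a mixture with $n^{n-2}$ components tractable to evaluate and to differentiate for stochastic gradient descent.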
Related papers
- Conditional Density Estimation with Histogram Trees [3.5297361401370044]
Conditional density estimation (CDE) goes beyond regression by modeling the full conditional distribution.
Current methods typically employ kernel-based approaches, using kernel functions directly for kernel density estimation or as basis functions in linear models.
We propose the Conditional Density Tree (CDTree), a fully non-parametric model consisting of a decision tree in which each leaf is formed by a histogram model.
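As a toy illustration of the histogram-leaf idea, here is a hypothetical depth-one sketch; the CDTree paper's actual tree construction and scoring are more involved, and all names here are our own.

```python
import numpy as np

class HistogramLeaf:
    """Piecewise-constant density over y, estimated from a histogram."""
    def __init__(self, y, bins, y_range):
        counts, self.edges = np.histogram(y, bins=bins, range=y_range)
        self.density = counts / max(counts.sum(), 1) / np.diff(self.edges)

    def pdf(self, y):
        i = np.clip(np.searchsorted(self.edges, y) - 1, 0, len(self.density) - 1)
        return self.density[i]

def fit_depth_one_cdtree(X, y, feature=0, threshold=0.5, bins=10):
    """One split on a feature; each leaf holds a histogram over y."""
    mask = X[:, feature] <= threshold
    rng = (y.min(), y.max())
    return {"feature": feature, "threshold": threshold,
            "left": HistogramLeaf(y[mask], bins, rng),
            "right": HistogramLeaf(y[~mask], bins, rng)}

def conditional_pdf(tree, x, y):
    """Estimate p(y | x) by routing x to its leaf's histogram."""
    leaf = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return leaf.pdf(y)
```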
arXiv Detail & Related papers (2024-10-15T09:53:24Z)
- On marginal feature attributions of tree-based models [0.11184789007828977]
Local feature attributions based on marginal expectations, e.g., marginal Shapley, Owen, or Banzhaf values, may be employed.
We present two (statistically similar) decision trees that compute the exact same function for which the "path-dependent" TreeSHAP yields different rankings of features.
We exploit the symmetry to derive an explicit formula, with improved complexity and only in terms of the internal model parameters, for marginal Shapley (and Banzhaf and Owen) values of CatBoost models.
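For intuition, marginal (interventional) Shapley values can be computed exactly by brute force, which is exponential in the number of features; that cost is precisely why a closed form in terms of internal model parameters matters for models like CatBoost. A generic sketch, with names of our own choosing:

```python
import numpy as np
from itertools import combinations
from math import factorial

def marginal_shapley(f, x, background):
    """Exact marginal Shapley values of model f at point x.
    v(S) = E_z[f(x_S, z_{-S})]: features in S are fixed to x's values,
    the rest are drawn from background data rows. Exponential in n."""
    n = len(x)
    def value(S):
        Z = background.copy()
        Z[:, list(S)] = x[list(S)]
        return f(Z).mean()
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi
```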
arXiv Detail & Related papers (2023-02-16T17:18:03Z)
- Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
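The core trick can be shown in a few lines: when a large transition or potential matrix admits a rank-$r$ factorization, the matrix-vector products at the heart of structured inference (e.g., an HMM forward step) drop from $O(m^2)$ to $O(mr)$. A self-contained sketch with illustrative names, not the paper's code:

```python
import numpy as np

# If an m x m transition matrix factors as T = U @ V with rank r << m,
# each forward-pass matrix-vector product costs O(m*r) instead of O(m^2).
m, r = 1024, 16
rng = np.random.default_rng(0)
U = rng.random((m, r))
V = rng.random((r, m))
T = U @ V                        # dense structured-model potentials
alpha = rng.random(m)            # forward message over m states

dense_step = alpha @ T           # O(m^2)
lowrank_step = (alpha @ U) @ V   # O(m*r), identical result
assert np.allclose(dense_step, lowrank_step)
```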
arXiv Detail & Related papers (2022-01-08T00:47:50Z)
- Efficient Large Scale Language Modeling with Mixtures of Experts [61.45159383372181]
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation.
This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings.
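A minimal sketch of the conditional computation that MoE layers rely on, assuming top-1 routing for brevity; real systems add top-2 gating, load balancing, and parallel expert placement, and all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, n_tokens = 64, 8, 32
W_gate = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
x = rng.normal(size=(n_tokens, d))

# Gate: softmax over experts, then route each token to its top-1 expert,
# so only one expert's parameters run per token regardless of expert count.
logits = x @ W_gate
probs = np.exp(logits - logits.max(-1, keepdims=True))
probs /= probs.sum(-1, keepdims=True)
top1 = probs.argmax(-1)

y = np.empty_like(x)
for e in range(n_experts):                  # each expert sees only its tokens
    idx = np.where(top1 == e)[0]
    y[idx] = (x[idx] @ experts[e]) * probs[idx, e, None]
```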
arXiv Detail & Related papers (2021-12-20T17:05:11Z)
- A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds [9.546094657606178]
We study the generalization performance of decision trees with respect to different generative regression models.
This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data.
We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models.
arXiv Detail & Related papers (2021-10-18T21:22:40Z)
- Decentralized Learning of Tree-Structured Gaussian Graphical Models from Noisy Data [1.52292571922932]
This paper studies the decentralized learning of tree-structured Gaussian graphical models (GGMs) from noisy data.
GGMs are widely used to model complex networks such as gene regulatory networks and social networks.
The proposed decentralized learning uses the Chow-Liu algorithm for estimating the tree-structured GGM.
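For reference, the centralized Chow-Liu step is compact: for Gaussians, pairwise mutual information is $-\tfrac{1}{2}\log(1-\rho^2)$, which is monotone in $|\rho|$, so a maximum spanning tree over absolute correlations recovers the tree. A sketch of that step (the decentralized, noise-robust version is the paper's actual contribution):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def chow_liu_gaussian(X):
    """Chow-Liu tree for Gaussian data X (samples x variables):
    maximum spanning tree over absolute pairwise correlations."""
    R = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(R, 0.0)
    mst = minimum_spanning_tree(-R)   # negate weights to get a maximum tree
    return [(int(u), int(v)) for u, v in zip(*mst.nonzero())]
```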
arXiv Detail & Related papers (2021-09-22T10:41:18Z)
- SGA: A Robust Algorithm for Partial Recovery of Tree-Structured Graphical Models with Noisy Samples [75.32013242448151]
We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise.
Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure.
We propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery.
arXiv Detail & Related papers (2021-01-22T01:57:35Z)
- Probabilistic Circuits for Variational Inference in Discrete Graphical Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum-Product Networks (SPNs).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial, the corresponding ELBO can be computed analytically.
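A toy instance of the analytic-ELBO observation: for a fully factorized Bernoulli variational distribution (a degenerate special case of a selective SPN) and an Ising-style multilinear log-density, the expectation of the log-density is just the polynomial evaluated at the means, so no sampling is needed. Illustrative sketch, assuming theta2 is symmetric with zero diagonal and mu lies strictly in (0, 1):

```python
import numpy as np

def elbo_ising_meanfield(theta1, theta2, mu):
    """Analytic ELBO for log p~(x) = sum_i theta1[i] x_i
    + sum_{i<j} theta2[i,j] x_i x_j with x in {0,1}^n, under a fully
    factorized Bernoulli(mu) variational distribution. Because the
    log-density is multilinear, E_q[log p~] is the polynomial
    evaluated at the means."""
    e_logp = theta1 @ mu + 0.5 * mu @ theta2 @ mu
    entropy = -(mu * np.log(mu) + (1 - mu) * np.log(1 - mu)).sum()
    return e_logp + entropy   # lower-bounds the log partition function
```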
arXiv Detail & Related papers (2020-10-22T05:04:38Z)
- Tree-AMP: Compositional Inference with Tree Approximate Message Passing [23.509275850721778]
Tree-AMP is a Python package for compositional inference in high-dimensional tree-structured models.
The package provides a unifying framework to study several approximate message passing algorithms.
arXiv Detail & Related papers (2020-04-03T13:51:10Z)
- Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
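The multiplicative-weights primitive itself is only a few lines; a Klivans-Meka-style learner applies it per node to weight candidate neighbors, and one $O(mp)$ pass for each of the $p$ nodes is consistent with the $O(mp^2)$ runtime quoted above. A generic Hedge sketch, not the paper's full algorithm:

```python
import numpy as np

def hedge(losses, eta=0.5):
    """Generic multiplicative-weights (Hedge) update: maintain a
    distribution over p candidate features and reweight each round by
    exp(-eta * loss). losses: (T, p) array with entries in [0, 1]."""
    T, p = losses.shape
    w = np.ones(p)
    for t in range(T):
        w *= np.exp(-eta * losses[t])
        w /= w.sum()              # renormalize to a probability distribution
    return w
```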
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
- On the Discrepancy between Density Estimation and Sequence Generation [92.70116082182076]
Log-likelihood is highly correlated with BLEU when we consider models within the same family.
We observe no correlation between rankings of models across different families.
arXiv Detail & Related papers (2020-02-17T20:13:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.