Tractable Regularization of Probabilistic Circuits
- URL: http://arxiv.org/abs/2106.02264v1
- Date: Fri, 4 Jun 2021 05:11:13 GMT
- Title: Tractable Regularization of Probabilistic Circuits
- Authors: Anji Liu and Guy Van den Broeck
- Abstract summary: Probabilistic Circuits (PCs) are a promising avenue for probabilistic modeling.
We propose two intuitive techniques, data softening and entropy regularization, that take advantage of PCs' tractability.
We show that both methods consistently improve the generalization performance of a wide variety of PCs.
- Score: 31.841838579553034
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Probabilistic Circuits (PCs) are a promising avenue for probabilistic
modeling. They combine advantages of probabilistic graphical models (PGMs) with
those of neural networks (NNs). Crucially, however, they are tractable
probabilistic models, supporting efficient and exact computation of many
probabilistic inference queries, such as marginals and MAP. Further, since PCs
are structured computation graphs, they can take advantage of
deep-learning-style parameter updates, which greatly improves their
scalability. However, this innovation also makes PCs prone to overfitting,
which has been observed in many standard benchmarks. Despite the existence of
abundant regularization techniques for both PGMs and NNs, they are not
effective enough when applied to PCs. Instead, we re-think regularization for
PCs and propose two intuitive techniques, data softening and entropy
regularization, that both take advantage of PCs' tractability and still have an
efficient implementation as a computation graph. Specifically, data softening
provides a principled way to add uncertainty in datasets in closed form, which
implicitly regularizes PC parameters. To learn parameters from a softened
dataset, PCs only need linear time by virtue of their tractability. In entropy
regularization, the exact entropy of the distribution encoded by a PC can be
regularized directly, which is again infeasible for most other density
estimation models. We show that both methods consistently improve the
generalization performance of a wide variety of PCs. Moreover, when paired with
a simple PC structure, we achieved state-of-the-art results on 10 out of 20
standard discrete density estimation benchmarks.
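The two techniques described in the abstract lend themselves to a compact illustration. Below is a minimal, self-contained sketch (NumPy, hand-built toy circuit), not the authors' implementation: data softening replaces each hard observation with soft evidence that keeps the observed value with probability beta, which a decomposable PC can absorb at its leaves in a single linear-time pass, and the entropy recursion shown is exact only for smooth, decomposable, and deterministic PCs. All names below (Leaf, Product, Sum, Indicator, soften, beta, tau) are illustrative assumptions.

```python
# Minimal sketch of data softening and exact entropy computation for a toy binary PC.
# Class and variable names are illustrative, not the paper's code.
import numpy as np

class Leaf:
    """Bernoulli leaf over one binary variable: P(X_var = 1) = p."""
    def __init__(self, var, p):
        self.var, self.p = var, p
    def forward(self, soft_evidence):
        # soft_evidence[var] is the probability the softened datapoint assigns to X_var = 1
        q1 = soft_evidence[self.var]
        return q1 * self.p + (1.0 - q1) * (1.0 - self.p)
    def entropy(self):
        return -(self.p * np.log(self.p) + (1.0 - self.p) * np.log(1.0 - self.p))

class Indicator:
    """Indicator leaf [X_var = value]; zero entropy, makes the sum node deterministic."""
    def __init__(self, var, value):
        self.var, self.value = var, value
    def forward(self, soft_evidence):
        q1 = soft_evidence[self.var]
        return q1 if self.value == 1 else 1.0 - q1
    def entropy(self):
        return 0.0

class Product:
    def __init__(self, children):
        self.children = children
    def forward(self, soft_evidence):
        return float(np.prod([c.forward(soft_evidence) for c in self.children]))
    def entropy(self):
        # decomposable product node: entropies of independent scopes add up
        return sum(c.entropy() for c in self.children)

class Sum:
    def __init__(self, children, weights):
        self.children, self.weights = children, np.asarray(weights, dtype=float)
    def forward(self, soft_evidence):
        return float(self.weights @ [c.forward(soft_evidence) for c in self.children])
    def entropy(self):
        # exact only if this sum node is deterministic (children with disjoint supports)
        return float(sum(w * (c.entropy() - np.log(w))
                         for w, c in zip(self.weights, self.children)))

def soften(x, beta):
    """Data softening: the observed value keeps probability beta, its flip gets 1 - beta."""
    x = np.asarray(x, dtype=float)
    return beta * x + (1.0 - beta) * (1.0 - x)

# Toy deterministic PC over X0, X1: the indicator on X0 selects exactly one branch.
pc = Sum(
    children=[
        Product([Indicator(0, 1), Leaf(1, 0.8)]),   # support: X0 = 1
        Product([Indicator(0, 0), Leaf(1, 0.3)]),   # support: X0 = 0
    ],
    weights=[0.7, 0.3],
)

dataset = np.array([[1, 1], [1, 0], [0, 0]])
beta, tau = 0.9, 0.1          # beta = 1.0 recovers the original hard dataset
softened_ll = sum(np.log(pc.forward(soften(x, beta))) for x in dataset)
objective = softened_ll + tau * pc.entropy()   # entropy-regularized training objective
print(f"softened LL = {softened_ll:.4f}, exact entropy = {pc.entropy():.4f}, "
      f"regularized objective = {objective:.4f}")
```

With beta = 1.0 the softened log-likelihood reduces to the ordinary log-likelihood of the hard dataset, and the final objective mirrors the "log-likelihood plus a weighted entropy term" form of entropy regularization that the abstract alludes to.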
Related papers
- Probabilistically Plausible Counterfactual Explanations with Normalizing Flows [2.675793767640172]
We present PPCEF, a novel method for generating probabilistically plausible counterfactual explanations.
Our method enforces plausibility by directly optimizing the explicit density function without assuming a particular family of parametrized distributions.
PPCEF is a powerful tool for interpreting machine learning models and for improving fairness, accountability, and trust in AI systems.
arXiv Detail & Related papers (2024-05-27T20:24:03Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated seamlessly with neural networks.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - HyperSPNs: Compact and Expressive Probabilistic Circuits [89.897635970366]
HyperSPNs are a new paradigm for generating the mixture weights of large PCs using a small-scale neural network.
We show the merits of our regularization strategy on two state-of-the-art PC families introduced in recent literature.
arXiv Detail & Related papers (2021-12-02T01:24:43Z) - Merging Two Cultures: Deep and Statistical Learning [3.15863303008255]
Merging the two cultures of deep and statistical learning provides insights into structured high-dimensional data.
We show that prediction, optimisation and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model.
arXiv Detail & Related papers (2021-10-22T02:57:21Z) - Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic
Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees.
We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-03T08:32:13Z) - MuyGPs: Scalable Gaussian Process Hyperparameter Estimation Using Local
Cross-Validation [1.2233362977312945]
We present MuyGPs, a novel, efficient GP hyperparameter estimation method.
MuyGPs builds upon prior methods that take advantage of the nearest neighbors structure of the data.
We show that our method outperforms all known competitors in terms of both time-to-solution and the root mean squared error of the predictions.
arXiv Detail & Related papers (2021-04-29T18:10:21Z) - Probabilistic Generating Circuits [50.98473654244851]
We propose probabilistic generating circuits (PGCs) for the efficient representation of probability generating functions.
PGCs are not merely a theoretical framework that unifies vastly different existing models; they also show strong potential for modeling realistic data.
We exhibit a simple class of PGCs that are not trivially subsumed by simple combinations of PCs and DPPs, and obtain competitive performance on a suite of density estimation benchmarks.
arXiv Detail & Related papers (2021-02-19T07:06:53Z) - Predictive Coding Approximates Backprop along Arbitrary Computation
Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to models trained with backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z) - Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic
Circuits [99.59941892183454]
We propose Einsum Networks (EiNets), a novel implementation design for PCs.
At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum operation (see the sketch after this list).
We show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation.
arXiv Detail & Related papers (2020-04-13T23:09:15Z)
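As a side note on the EiNets entry above, the following is a minimal sketch of how one einsum call can evaluate a whole layer of vectorized sum nodes at once instead of looping over them in Python. The shapes and variable names are illustrative assumptions, not the EiNets API.

```python
# Minimal sketch: evaluating K sum nodes over their I child values for a whole batch
# with a single einsum call; shapes and names are illustrative, not the EiNets API.
import numpy as np

batch, K, I = 32, 8, 16                          # batch size, sum nodes, children per node
child_vals = np.random.rand(batch, K, I)         # outputs of the child (product) nodes
weights = np.random.rand(K, I)
weights /= weights.sum(axis=1, keepdims=True)    # each sum node's mixture weights sum to 1

# One monolithic einsum replaces a Python loop over the K individual sum nodes.
sum_vals = np.einsum('bki,ki->bk', child_vals, weights)
assert sum_vals.shape == (batch, K)
```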