Scaling Up Probabilistic Circuits by Latent Variable Distillation
- URL: http://arxiv.org/abs/2210.04398v1
- Date: Mon, 10 Oct 2022 02:07:32 GMT
- Title: Scaling Up Probabilistic Circuits by Latent Variable Distillation
- Authors: Anji Liu and Honghua Zhang and Guy Van den Broeck
- Abstract summary: As the number of parameters in PCs increases, their performance immediately plateaus.
We leverage the less tractable but more expressive deep generative models to provide extra supervision over the latent variables of PCs.
In particular, on the image modeling benchmarks, PCs achieve competitive performance against some of the widely-used deep generative models.
- Score: 29.83240905570575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Probabilistic Circuits (PCs) are a unified framework for tractable
probabilistic models that support efficient computation of various
probabilistic queries (e.g., marginal probabilities). One key challenge is to
scale PCs to model large and high-dimensional real-world datasets: we observe
that as the number of parameters in PCs increases, their performance
immediately plateaus. This phenomenon suggests that the existing optimizers
fail to exploit the full expressive power of large PCs. We propose to overcome
such bottleneck by latent variable distillation: we leverage the less tractable
but more expressive deep generative models to provide extra supervision over
the latent variables of PCs. Specifically, we extract information from
Transformer-based generative models to assign values to latent variables of
PCs, providing guidance to PC optimizers. Experiments on both image and
language modeling benchmarks (e.g., ImageNet and WikiText-2) show that latent
variable distillation substantially boosts the performance of large PCs
compared to their counterparts without latent variable distillation. In
particular, on the image modeling benchmarks, PCs achieve competitive
performance against some of the widely-used deep generative models, including
variational autoencoders and flow-based models, opening up new avenues for
tractable generative modeling.
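As a rough sketch of how this could look in code (illustrative only, not the paper's implementation): take the simplest PC with a single latent variable, a mixture of product-of-Bernoullis, let a "teacher" supply values for that latent variable (here the ground-truth cluster ids stand in for the codes a Transformer-based teacher would provide), fit the PC parameters with the latents observed, and then fine-tune with a few EM steps in which the latents are hidden again.

```python
# Latent variable distillation on a toy PC: a mixture of product-of-Bernoullis.
# All names and the synthetic setup are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
K, D, N = 4, 16, 2000                               # components, data dim, samples

# Synthetic binary data with a known cluster structure.
true_theta = rng.uniform(0.05, 0.95, size=(K, D))
z_true = rng.integers(K, size=N)
X = (rng.random((N, D)) < true_theta[z_true]).astype(np.float64)

# Stand-in for the teacher: in the paper a deep generative model supplies values
# for the PC's latent variables; here we simply reuse the true cluster ids.
z_teacher = z_true

def loglik(X, pi, theta):
    # log p(x) = logsumexp_k [ log pi_k + sum_d log Bern(x_d; theta_kd) ]
    logp = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    m = logp.max(axis=1, keepdims=True)
    return float((m + np.log(np.exp(logp - m).sum(axis=1, keepdims=True))).sum())

# Stage 1: supervised ("distilled") estimation with the latent variable observed.
counts = np.bincount(z_teacher, minlength=K)
pi = (counts + 1) / (N + K)                          # Laplace-smoothed mixture weights
theta = np.vstack([(X[z_teacher == k].sum(axis=0) + 1.0) / (counts[k] + 2.0)
                   for k in range(K)])
print("after distillation  :", loglik(X, pi, theta) / N)

# Stage 2: a few EM steps to fine-tune with the latent variable hidden again.
for _ in range(10):
    logp = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    logp -= logp.max(axis=1, keepdims=True)
    resp = np.exp(logp)
    resp /= resp.sum(axis=1, keepdims=True)
    pi = resp.mean(axis=0)
    theta = np.clip((resp.T @ X) / resp.sum(axis=0)[:, None], 1e-3, 1 - 1e-3)
print("after EM fine-tuning:", loglik(X, pi, theta) / N)
```

The point of the sketch is only to show where the extra supervision enters: stage 1 turns latent-variable learning into a supervised estimation problem, and stage 2 recovers the usual maximum-likelihood objective on a much better starting point.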
Related papers
- Simplifying CLIP: Unleashing the Power of Large-Scale Models on Consumer-level Computers [3.2492319522383717]
Contrastive Language-Image Pre-training (CLIP) has attracted a surge of attention for its superior zero-shot performance and excellent transferability to downstream tasks.
However, training such large-scale models usually requires substantial computation and storage, which poses barriers for general users with consumer-level computers.
arXiv Detail & Related papers (2024-11-22T08:17:46Z) - Sum of Squares Circuits [8.323409122604893]
Probabilistic circuits (PCs) offer a framework where the trade-off between tractability and expressiveness can be analyzed theoretically.
We show that squared PCs encoding subtractive mixtures via negative parameters can be exponentially more expressive than monotonic PCs.
We formalize a novel class of PCs -- sum of squares PCs -- that can be exponentially more expressive than both squared and monotonic PCs.
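A toy numeric check of the subtractive-mixture claim (illustrative only, not the paper's construction): a mixture with one negative weight can dip below zero, so it is not itself a density, but its square is nonnegative everywhere and carves out a "hole" that a monotonic mixture of the same size cannot express.

```python
# Squaring a subtractive two-component Gaussian mixture yields a valid
# nonnegative (unnormalized) density; values are for illustration only.
import numpy as np

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

x = np.linspace(-4, 4, 9)
w1, w2 = 1.0, -0.8                                  # subtractive mixture: one weight < 0
f = w1 * gauss(x, 0.0, 2.0) + w2 * gauss(x, 0.0, 0.5)

print(np.round(f, 3))                               # dips below zero near x = 0
print(np.round(f ** 2, 3))                          # the squared circuit is nonnegative
```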
arXiv Detail & Related papers (2024-08-21T17:08:05Z) - Understanding the Distillation Process from Deep Generative Models to
Tractable Probabilistic Circuits [30.663322946413285]
We theoretically and empirically discover that the performance of a PC can exceed that of its teacher model.
In particular, on ImageNet32, PCs achieve 4.06 bits-per-dimension, which is only 0.34 behind variational diffusion models.
arXiv Detail & Related papers (2023-02-16T04:52:46Z) - Sparse Probabilistic Circuits via Pruning and Growing [30.777764474107663]
Probabilistic circuits (PCs) are a tractable representation of probability distributions allowing for exact and efficient computation of likelihoods and marginals.
We propose two operations, pruning and growing, that exploit the sparsity of PC structures.
By alternately applying pruning and growing, we increase the capacity that is meaningfully used, allowing us to significantly scale up PC learning.
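A hypothetical sketch of the prune/grow loop on the weights of a single sum node (standing in for a full PC); the threshold and perturbation scheme are assumptions, not the paper's algorithm.

```python
# Prune low-weight children, then grow by duplicating the survivors with noise.
import numpy as np

rng = np.random.default_rng(1)
w = rng.dirichlet(np.ones(8))                      # weights of one sum node, 8 children

def prune(w, frac=0.25):
    """Drop the lowest-weight children; they carry little probability mass."""
    keep = np.argsort(w)[int(len(w) * frac):]
    return keep, w[keep] / w[keep].sum()

def grow(w, noise=0.05):
    """Duplicate each surviving child with a small perturbation,
    adding capacity where the mass actually is."""
    w2 = np.concatenate([w, w * (1.0 + noise * rng.standard_normal(len(w)))])
    w2 = np.clip(w2, 1e-8, None)
    return w2 / w2.sum()

kept, w = prune(w)                                 # 8 children -> 6
w = grow(w)                                        # 6 children -> 12
print(kept, len(w), round(float(w.sum()), 6))      # weights stay normalized
```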
arXiv Detail & Related papers (2022-11-22T19:54:52Z) - Patch Similarity Aware Data-Free Quantization for Vision Transformers [2.954890575035673]
We propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers.
We analyze the self-attention module's properties and reveal a general difference (patch similarity) in its processing of Gaussian noise and real images.
Experiments and ablation studies are conducted on various benchmarks to validate the effectiveness of PSAQ-ViT.
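The intuition can be checked with a generic statistic (a stand-in, not PSAQ-ViT's exact metric): the mean absolute cosine similarity between patches is much larger for a structured image than for Gaussian noise, which is what makes patch similarity usable as a data-free signal.

```python
# Compare pairwise patch similarity for Gaussian noise vs. a structured image.
import numpy as np

def patch_similarity(img, patch=8):
    H, W = img.shape
    p = img.reshape(H // patch, patch, W // patch, patch).transpose(0, 2, 1, 3)
    p = p.reshape(-1, patch * patch)
    p = p - p.mean(axis=1, keepdims=True)
    p = p / (np.linalg.norm(p, axis=1, keepdims=True) + 1e-8)
    sim = p @ p.T
    return float(np.abs(sim[np.triu_indices_from(sim, k=1)]).mean())

rng = np.random.default_rng(0)
noise = rng.standard_normal((64, 64))
structured = np.outer(np.sin(np.linspace(0, 3, 64)), np.cos(np.linspace(0, 3, 64)))

print(patch_similarity(noise), patch_similarity(structured))   # the latter is much larger
```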
arXiv Detail & Related papers (2022-03-04T11:47:20Z) - Learning Generative Vision Transformer with Energy-Based Latent Space
for Saliency Prediction [51.80191416661064]
We propose a novel vision transformer with latent variables following an informative energy-based prior for salient object detection.
Both the vision transformer network and the energy-based prior model are jointly trained via Markov chain Monte Carlo-based maximum likelihood estimation.
With the generative vision transformer, we can easily obtain a pixel-wise uncertainty map from an image, which indicates the model confidence in predicting saliency from the image.
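A schematic of the pixel-wise uncertainty map: sample the latent a few times, decode a saliency map for each sample, and take the per-pixel spread. The decoder below is a made-up stand-in, not the paper's generative vision transformer.

```python
# Per-pixel uncertainty from repeated latent sampling (toy decoder, assumed shapes).
import numpy as np

rng = np.random.default_rng(0)

def decode_saliency(z, H=32, W=32):
    # Hypothetical decoder: a smooth blob whose location shifts with the latent z.
    yy, xx = np.mgrid[0:H, 0:W]
    yy, xx = yy / H, xx / W
    return np.exp(-(((xx - 0.5 - 0.2 * z[0]) ** 2) + ((yy - 0.5 - 0.2 * z[1]) ** 2)) / 0.05)

samples = [decode_saliency(rng.standard_normal(2)) for _ in range(16)]
saliency = np.mean(samples, axis=0)       # averaged prediction
uncertainty = np.std(samples, axis=0)     # pixel-wise spread as model uncertainty
print(saliency.shape, float(uncertainty.max()))
```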
arXiv Detail & Related papers (2021-12-27T06:04:33Z) - HyperSPNs: Compact and Expressive Probabilistic Circuits [89.897635970366]
HyperSPNs are a new paradigm for generating the mixture weights of large PCs using a small-scale neural network.
We show the merits of our regularization strategy on two state-of-the-art PC families introduced in recent literature.
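A toy rendering of that idea: a small "hypernetwork" maps a low-dimensional embedding of each sum node to that node's normalized mixture weights, so the number of free parameters no longer grows with the size of the PC. Shapes and the two-layer map are illustrative assumptions, not the paper's architecture.

```python
# Generate the mixture weights of many sum nodes from a shared small network.
import numpy as np

rng = np.random.default_rng(0)
num_sum_nodes, fan_in, emb_dim, hidden = 1000, 8, 16, 32

node_emb = rng.standard_normal((num_sum_nodes, emb_dim)) * 0.1   # per-node embeddings
W1 = rng.standard_normal((emb_dim, hidden)) * 0.1                # shared small network
W2 = rng.standard_normal((hidden, fan_in)) * 0.1

def mixture_weights(emb):
    logits = np.tanh(emb @ W1) @ W2
    logits -= logits.max(axis=1, keepdims=True)                  # stable softmax
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

w = mixture_weights(node_emb)
print(w.shape, bool(np.allclose(w.sum(axis=1), 1.0)))            # (1000, 8) True
```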
arXiv Detail & Related papers (2021-12-02T01:24:43Z) - Probabilistic Generating Circuits [50.98473654244851]
We propose probabilistic generating circuits (PGCs) for the efficient representation of probability generating functions.
PGCs are not just a theoretical framework that unifies vastly different existing models, but also show huge potential in modeling realistic data.
We exhibit a simple class of PGCs that are not trivially subsumed by simple combinations of PCs and DPPs, and obtain competitive performance on a suite of density estimation benchmarks.
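To make the represented object concrete (a toy enumeration, not a circuit): the probability generating polynomial of a distribution over binary variables stores each probability as the coefficient of the corresponding monomial, and marginals are read off by fixing the other indeterminates to 1.

```python
# Toy generating polynomial g(z1, z2) = sum_x p(x) * z1**x1 * z2**x2 over two
# binary variables; PGCs represent such polynomials compactly, here we enumerate.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def g(z1, z2):
    return sum(prob * z1 ** x1 * z2 ** x2 for (x1, x2), prob in p.items())

# Setting z2 = 1 marginalizes X2: g(z1, 1) = P(X1=0) + P(X1=1) * z1.
print(g(0.0, 1.0))                 # P(X1 = 0) = 0.5
print(g(1.0, 1.0) - g(0.0, 1.0))   # P(X1 = 1) = 0.5
```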
arXiv Detail & Related papers (2021-02-19T07:06:53Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - Probabilistic Circuits for Variational Inference in Discrete Graphical
Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum-Product Networks (SPNs).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial, the corresponding ELBO can be computed analytically.
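A concrete special case of that claim, with a fully factorized Bernoulli q standing in for a selective SPN: when the unnormalized log-density is a polynomial (an Ising-style quadratic below), the ELBO has a closed form, which the sketch checks against a Monte-Carlo estimate.

```python
# Closed-form ELBO for a polynomial log-density under an independent Bernoulli q.
import numpy as np

rng = np.random.default_rng(0)
n = 5
J = np.triu(rng.standard_normal((n, n)) * 0.3, k=1)   # pairwise coefficients, x in {0,1}^n
h = rng.standard_normal(n) * 0.2                      # unary coefficients
mu = rng.uniform(0.05, 0.95, size=n)                  # q(x_i = 1) = mu_i

# Under independence, E[x_i] = mu_i and E[x_i x_j] = mu_i mu_j for i != j,
# so the expected log-density is the polynomial evaluated at mu.
expected_logp = mu @ h + mu @ J @ mu
entropy = -(mu * np.log(mu) + (1 - mu) * np.log(1 - mu)).sum()
elbo = expected_logp + entropy

# Monte-Carlo sanity check of the expectation term; the two numbers should be close.
X = (rng.random((200_000, n)) < mu).astype(float)
mc = (X @ h + np.einsum('bi,ij,bj->b', X, J, X)).mean() + entropy
print(round(elbo, 3), round(mc, 3))
```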
arXiv Detail & Related papers (2020-10-22T05:04:38Z) - Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic
Circuits [99.59941892183454]
We propose Einsum Networks (EiNets), a novel implementation design for PCs.
At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum-operation.
We show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation.
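A sketch of the einsum fusion: one EiNet-style layer with many sum nodes, each mixing over the outer product of its left and right child vectors, is evaluated for a whole batch with a single einsum call. Real implementations work in log-space for numerical stability; this toy operates directly on probabilities, and all dimensions are illustrative.

```python
# Evaluate a layer of sum-over-product nodes with one fused einsum.
import numpy as np

rng = np.random.default_rng(0)
B, N, K = 32, 50, 10                               # batch, nodes in the layer, components

left = rng.random((B, N, K))                       # left child values per node
right = rng.random((B, N, K))                      # right child values per node
W = rng.dirichlet(np.ones(K * K), size=(N, K)).reshape(N, K, K, K)   # normalized weights

# Product nodes (outer product over i, j) and sum nodes (weighted sum) in one call.
out = np.einsum('bni,bnj,nkij->bnk', left, right, W)
print(out.shape)                                   # (32, 50, 10)
```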
arXiv Detail & Related papers (2020-04-13T23:09:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.