Sparse Probabilistic Circuits via Pruning and Growing
- URL: http://arxiv.org/abs/2211.12551v1
- Date: Tue, 22 Nov 2022 19:54:52 GMT
- Title: Sparse Probabilistic Circuits via Pruning and Growing
- Authors: Meihua Dang, Anji Liu, Guy Van den Broeck
- Abstract summary: Probabilistic circuits (PCs) are a tractable representation of probability distributions allowing for exact and efficient computation of likelihoods and marginals.
We propose two operations: pruning and growing, that exploit the sparsity of PC structures.
By alternatingly applying pruning and growing, we increase the capacity that is meaningfully used, allowing us to significantly scale up PC learning.
- Score: 30.777764474107663
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Probabilistic circuits (PCs) are a tractable representation of probability
distributions allowing for exact and efficient computation of likelihoods and
marginals. There has been significant recent progress on improving the scale
and expressiveness of PCs. However, PC training performance plateaus as model
size increases. We discover that most capacity in existing large PC structures
is wasted: fully-connected parameter layers are only sparsely used. We propose
two operations: pruning and growing, that exploit the sparsity of PC
structures. Specifically, the pruning operation removes unimportant
sub-networks of the PC for model compression and comes with theoretical
guarantees. The growing operation increases model capacity by increasing the
size of the latent space. By alternatingly applying pruning and growing, we
increase the capacity that is meaningfully used, allowing us to significantly
scale up PC learning. Empirically, our learner achieves state-of-the-art
likelihoods on MNIST-family image datasets and on Penn Tree Bank language data
compared to other PC learners and less tractable deep generative models such as
flow-based models and variational autoencoders (VAEs).
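The pruning and growing operations described in the abstract alternate structure compression with latent-space expansion. Below is a minimal sketch of that loop on a toy PC (a single mixture over product-of-Bernoulli leaves); it is not the authors' implementation. The paper ranks sub-circuits of deep PC structures by circuit flows, whereas here component importance is approximated by expected posterior mass, and all names and hyperparameters are illustrative.

```python
# Toy prune/grow loop on a mixture-of-Bernoullis PC (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def component_loglik(x, probs):
    """loglik[n, k] = log p(x_n | component k) for Bernoulli leaves."""
    return x @ np.log(probs).T + (1 - x) @ np.log1p(-probs).T

def em_step(x, weights, probs):
    """One EM update of the mixture weights and Bernoulli leaf parameters."""
    joint = np.log(weights) + component_loglik(x, probs)            # (n, K)
    post = np.exp(joint - joint.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)                         # responsibilities
    new_w = (post.sum(0) + 1e-3) / (len(x) + 1e-3 * len(weights))
    new_p = (post.T @ x + 1.0) / (post.sum(0)[:, None] + 2.0)       # Laplace smoothing
    return new_w, new_p, post

def prune(weights, probs, post, keep_frac=0.5):
    """Drop the least important components. Importance here is expected posterior
    mass, a stand-in for the circuit-flow criterion used in the paper."""
    keep = np.argsort(post.sum(0))[::-1][: max(1, int(len(weights) * keep_frac))]
    w = weights[keep]
    return w / w.sum(), probs[keep]

def grow(weights, probs, noise=0.05):
    """Double the latent space by duplicating components with perturbed parameters."""
    probs2 = np.clip(probs + rng.normal(0.0, noise, probs.shape), 0.01, 0.99)
    return np.concatenate([weights, weights]) / 2.0, np.vstack([probs, probs2])

# Toy binary data and an initial K-component structure.
x = rng.integers(0, 2, size=(500, 16)).astype(float)
K = 8
weights = np.full(K, 1.0 / K)
probs = rng.uniform(0.2, 0.8, size=(K, x.shape[1]))

for r in range(3):                          # alternate pruning and growing
    for _ in range(10):                     # EM between structure changes
        weights, probs, post = em_step(x, weights, probs)
    weights, probs = prune(weights, probs, post)
    weights, probs = grow(weights, probs)
    avg_ll = np.mean(np.logaddexp.reduce(
        np.log(weights) + component_loglik(x, probs), axis=1))
    print(f"round {r}: {len(weights)} components, avg log-likelihood {avg_ll:.3f}")
```

After each round the least useful half of the latent states is discarded and the survivors are split into perturbed copies, so capacity is continually redirected toward components that the data actually uses.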
Related papers
- Scaling Tractable Probabilistic Circuits: A Systems Perspective [36.528534612003504]
PyJuice is a general implementation design for PCs that improves prior art in several regards.
It is 1-2 orders of magnitude faster than existing systems at training large-scale PCs.
PyJuice consumes 2-5x less memory, which enables us to train larger models.
arXiv Detail & Related papers (2024-06-02T14:57:00Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for efficient distributed training of large models.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Understanding the Distillation Process from Deep Generative Models to Tractable Probabilistic Circuits [30.663322946413285]
We theoretically and empirically discover that the performance of a PC can exceed that of its teacher model.
In particular, on ImageNet32, PCs achieve 4.06 bits-per-dimension, which is only 0.34 behind variational diffusion models.
arXiv Detail & Related papers (2023-02-16T04:52:46Z)
- Scaling Up Probabilistic Circuits by Latent Variable Distillation [29.83240905570575]
As the number of parameters in PCs increases, their performance immediately plateaus.
We leverage the less tractable but more expressive deep generative models to provide extra supervision over the latent variables of PCs.
In particular, on the image modeling benchmarks, PCs achieve competitive performance against some of the widely-used deep generative models.
arXiv Detail & Related papers (2022-10-10T02:07:32Z)
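A rough sketch of the latent variable distillation idea summarized above, under simplifying assumptions: an external embedding (here a random projection standing in for features from an expressive deep generative model) is clustered to materialize the PC's latent variable, the mixture is fit by complete-data MLE on those assignments, and a few unsupervised EM steps fine-tune. The toy mixture-of-Bernoullis PC and all names are illustrative, not the paper's pipeline.

```python
# Toy "latent variable distillation": supervise a mixture's latent variable
# with cluster IDs obtained from an external embedding, then fine-tune with EM.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(1000, 32)).astype(float)   # toy binary data
K = 16

# 1) Stand-in for deep-generative-model features: any informative embedding.
features = x @ rng.normal(size=(32, 8))

# 2) Materialize the latent variable by clustering the embeddings.
z = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(features)

# 3) Complete-data MLE for the mixture given the supervised latents.
weights = (np.bincount(z, minlength=K) + 1.0) / (len(z) + K)
probs = np.vstack([(x[z == k].sum(0) + 1.0) / ((z == k).sum() + 2.0)
                   for k in range(K)])

# 4) A few unsupervised EM steps to fine-tune the distilled model.
def loglik(x, weights, probs):
    comp = x @ np.log(probs).T + (1 - x) @ np.log1p(-probs).T
    return np.log(weights) + comp

for _ in range(5):
    joint = loglik(x, weights, probs)
    post = np.exp(joint - joint.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    weights = (post.sum(0) + 1e-3) / (len(x) + 1e-3 * K)
    probs = (post.T @ x + 1.0) / (post.sum(0)[:, None] + 2.0)

print("avg log-lik:", np.mean(np.logaddexp.reduce(loglik(x, weights, probs), axis=1)))
```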
- HyperSPNs: Compact and Expressive Probabilistic Circuits [89.897635970366]
HyperSPNs are a new paradigm for generating the mixture weights of large PCs with a small-scale neural network.
We show the merits of our regularization strategy on two state-of-the-art PC families introduced in recent literature.
arXiv Detail & Related papers (2021-12-02T01:24:43Z)
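A minimal sketch of the HyperSPN weight-generation idea above, assuming a toy setting: a small shared network maps per-sum-node embeddings to normalized mixture weights, so the sum-node parameters are generated rather than stored as free per-edge weights. Class names, sizes, and the PyTorch framing are illustrative, not the paper's code.

```python
# Generate mixture weights of a (nominally large) PC from a small shared network.
import torch
import torch.nn as nn

class WeightGenerator(nn.Module):
    def __init__(self, num_sum_nodes, num_children, embed_dim=16, hidden=64):
        super().__init__()
        self.node_embed = nn.Embedding(num_sum_nodes, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_children),
        )

    def forward(self, node_ids):
        # Normalized mixture weights for each requested sum node.
        logits = self.net(self.node_embed(node_ids))
        return torch.softmax(logits, dim=-1)

gen = WeightGenerator(num_sum_nodes=10_000, num_children=32)
weights = gen(torch.arange(10_000))          # (10000, 32), each row sums to 1
print(weights.shape, weights[0].sum())
# These weights would parameterize the PC's sum nodes; training updates only the
# embeddings and the small MLP, sharing structure across nodes as a regularizer.
```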
- M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining [55.16088793437898]
Training extreme-scale models requires an enormous amount of compute and memory.
We propose a simple training strategy called "Pseudo-to-Real" for large models with high memory-footprint requirements.
arXiv Detail & Related papers (2021-10-08T04:24:51Z)
- Top-KAST: Top-K Always Sparse Training [50.05611544535801]
We propose Top-KAST, a method that preserves constant sparsity throughout training.
We show that it performs comparably to or better than previous works when training models on the established ImageNet benchmark.
In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling.
arXiv Detail & Related papers (2021-06-07T11:13:05Z)
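A simplified sketch of the top-K always-sparse training idea summarized above: the forward pass uses only the largest-magnitude weights of a layer, so a fixed sparsity level is maintained throughout training. This toy version applies one mask to both the forward and backward pass and omits the paper's separate, slightly larger backward set and its exploration mechanics.

```python
# Top-k magnitude masking during training (simplified, illustrative only).
import torch
import torch.nn as nn

def topk_mask(w: torch.Tensor, density: float) -> torch.Tensor:
    """Binary mask keeping the `density` fraction of largest-magnitude entries."""
    k = max(1, int(w.numel() * density))
    idx = torch.topk(w.abs().flatten(), k).indices
    mask = torch.zeros(w.numel(), device=w.device)
    mask[idx] = 1.0
    return mask.view_as(w)

layer = nn.Linear(512, 512, bias=False)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(64, 512), torch.randn(64, 512)

for step in range(5):
    mask = topk_mask(layer.weight.detach(), density=0.1)  # recomputed periodically in practice
    sparse_w = layer.weight * mask                        # forward pass sees only masked weights
    loss = ((x @ sparse_w.t() - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()       # only unmasked entries receive nonzero gradient in this simplified version
    opt.step()
    print(step, round(loss.item(), 4), int(mask.sum().item()))
```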
- Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits [99.59941892183454]
We propose Einsum Networks (EiNets), a novel implementation design for PCs.
At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum operation.
We show that the implementation of Expectation-Maximization (EM) for PCs can be simplified by leveraging automatic differentiation.
arXiv Detail & Related papers (2020-04-13T23:09:15Z)
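A sketch of the fused einsum layer at the heart of EiNets, under illustrative shapes: the products and weighted sums of many PC nodes are evaluated in one batched einsum over log-probability tensors, with a log-sum-exp trick for numerical stability. Differentiating the resulting log-likelihood with respect to the weights is what allows EM to be implemented via automatic differentiation, as the summary notes. The function signature below is an assumption for illustration, not the EiNet library's API.

```python
# One fused einsum evaluating many PC sum/product nodes at once (illustrative).
import torch

def einsum_layer(logL, logR, logW):
    """
    logL, logR: (batch, nodes, K) log-densities of left/right child vectors.
    logW:       (nodes, K_out, K, K) log mixture weights, normalized over the last two dims.
    Returns (batch, nodes, K_out) with out[b,n,i] = log sum_{j,k} W[n,i,j,k] * L[b,n,j] * R[b,n,k].
    """
    mL = logL.max(dim=-1, keepdim=True).values           # (batch, nodes, 1)
    mR = logR.max(dim=-1, keepdim=True).values
    expL = (logL - mL).exp()
    expR = (logR - mR).exp()
    # One monolithic einsum realizes all products and weighted sums at once.
    mixed = torch.einsum('bnj,bnk,nijk->bni', expL, expR, logW.exp())
    return mixed.clamp_min(1e-38).log() + mL + mR         # add the maxima back

# Toy usage: 4 nodes, 8 input components per child, 8 output components.
batch, nodes, K = 16, 4, 8
logL = torch.log_softmax(torch.randn(batch, nodes, K), dim=-1)
logR = torch.log_softmax(torch.randn(batch, nodes, K), dim=-1)
logW = torch.log_softmax(torch.randn(nodes, K, K, K).view(nodes, K, -1), dim=-1).view(nodes, K, K, K)
print(einsum_layer(logL, logR, logW).shape)               # torch.Size([16, 4, 8])
```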
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.