Amortized Variational Inference for Simple Hierarchical Models
- URL: http://arxiv.org/abs/2111.03144v1
- Date: Thu, 4 Nov 2021 20:29:12 GMT
- Title: Amortized Variational Inference for Simple Hierarchical Models
- Authors: Abhinav Agrawal, Justin Domke
- Abstract summary: It is difficult to use subsampling with variational inference in hierarchical models, since the number of local latent variables scales with the size of the dataset. This paper suggests an amortized approach in which shared parameters simultaneously represent all local distributions; it is also dramatically faster than using a structured variational distribution.
- Score: 37.56550107432323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is difficult to use subsampling with variational inference in hierarchical models, since the number of local latent variables scales with the size of the dataset; inference in hierarchical models therefore remains a challenge at large scale. It helps to use a variational family whose structure matches the posterior, but optimization is still slow due to the huge number of local distributions. Instead, this paper suggests an amortized approach in which shared parameters simultaneously represent all local distributions. This approach is about as accurate as using a given joint distribution (e.g., a full-rank Gaussian) but is feasible on datasets several orders of magnitude larger. It is also dramatically faster than using a structured variational distribution.
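As a rough illustration of the amortization idea (a minimal sketch under a toy Gaussian model, not the authors' code): a single shared network maps each group's observations to that group's local variational parameters, so memory no longer grows with the number of groups, and the ELBO can be estimated from a minibatch of groups. The encoder architecture and model below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hierarchical model (assumption): mu ~ N(0, 1); per-group z_i ~ N(mu, 1);
# observations y_ij ~ N(z_i, 1), with N groups of m observations each.
N, m, B, H = 10_000, 5, 64, 16            # groups, obs per group, minibatch, hidden width
y = rng.normal(size=(N, m))

# Amortization: one shared network maps a group's data y_i to the mean and
# log-std of q(z_i), so no per-group variational parameters are ever stored.
W1 = 0.1 * rng.normal(size=(m, H))
W2 = 0.1 * rng.normal(size=(H, 2))

def log_normal(x, mean, std):
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

def elbo_estimate(mu_q, logstd_q):
    """Unbiased ELBO estimate from B subsampled groups (local terms scaled by N/B)."""
    idx = rng.choice(N, size=B, replace=False)
    h = np.tanh(y[idx] @ W1) @ W2
    mean, std = h[:, 0], np.exp(h[:, 1])
    z = mean + std * rng.normal(size=B)                       # reparameterized q(z_i) samples
    mu = mu_q + np.exp(logstd_q) * rng.normal()               # sample of the global mean
    local = (log_normal(y[idx], z[:, None], 1.0).sum(axis=1)  # group likelihoods
             + log_normal(z, mu, 1.0)                         # prior p(z_i | mu)
             - log_normal(z, mean, std))                      # minus log q(z_i)
    glob = log_normal(mu, 0.0, 1.0) - log_normal(mu, mu_q, np.exp(logstd_q))
    return N * local.mean() + glob

print(elbo_estimate(mu_q=0.0, logstd_q=0.0))
```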
Related papers
- Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization [14.732408788010313]
ML applications increasingly rely on complex deep learning models and large datasets.
To scale computation and data, these models are inevitably trained in a distributed manner in clusters of nodes, and their updates are aggregated before being applied to the model.
With data augmentation added to these settings, there is a critical need for robust and efficient aggregation systems.
We show that our approach significantly enhances the robustness of state-of-the-art Byzantine resilient aggregators.
arXiv Detail & Related papers (2023-02-12T06:38:30Z)
- AdaCat: Adaptive Categorical Discretization for Autoregressive Models [84.85102013917606]
We propose an efficient, expressive, multimodal parameterization called Adaptive Categorical Discretization (AdaCat).
AdaCat discretizes each dimension of an autoregressive model adaptively, which allows the model to allocate density to fine intervals of interest.
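A minimal sketch of how such an adaptive discretization might look for one dimension (inferred from the abstract only; the piecewise-uniform form and names here are assumptions): learned bin widths tile [0, 1] and learned masses allocate probability, so narrow bins can carry high density.

```python
import numpy as np

def adacat_logpdf(x, width_logits, mass_logits):
    """Piecewise-uniform density on [0, 1] with k learned bins.

    Softmax over width_logits gives bin widths (summing to 1, so bins tile
    [0, 1]); softmax over mass_logits gives per-bin probability mass.
    """
    widths = np.exp(width_logits) / np.exp(width_logits).sum()
    masses = np.exp(mass_logits) / np.exp(mass_logits).sum()
    edges = np.concatenate([[0.0], np.cumsum(widths)])
    i = np.searchsorted(edges, x, side="right") - 1
    i = np.clip(i, 0, len(widths) - 1)
    return np.log(masses[i]) - np.log(widths[i])    # density is uniform within a bin

k = 8
rng = np.random.default_rng(0)
print(adacat_logpdf(0.37, rng.normal(size=k), rng.normal(size=k)))
```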
arXiv Detail & Related papers (2022-08-03T17:53:46Z)
- Variational Inference with Locally Enhanced Bounds for Hierarchical Models [38.73307745906571]
We propose a new family of variational bounds for hierarchical models based on the application of tightening methods.
We show that our approach naturally allows the use of subsampling to get unbiased gradients, and that it fully leverages the power of methods that build tighter lower bounds.
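A hedged sketch of the idea as stated (the toy model and names are mine, not the paper's): because the bound decomposes over groups, each group's term can be tightened with K importance samples while groups are still subsampled for an unbiased estimate of the bound.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, B = 1000, 8, 32                     # groups, importance samples, minibatch
y = rng.normal(size=N)                    # one observation per group (toy)

def log_normal(x, mean, std):
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

def local_iw_bound(y_i):
    """K-sample importance-weighted bound on log p(y_i) for one group.

    Toy model: z_i ~ N(0, 1), y_i ~ N(z_i, 1); proposal q(z_i) = N(y_i/2, 1).
    """
    z = y_i / 2 + rng.normal(size=K)
    logw = (log_normal(y_i, z, 1.0) + log_normal(z, 0.0, 1.0)
            - log_normal(z, y_i / 2, 1.0))
    return np.log(np.mean(np.exp(logw - logw.max()))) + logw.max()  # stable logmeanexp

idx = rng.choice(N, size=B, replace=False)                # subsample groups
print(N * np.mean([local_iw_bound(y[i]) for i in idx]))   # scaled minibatch bound
```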
arXiv Detail & Related papers (2022-03-08T22:53:43Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated seamlessly with neural networks.
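A hedged guess at the mechanism, based only on this summary: in a DP recursion such as the HMM forward pass, each sum over predecessor states can be replaced by a scaled sum over a random subset of states, which keeps every step unbiased in expectation. The HMM setting below is illustrative, not the paper's algorithm.

```python
import numpy as np

def randomized_forward(pi, A, B, obs, k, rng):
    """HMM forward pass summing over only k sampled predecessor states per step.

    Scaling by N/k makes each step's estimate of alpha @ A unbiased, cutting
    the per-step cost from O(N^2) to O(kN).
    """
    N = len(pi)
    alpha = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        S = rng.choice(N, size=k, replace=False)        # random subset of states
        alpha = (N / k) * (alpha[S] @ A[S, :]) * B[:, obs[t]]
    return alpha.sum()                                  # estimate of p(obs)

rng = np.random.default_rng(0)
N, V, T = 500, 10, 20
A = rng.dirichlet(np.ones(N), size=N)                   # transition matrix
B = rng.dirichlet(np.ones(V), size=N)                   # emission matrix
pi = np.ones(N) / N
obs = rng.integers(V, size=T)
print(randomized_forward(pi, A, B, obs, k=50, rng=rng))
```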
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Local versions of sum-of-norms clustering [77.34726150561087]
We show that our method can separate arbitrarily close balls in the ball model.
We prove a quantitative bound on the error incurred in the clustering of disjoint connected sets.
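For reference, the standard sum-of-norms clustering objective assigns each point x_i a centroid u_i via a convex program (the paper studies local versions of this formulation):

```latex
\min_{u_1,\dots,u_n}\; \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - u_i \rVert^2
\;+\; \lambda \sum_{i<j} \lVert u_i - u_j \rVert
```

Points whose centroids coincide at the optimum form one cluster; the parameter \lambda controls how aggressively centroids fuse.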
arXiv Detail & Related papers (2021-09-20T14:45:29Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm that performs exact marginal inference over separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Generative Model without Prior Distribution Matching [26.91643368299913]
The Variational Autoencoder (VAE) and its variants are classic generative models that learn a low-dimensional latent representation constrained to match some prior distribution.
We propose to let the prior match the embedding distribution rather than forcing the latent variables to fit the prior.
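One plausible reading (an assumption on my part, not the paper's exact procedure): fit the prior to the empirical embedding distribution, e.g. by matching its moments, and then sample from that fitted prior at generation time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in encoder outputs: embeddings of the training set (d-dimensional).
z = rng.normal(loc=1.5, scale=0.5, size=(10_000, 4))

# Fit the prior to the embeddings instead of forcing embeddings toward N(0, I).
prior_mean = z.mean(axis=0)
prior_cov = np.cov(z, rowvar=False)

# Generation: sample latents from the fitted prior, then decode (decoder omitted).
z_new = rng.multivariate_normal(prior_mean, prior_cov, size=16)
print(z_new.shape)
```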
arXiv Detail & Related papers (2020-09-23T09:33:24Z)
- Variational Filtering with Copula Models for SLAM [5.242618356321224]
We show how it is possible to perform simultaneous localization and mapping (SLAM) with a larger class of distributions.
We integrate the copula-based distribution model into a Sequential Monte Carlo estimator and show how unknown model parameters can be learned through gradient-based optimization.
arXiv Detail & Related papers (2020-08-02T15:38:23Z)
- Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity [26.518803984578867]
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging.
One typically resorts to sampling-based approximations of the true marginal.
We propose a new training strategy which replaces these estimators by an exact yet efficient marginalization.
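A concrete instance of this pattern (hedged; the paper may use a different sparse mapping): sparsemax yields a sparse distribution over a categorical latent, so the expected loss can be marginalized exactly over its small support instead of estimated by sampling.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex; typically sparse."""
    z_sorted = np.sort(z)[::-1]
    cssv = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    k_max = k[1 + k * z_sorted > cssv][-1]          # size of the support
    tau = (cssv[k_max - 1] - 1) / k_max
    return np.maximum(z - tau, 0.0)

rng = np.random.default_rng(0)
scores = rng.normal(size=20)                        # logits for a 20-way latent choice
p = sparsemax(scores)
support = np.flatnonzero(p)

# Exact marginal of a per-choice loss: sum over the (small) support only.
loss_per_choice = rng.normal(size=20) ** 2          # stand-in downstream losses
expected_loss = p[support] @ loss_per_choice[support]
print(support, expected_loss)
```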
arXiv Detail & Related papers (2020-07-03T19:36:35Z)
- A Unified Theory of Decentralized SGD with Changing Topology and Local Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
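For context, a generic sketch of the setting such analyses cover (not the paper's algorithm): each node takes a local gradient step and then gossip-averages with its neighbors through a doubly stochastic mixing matrix W, which may change over rounds.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eta = 8, 3, 0.1                       # nodes, parameter dim, step size
X = rng.normal(size=(n, d))                 # row i: node i's parameters
targets = rng.normal(size=(n, d))           # node i's local loss: ||x - t_i||^2 / 2

# Doubly stochastic mixing matrix for a ring topology (could change per round).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

for step in range(100):
    grads = X - targets                     # local gradients
    X = W @ (X - eta * grads)               # local SGD step, then gossip averaging

print(X.std(axis=0))                        # nodes approach consensus
```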
arXiv Detail & Related papers (2020-03-23T17:49:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.