Leveraging Recursive Gumbel-Max Trick for Approximate Inference in
Combinatorial Spaces
- URL: http://arxiv.org/abs/2110.15072v1
- Date: Thu, 28 Oct 2021 12:46:10 GMT
- Title: Leveraging Recursive Gumbel-Max Trick for Approximate Inference in
Combinatorial Spaces
- Authors: Kirill Struminsky, Artyom Gadetsky, Denis Rakitin, Danil Karpushkin,
Dmitry Vetrov
- Abstract summary: Structured latent variables allow incorporating meaningful prior knowledge into deep learning models.
The standard learning approach is to define a latent variable as a perturbed algorithm output and to use a differentiable surrogate for training.
We extend the Gumbel-Max trick to define distributions over structured domains.
- Score: 4.829821142951709
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structured latent variables allow incorporating meaningful prior knowledge
into deep learning models. However, learning with such variables remains
challenging because of their discrete nature. Nowadays, the standard learning
approach is to define a latent variable as a perturbed algorithm output and to
use a differentiable surrogate for training. In general, the surrogate puts
additional constraints on the model and inevitably leads to biased gradients.
To alleviate these shortcomings, we extend the Gumbel-Max trick to define
distributions over structured domains. We avoid differentiable surrogates
by leveraging score function estimators for optimization. In particular, we
highlight a family of recursive algorithms with a common feature we call
stochastic invariant. The feature allows us to construct reliable gradient
estimates and control variates without additional constraints on the model. In
our experiments, we consider various structured latent variable models and
achieve results competitive with relaxation-based counterparts.
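To make the ingredients named in the abstract concrete, here is a minimal sketch (not the authors' implementation): a Gumbel perturbation of the logits whose repeated argmax yields a structured sample (a top-k subset in this toy case), the exact Plackett-Luce log-probability this trick induces, and a score-function (REINFORCE) gradient with a leave-one-out baseline serving as the control variate. The reward function and all names are illustrative placeholders.

```python
import torch

def sample_topk_gumbel(logits, k):
    """Gumbel-Max recursion: perturb logits with Gumbel noise, take the k largest."""
    gumbels = -torch.log(-torch.log(torch.rand_like(logits)))
    _, idx = torch.topk(logits + gumbels, k)
    return idx  # indices ordered by decreasing perturbed value

def plackett_luce_log_prob(logits, idx):
    """Exact log-probability of drawing `idx` in order, without replacement."""
    log_p, remaining = 0.0, logits
    for i in idx:
        log_p = log_p + remaining[i] - torch.logsumexp(remaining, dim=0)
        remaining = remaining.clone()
        remaining[i] = float("-inf")  # remove the chosen item from the pool
    return log_p

def score_function_grad(logits, k, reward, n_samples=16):
    """REINFORCE estimate of d E[reward(S)] / d logits with a leave-one-out baseline."""
    samples = [sample_topk_gumbel(logits.detach(), k) for _ in range(n_samples)]
    rewards = torch.tensor([reward(s) for s in samples])
    baseline = (rewards.sum() - rewards) / (n_samples - 1)  # control variate
    log_probs = torch.stack([plackett_luce_log_prob(logits, s) for s in samples])
    surrogate = ((rewards - baseline) * log_probs).mean()
    return torch.autograd.grad(surrogate, logits)[0]

logits = torch.randn(10, requires_grad=True)
reward = lambda idx: idx.float().mean().item()  # toy downstream objective
print(score_function_grad(logits, k=3, reward=reward))
```

Because the log-probability of the sampled structure is tractable, the gradient needs no differentiable surrogate; the baseline only reduces variance and does not constrain the model.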
Related papers
- Learning Discrete Latent Variable Structures with Tensor Rank Conditions [30.292492090200984]
Unobserved discrete data are ubiquitous in many scientific disciplines, and how to learn the causal structure of these latent variables is crucial for uncovering data patterns.
Most studies focus on the linear latent variable model or impose strict constraints on latent structures, which fail to address cases in discrete data involving non-linear relationships or complex latent structures.
We explore a tensor rank condition on contingency tables for an observed variable set $\mathbf{X}_p$, showing that the rank is determined by the minimum support of a specific conditional set.
One can locate the latent variable by probing the rank on different observed variables.
arXiv Detail & Related papers (2024-06-11T07:25:17Z)
- Learning Sparsity of Representations with Discrete Latent Variables [15.05207849434673]
We propose a sparse deep latent generative model (SDLGM) to explicitly model the degree of sparsity.
The resulting sparsity of a representation is not fixed but adapts to the observation itself under a pre-defined restriction.
For inference and learning, we develop an amortized variational method based on an MC gradient estimator.
arXiv Detail & Related papers (2023-04-03T12:47:18Z)
- GFlowNet-EM for learning compositional latent variable models [115.96660869630227]
A key tradeoff in modeling the posteriors over latents is between expressivity and tractable optimization.
We propose the use of GFlowNets, algorithms for sampling from an unnormalized density.
By training GFlowNets to sample from the posterior over latents, we take advantage of their strengths as amortized variational algorithms.
arXiv Detail & Related papers (2023-02-13T18:24:21Z)
- Bayesian Hierarchical Models for Counterfactual Estimation [12.159830463756341]
We propose a probabilistic paradigm to estimate a diverse set of counterfactuals.
We treat the perturbations as random variables endowed with prior distribution functions.
A gradient-based sampler with superior convergence characteristics efficiently computes the posterior samples.
arXiv Detail & Related papers (2023-01-21T00:21:11Z)
- Training Discrete Deep Generative Models via Gapped Straight-Through Estimator [72.71398034617607]
We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
arXiv Detail & Related papers (2022-06-15T01:46:05Z)
- A Variational Inference Approach to Inverse Problems with Gamma Hyperpriors [60.489902135153415]
This paper introduces a variational iterative alternating scheme for hierarchical inverse problems with gamma hyperpriors.
The proposed variational inference approach yields accurate reconstruction, provides meaningful uncertainty quantification, and is easy to implement.
arXiv Detail & Related papers (2021-11-26T06:33:29Z)
- Learning Conditional Invariance through Cycle Consistency [60.85059977904014]
We propose a novel approach to identify meaningful and independent factors of variation in a dataset.
Our method involves two separate latent subspaces for the target property and the remaining input information.
We demonstrate on synthetic and molecular data that our approach identifies more meaningful factors which lead to sparser and more interpretable models.
arXiv Detail & Related papers (2021-11-25T17:33:12Z)
- Conditional Generative Modeling via Learning the Latent Space [54.620761775441046]
We propose a novel framework for conditional generation in multimodal spaces.
It uses latent variables to model generalizable learning patterns.
At inference, the latent variables are optimized to find optimal solutions corresponding to multiple output modes.
arXiv Detail & Related papers (2020-10-07T03:11:34Z)
- Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity [26.518803984578867]
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging.
One typically resorts to sampling-based approximations of the true marginal.
We propose a new training strategy which replaces these estimators by an exact yet efficient marginalization.
arXiv Detail & Related papers (2020-07-03T19:36:35Z)
- Differentiable Segmentation of Sequences [2.1485350418225244]
We build on advances in learning continuous warping functions and propose a novel family of warping functions based on the two-sided power (TSP) distribution.
Our formulation includes the important class of segmented generalized linear models as a special case.
We use our approach to model the spread of COVID-19 with Poisson regression, apply it on a change point detection task, and learn classification models with concept drift.
arXiv Detail & Related papers (2020-06-23T15:51:48Z)
- Gradient Estimation with Stochastic Softmax Tricks [84.68686389163153]
We introduce stochastic softmax tricks, which generalize the Gumbel-Softmax trick to combinatorial spaces.
We find that stochastic softmax tricks can be used to train latent variable models that perform better and discover more latent structure (see the relaxation sketch after this list).
arXiv Detail & Related papers (2020-06-15T00:43:44Z)
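For contrast with the score-function route taken by the main paper, the "relaxation-based counterparts" its abstract mentions, and the Gumbel-Softmax machinery that the Gapped Straight-Through and Stochastic Softmax Tricks entries build on, look roughly like the following straight-through sketch. The temperature and the toy objective are illustrative assumptions, not values taken from any of the papers above.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_st(logits, tau=1.0):
    """Straight-through Gumbel-Softmax: hard one-hot forward, soft gradients backward."""
    gumbels = -torch.log(-torch.log(torch.rand_like(logits)))
    soft = F.softmax((logits + gumbels) / tau, dim=-1)            # differentiable surrogate
    hard = F.one_hot(soft.argmax(dim=-1), logits.shape[-1]).to(soft)
    return hard + (soft - soft.detach())                          # straight-through trick

logits = torch.zeros(5, requires_grad=True)
z = gumbel_softmax_st(logits, tau=0.5)                            # one-hot sample
loss = (z * torch.arange(5.0)).sum()                              # toy downstream objective
loss.backward()
print(z, logits.grad)                                             # biased but low-variance gradient
```

PyTorch ships an equivalent relaxation in torch.nn.functional.gumbel_softmax(..., hard=True); the bias introduced by the surrogate is exactly the shortcoming the score-function approach above avoids.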