Gradient Estimation with Stochastic Softmax Tricks
- URL: http://arxiv.org/abs/2006.08063v3
- Date: Sun, 28 Feb 2021 23:43:23 GMT
- Title: Gradient Estimation with Stochastic Softmax Tricks
- Authors: Max B. Paulus, Dami Choi, Daniel Tarlow, Andreas Krause, Chris J.
Maddison
- Abstract summary: We introduce stochastic softmax tricks, which generalize the Gumbel-Softmax trick to combinatorial spaces.
We find that stochastic softmax tricks can be used to train latent variable models that perform better and discover more latent structure.
- Score: 84.68686389163153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Gumbel-Max trick is the basis of many relaxed gradient estimators. These
estimators are easy to implement and have low variance, but the goal of scaling
them to large combinatorial distributions is still outstanding.
Working within the perturbation model framework, we introduce stochastic
softmax tricks, which generalize the Gumbel-Softmax trick to combinatorial
spaces. Our framework is a unified perspective on existing relaxed estimators
for perturbation models, and it contains many novel relaxations. We design
structured relaxations for subset selection, spanning trees, arborescences, and
others. When compared to less structured baselines, we find that stochastic
softmax tricks can be used to train latent variable models that perform better
and discover more latent structure.
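For a concrete sense of the mechanism being generalized, the following is a minimal NumPy sketch of the standard Gumbel-Softmax relaxation for a single categorical variable. Stochastic softmax tricks keep this noise-then-relax recipe but replace the categorical argmax with a combinatorial optimization (e.g., subset selection or a maximum-weight spanning tree) and a matching structured relaxation. All names and parameter values below are illustrative, not taken from the paper's code.

```python
import numpy as np

def sample_gumbel(shape, rng, eps=1e-20):
    """Draw standard Gumbel(0, 1) noise via the inverse-CDF transform."""
    u = rng.uniform(low=eps, high=1.0, size=shape)
    return -np.log(-np.log(u))

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature).

    As temperature -> 0 this approaches the exact Gumbel-Max sample (a hard
    one-hot vector); larger temperatures give smoother, lower-variance gradients.
    """
    g = sample_gumbel(logits.shape, rng)
    return softmax((logits + g) / temperature)

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.2, 0.7]))  # illustrative categorical parameters
print(gumbel_softmax(logits, temperature=0.5, rng=rng))  # soft one-hot vector, sums to 1
```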
Related papers
- On Least Square Estimation in Softmax Gating Mixture of Experts [78.3687645289918]
We investigate the performance of the least squares estimators (LSE) under a deterministic MoE model.
We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions.
Our findings have important practical implications for expert selection.
arXiv Detail & Related papers (2024-02-05T12:31:18Z)
- r-softmax: Generalized Softmax with Controllable Sparsity Rate [11.39524236962986]
We propose r-softmax, a modification of the softmax, outputting sparse probability distribution with controllable sparsity rate.
We show on several multi-label datasets that r-softmax outperforms other sparse alternatives to softmax and is highly competitive with the original softmax.
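The exact definition of r-softmax is not reproduced here; as a hedged point of reference, the sketch below implements sparsemax (Martins & Astudillo, 2016), one of the sparse alternatives to softmax that methods of this kind are typically compared against. Variable names are illustrative.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection of the
    logits onto the probability simplex; many coordinates become exactly 0."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                    # logits in descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1.0 + k * z_sorted > cumsum          # coordinates kept in the support
    k_z = k[support][-1]                           # size of the support
    tau = (cumsum[k_z - 1] - 1.0) / k_z            # threshold
    return np.maximum(z - tau, 0.0)

print(sparsemax([1.0, 0.8, 0.1, -1.0]))  # [0.6, 0.4, 0.0, 0.0]: sums to 1, exact zeros
```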
arXiv Detail & Related papers (2023-04-11T14:28:29Z)
- Semidefinite Relaxations for Robust Multiview Triangulation [53.360555898338106]
We extend existing relaxation approaches to non-robust multiview triangulation by incorporating a truncated least squares cost function.
We demonstrate through extensive experiments that the proposed approaches allow us to compute provably optimal reconstructions even under significant noise and a large percentage of outliers.
arXiv Detail & Related papers (2023-01-26T21:31:32Z)
- Optimal Sparse Regression Trees [24.03491277969824]
This work proposes a dynamic-programming-with-bounds approach to the construction of provably-optimal sparse regression trees.
We leverage a novel lower bound based on an optimal solution to the one-dimensional k-Means clustering problem over the set of labels.
arXiv Detail & Related papers (2022-11-28T00:43:21Z)
- Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces [4.829821142951709]
Structured latent variables allow incorporating meaningful prior knowledge into deep learning models.
The standard learning approach is to define a latent variable as the output of an algorithm and to use a differentiable surrogate for training.
We extend the Gumbel-Max trick to define distributions over structured domains.
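One well-known instance of pushing the Gumbel-Max trick beyond single categories is the Gumbel-top-k trick: perturb the logits with Gumbel noise and keep the k largest entries, which samples k items without replacement. The sketch below illustrates that idea only; it is not the recursive construction proposed in this paper, and all names are illustrative.

```python
import numpy as np

def gumbel_top_k(logits, k, rng):
    """Gumbel-top-k: perturb logits with Gumbel(0, 1) noise and return the
    indices of the k largest perturbed values, i.e. a sample of k items drawn
    without replacement from the softmax distribution."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    return np.argsort(-(logits + g))[:k]  # indices of the k largest perturbed logits

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.2, 0.3, 0.4]))
print(gumbel_top_k(logits, k=2, rng=rng))  # an ordered 2-element subset of {0, 1, 2, 3}
```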
arXiv Detail & Related papers (2021-10-28T12:46:10Z)
- Data Summarization via Bilevel Optimization [48.89977988203108]
A simple yet powerful approach is to operate on small subsets of data.
In this work, we propose a generic coreset framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem.
arXiv Detail & Related papers (2021-09-26T09:08:38Z)
- Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution.
This approach poses a number of implementation and optimization challenges.
We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z)
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimate the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
- Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity [26.518803984578867]
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging.
One typically resorts to sampling-based approximations of the true marginal.
We propose a new training strategy which replaces these estimators by an exact yet efficient marginalization.
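The underlying recipe is to replace a Monte-Carlo estimate of the expected loss E_{z~p}[loss(z)] with an exact sum over the small support of a sparse distribution. A minimal sketch under assumed names follows; the loss and the sparse distribution here are toy stand-ins, not the paper's construction.

```python
import numpy as np

def exact_marginal_loss(probs, loss_fn):
    """Exact expectation of loss_fn over a sparse categorical distribution:
    sum p(z) * loss(z) only over latent values z with p(z) > 0, instead of
    estimating the expectation from samples."""
    support = np.flatnonzero(probs)                  # typically only a few entries
    return sum(probs[z] * loss_fn(z) for z in support)

# Toy usage: a sparse distribution over 5 latent values and an arbitrary loss.
probs = np.array([0.0, 0.6, 0.0, 0.4, 0.0])          # e.g. output of a sparse softmax
loss = lambda z: (z - 2.0) ** 2
print(exact_marginal_loss(probs, loss))              # 0.6 * 1 + 0.4 * 1 = 1.0
```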
arXiv Detail & Related papers (2020-07-03T19:36:35Z)
- Structure Adaptive Algorithms for Stochastic Bandits [22.871155520200773]
We study reward maximisation in a class of structured multi-armed bandit problems.
The mean rewards of arms satisfy some given structural constraints.
We develop algorithms from instance-dependent lower bounds using iterative saddle-point solvers.
arXiv Detail & Related papers (2020-07-02T08:59:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.