Gradient Estimation with Stochastic Softmax Tricks
- URL: http://arxiv.org/abs/2006.08063v3
- Date: Sun, 28 Feb 2021 23:43:23 GMT
- Title: Gradient Estimation with Stochastic Softmax Tricks
- Authors: Max B. Paulus, Dami Choi, Daniel Tarlow, Andreas Krause, Chris J.
Maddison
- Abstract summary: We introduce stochastic softmax tricks, which generalize the Gumbel-Softmax trick to combinatorial spaces.
We find that stochastic softmax tricks can be used to train latent variable models that perform better and discover more latent structure.
- Score: 84.68686389163153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Gumbel-Max trick is the basis of many relaxed gradient estimators. These
estimators are easy to implement and have low variance, but the goal of scaling
them to large combinatorial distributions is still outstanding.
Working within the perturbation model framework, we introduce stochastic
softmax tricks, which generalize the Gumbel-Softmax trick to combinatorial
spaces. Our framework is a unified perspective on existing relaxed estimators
for perturbation models, and it contains many novel relaxations. We design
structured relaxations for subset selection, spanning trees, arborescences, and
others. When compared to less structured baselines, we find that stochastic
softmax tricks can be used to train latent variable models that perform better
and discover more latent structure.
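For a concrete sense of the mechanism being generalized, the following is a minimal NumPy sketch of the standard Gumbel-Softmax relaxation for a single categorical variable. Stochastic softmax tricks keep this noise-then-relax recipe but replace the categorical argmax with a combinatorial optimization (e.g., subset selection or a maximum-weight spanning tree) and a matching structured relaxation. All names and parameter values below are illustrative, not taken from the paper's code.

```python
import numpy as np

def sample_gumbel(shape, rng, eps=1e-20):
    """Draw standard Gumbel(0, 1) noise via the inverse-CDF transform."""
    u = rng.uniform(low=eps, high=1.0, size=shape)
    return -np.log(-np.log(u))

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature).

    As temperature -> 0 this approaches the exact Gumbel-Max sample (a hard
    one-hot vector); larger temperatures give smoother, lower-variance gradients.
    """
    g = sample_gumbel(logits.shape, rng)
    return softmax((logits + g) / temperature)

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.2, 0.7]))  # illustrative categorical parameters
print(gumbel_softmax(logits, temperature=0.5, rng=rng))  # soft one-hot vector, sums to 1
```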
Related papers
- On Least Square Estimation in Softmax Gating Mixture of Experts [78.3687645289918]
We investigate the performance of the least squares estimators (LSE) under a deterministic MoE model.
We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions.
Our findings have important practical implications for expert selection.
arXiv Detail & Related papers (2024-02-05T12:31:18Z)
- r-softmax: Generalized Softmax with Controllable Sparsity Rate [11.39524236962986]
We propose r-softmax, a modification of the softmax, outputting sparse probability distribution with controllable sparsity rate.
We show on several multi-label datasets that r-softmax outperforms other sparse alternatives to softmax and is highly competitive with the original softmax.
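The exact definition of r-softmax is not reproduced here; as a hedged point of reference, the sketch below implements sparsemax (Martins & Astudillo, 2016), one of the sparse alternatives to softmax that methods of this kind are typically compared against. Variable names are illustrative.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection of the
    logits onto the probability simplex; many coordinates become exactly 0."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                    # logits in descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1.0 + k * z_sorted > cumsum          # coordinates kept in the support
    k_z = k[support][-1]                           # size of the support
    tau = (cumsum[k_z - 1] - 1.0) / k_z            # threshold
    return np.maximum(z - tau, 0.0)

print(sparsemax([1.0, 0.8, 0.1, -1.0]))  # [0.6, 0.4, 0.0, 0.0]: sums to 1, exact zeros
```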
arXiv Detail & Related papers (2023-04-11T14:28:29Z)
- Semidefinite Relaxations for Robust Multiview Triangulation [53.360555898338106]
We extend existing relaxation approaches to non-robust multiview triangulation by incorporating a truncated least squares cost function.
We demonstrate through extensive experiments that the proposed approaches allow us to compute provably optimal reconstructions even under significant noise and a large percentage of outliers.
arXiv Detail & Related papers (2023-01-26T21:31:32Z)
- Optimal Sparse Regression Trees [24.03491277969824]
This work proposes a dynamic-programming-with-bounds approach to the construction of provably-optimal sparse regression trees.
We leverage a novel lower bound based on an optimal solution to the one-dimensional k-Means clustering problem over the set of labels.
arXiv Detail & Related papers (2022-11-28T00:43:21Z)
- Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces [4.829821142951709]
Structured latent variables allow incorporating meaningful prior knowledge into deep learning models.
The standard learning approach is to define a latent variable as the output of an algorithm and to use a differentiable surrogate for training.
We extend the Gumbel-Max trick to define distributions over structured domains.
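One well-known instance of pushing the Gumbel-Max trick beyond single categories is the Gumbel-top-k trick: perturb the logits with Gumbel noise and keep the k largest entries, which samples k items without replacement. The sketch below illustrates that idea only; it is not the recursive construction proposed in this paper, and all names are illustrative.

```python
import numpy as np

def gumbel_top_k(logits, k, rng):
    """Gumbel-top-k: perturb logits with Gumbel(0, 1) noise and return the
    indices of the k largest perturbed values, i.e. a sample of k items drawn
    without replacement from the softmax distribution."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    return np.argsort(-(logits + g))[:k]  # indices of the k largest perturbed logits

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.2, 0.3, 0.4]))
print(gumbel_top_k(logits, k=2, rng=rng))  # an ordered 2-element subset of {0, 1, 2, 3}
```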
arXiv Detail & Related papers (2021-10-28T12:46:10Z)
- Data Summarization via Bilevel Optimization [48.89977988203108]
A simple yet powerful approach is to operate on small subsets of data.
In this work, we propose a generic coreset framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem.
arXiv Detail & Related papers (2021-09-26T09:08:38Z)
- Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution.
This approach poses a number of implementation and optimization challenges.
We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z)
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimate the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
- Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity [26.518803984578867]
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging.
One typically resorts to sampling-based approximations of the true marginal.
We propose a new training strategy which replaces these estimators by an exact yet efficient marginalization.
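The underlying recipe is to replace a Monte-Carlo estimate of the expected loss E_{z~p}[loss(z)] with an exact sum over the small support of a sparse distribution. A minimal sketch under assumed names follows; the loss and the sparse distribution here are toy stand-ins, not the paper's construction.

```python
import numpy as np

def exact_marginal_loss(probs, loss_fn):
    """Exact expectation of loss_fn over a sparse categorical distribution:
    sum p(z) * loss(z) only over latent values z with p(z) > 0, instead of
    estimating the expectation from samples."""
    support = np.flatnonzero(probs)                  # typically only a few entries
    return sum(probs[z] * loss_fn(z) for z in support)

# Toy usage: a sparse distribution over 5 latent values and an arbitrary loss.
probs = np.array([0.0, 0.6, 0.0, 0.4, 0.0])          # e.g. output of a sparse softmax
loss = lambda z: (z - 2.0) ** 2
print(exact_marginal_loss(probs, loss))              # 0.6 * 1 + 0.4 * 1 = 1.0
```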
arXiv Detail & Related papers (2020-07-03T19:36:35Z)
- Structure Adaptive Algorithms for Stochastic Bandits [22.871155520200773]
We study reward maximisation in a class of structured multi-armed bandit problems.
The mean rewards of arms satisfy some given structural constraints.
We develop algorithms from instance-dependent lower bounds using iterative saddle-point solvers.
arXiv Detail & Related papers (2020-07-02T08:59:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.