DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient
Estimation
- URL: http://arxiv.org/abs/2309.03974v1
- Date: Thu, 7 Sep 2023 19:15:40 GMT
- Title: DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient
Estimation
- Authors: Pau Mulet Arabi, Alec Flowers, Lukas Mauch, Fabien Cardinaux
- Abstract summary: We introduce DBsurf, a reinforce-based estimator for discrete distributions.
It uses a novel sampling procedure to reduce the discrepancy between the samples and the actual distribution.
DBsurf attains the lowest variance in a least squares problem commonly used in the literature for benchmarking.
- Score: 2.89784213091656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computing gradients of an expectation with respect to the distributional
parameters of a discrete distribution is a problem arising in many fields of
science and engineering. Typically, this problem is tackled using Reinforce,
which frames the problem of gradient estimation as a Monte Carlo simulation.
Unfortunately, the Reinforce estimator is especially sensitive to discrepancies
between the true probability distribution and the drawn samples, a common issue
in low sampling regimes that results in inaccurate gradient estimates. In this
paper, we introduce DBsurf, a reinforce-based estimator for discrete
distributions that uses a novel sampling procedure to reduce the discrepancy
between the samples and the actual distribution. To assess the performance of
our estimator, we subject it to a diverse set of tasks. Among existing
estimators, DBsurf attains the lowest variance in a least squares problem
commonly used in the literature for benchmarking. Furthermore, DBsurf achieves
the best results for training variational auto-encoders (VAE) across different
datasets and sampling setups. Finally, we apply DBsurf to build a simple and
efficient Neural Architecture Search (NAS) algorithm with state-of-the-art
performance.
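As background for the abstract above, here is a minimal sketch of the vanilla Reinforce (score-function) estimator that DBsurf builds on, applied to a toy categorical objective. The objective, sample sizes, and variable names are illustrative only; this is not the DBsurf sampling procedure itself.

```python
# Minimal sketch of the vanilla Reinforce (score-function) estimator for a
# categorical distribution parameterised by logits. Illustrative toy setup,
# not the DBsurf sampler.
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def reinforce_grad(logits, f, n_samples=100, rng=None):
    """Estimate d/d(logits) of E_{x ~ Cat(softmax(logits))}[f(x)]."""
    rng = np.random.default_rng() if rng is None else rng
    p = softmax(logits)
    k = len(p)
    grad = np.zeros(k)
    for _ in range(n_samples):
        x = rng.choice(k, p=p)
        # Score function for a categorical with logit parameters:
        # d log p(x) / d(logits) = onehot(x) - softmax(logits).
        score = -p.copy()
        score[x] += 1.0
        grad += f(x) * score
    return grad / n_samples

# Toy least-squares-style objective over one-hot encoded categories.
target = np.array([0.1, 0.7, 0.2])
f = lambda x: float((np.eye(3)[x] - target) @ (np.eye(3)[x] - target))

g = reinforce_grad(np.zeros(3), f, n_samples=1000)
print("Reinforce gradient estimate:", g)
```

The variance of this estimator grows when the empirical frequencies of the drawn samples deviate from the true probabilities, which is exactly the discrepancy the abstract says DBsurf's sampling procedure is designed to reduce.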
Related papers
- Improving Distribution Alignment with Diversity-based Sampling [0.0]
Domain shifts are ubiquitous in machine learning, and can substantially degrade a model's performance when deployed to real-world data.
This paper proposes to improve the estimates used for distribution alignment by inducing diversity in each sampled minibatch.
It simultaneously balances the data and reduces the variance of the gradients, thereby enhancing the model's generalisation ability.
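As a rough illustration of this idea, the sketch below induces minibatch diversity by stratifying samples over feature clusters. The clustering-based sampler is an assumption made for illustration; the paper's actual diversity-based sampling mechanism may differ.

```python
# Illustrative only: induce minibatch diversity by stratified sampling over
# k-means clusters of the input features.
import numpy as np
from sklearn.cluster import KMeans

def diverse_minibatch(X, batch_size, n_clusters=8, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    per_cluster = max(1, batch_size // n_clusters)
    idx = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if len(members):
            idx.extend(rng.choice(members, size=min(per_cluster, len(members)),
                                  replace=False))
    return np.array(idx[:batch_size])

X = np.random.randn(1000, 16)
batch_indices = diverse_minibatch(X, batch_size=64)
```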
arXiv Detail & Related papers (2024-10-05T17:26:03Z)
- Implicit Variational Inference for High-Dimensional Posteriors [7.924706533725115]
In variational inference, the benefits of Bayesian models rely on accurately capturing the true posterior distribution.
We propose using neural samplers that specify implicit distributions, which are well-suited for approximating complex multimodal and correlated posteriors.
Our approach introduces novel bounds for approximate inference using implicit distributions by locally linearising the neural sampler.
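A minimal sketch of an implicit neural sampler of the kind described here: a network maps Gaussian noise to parameter samples, so the resulting distribution has no tractable density. The architecture and dimensions are illustrative assumptions, and the local-linearisation bounds from the paper are not shown.

```python
# Sketch of an implicit neural sampler: g_phi maps Gaussian noise to parameter
# samples. Architecture and sizes are illustrative, not from the paper.
import torch
import torch.nn as nn

class NeuralSampler(nn.Module):
    def __init__(self, noise_dim, param_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, param_dim),
        )

    def forward(self, n_samples):
        eps = torch.randn(n_samples, self.net[0].in_features)
        # Samples from the implicit posterior approximation (no closed-form density).
        return self.net(eps)

sampler = NeuralSampler(noise_dim=32, param_dim=10)
theta_samples = sampler(64)  # 64 draws of a 10-dimensional parameter vector
```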
arXiv Detail & Related papers (2023-10-10T14:06:56Z)
- Entropy-MCMC: Sampling from Flat Basins with Ease [10.764160559530849]
We introduce an auxiliary guiding variable, the stationary distribution of which resembles a smoothed posterior free from sharp modes, to lead the MCMC sampler to flat basins.
By integrating this guiding variable with the model parameter, we create a simple joint distribution that enables efficient sampling with minimal computational overhead.
Empirical results demonstrate that our method can successfully sample from flat basins of the posterior, and outperforms all compared baselines on multiple benchmarks.
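The sketch below shows one natural way to realise such a coupling: Langevin updates on a joint density proportional to exp(-U(theta) - ||theta - theta_a||^2 / (2*eta)), so the guiding variable theta_a follows a Gaussian-smoothed version of the posterior. The specific update rule and hyperparameters are assumptions for illustration and may differ from the paper's method.

```python
# Sketch: coupled Langevin updates on (theta, theta_a) with joint density
# proportional to exp(-U(theta) - ||theta - theta_a||^2 / (2*eta)).
# Assumed instantiation for illustration only.
import numpy as np

def coupled_langevin(grad_U, dim, eta=0.1, step=1e-3, n_steps=5000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    theta = np.zeros(dim)
    theta_a = np.zeros(dim)
    samples = []
    for _ in range(n_steps):
        # Langevin step on theta: gradient of U plus the coupling term.
        g_theta = grad_U(theta) + (theta - theta_a) / eta
        theta = theta - step * g_theta + np.sqrt(2 * step) * rng.standard_normal(dim)
        # Langevin step on the guiding variable: only the coupling term.
        g_a = (theta_a - theta) / eta
        theta_a = theta_a - step * g_a + np.sqrt(2 * step) * rng.standard_normal(dim)
        samples.append(theta_a.copy())
    return np.array(samples)

# Example: sample when U(theta) = ||theta||^2 / 2 (standard normal target).
draws = coupled_langevin(grad_U=lambda t: t, dim=2)
```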
arXiv Detail & Related papers (2023-10-09T04:40:20Z)
- Bayesian Cramér-Rao Bound Estimation with Score-Based Models [3.4480437706804503]
The Bayesian Cramér-Rao bound (CRB) provides a lower bound on the mean square error of any Bayesian estimator under mild regularity conditions.
This work introduces a new data-driven estimator for the CRB using score matching.
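For context, the standard Bayesian Cramér-Rao bound referred to here states that, under mild regularity conditions, any estimator \hat{\theta}(x) of \theta drawn jointly with the data from p(x, \theta) satisfies

```latex
\mathbb{E}\!\left[(\hat{\theta}(x)-\theta)(\hat{\theta}(x)-\theta)^{\top}\right]
\succeq J_B^{-1},
\qquad
J_B \;=\; \mathbb{E}_{p(x,\theta)}\!\left[\nabla_{\theta}\log p(x,\theta)\,
\nabla_{\theta}\log p(x,\theta)^{\top}\right].
```

A data-driven estimate of the joint score \nabla_{\theta}\log p(x,\theta), for instance via score matching as in this paper, is therefore enough to form an estimate of J_B and hence of the bound.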
arXiv Detail & Related papers (2023-09-28T00:22:21Z)
- Collapsed Inference for Bayesian Deep Learning [36.1725075097107]
We introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples.
A collapsed sample represents uncountably many models drawn from the approximate posterior.
Our proposed use of collapsed samples achieves a balance between scalability and accuracy.
arXiv Detail & Related papers (2023-06-16T08:34:42Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
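As a schematic of DRO with a parametric likelihood-ratio adversary, the sketch below alternates updates between a task model that minimises a reweighted loss and an adversary network that produces the per-example weights. The adversary parameterisation, batch-level self-normalisation, and update schedule are illustrative assumptions rather than the paper's exact recipe.

```python
# Schematic min-max training with a parametric likelihood-ratio adversary.
# Illustrative architecture, normalisation, and schedule.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)       # task model (illustrative)
adversary = nn.Linear(10, 1)   # produces a log-ratio per example
opt_model = torch.optim.SGD(model.parameters(), lr=1e-2)
opt_adv = torch.optim.SGD(adversary.parameters(), lr=1e-2)

def dro_step(x, y):
    per_example_loss = F.cross_entropy(model(x), y, reduction="none")
    # Likelihood ratios from the adversary, self-normalised over the batch.
    ratios = torch.softmax(adversary(x).squeeze(-1), dim=0) * len(y)

    # Model minimises the reweighted loss (ratios treated as constants).
    model_loss = (ratios.detach() * per_example_loss).mean()
    opt_model.zero_grad(); model_loss.backward(); opt_model.step()

    # Adversary maximises the same objective (loss treated as constant).
    adv_loss = -(ratios * per_example_loss.detach()).mean()
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

x = torch.randn(32, 10)
y = torch.randint(0, 2, (32,))
dro_step(x, y)
```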
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
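For reference, here is the standard bootstrap particle filter that learned-proposal variants such as this one build on; the scalar state-space model, noise levels, and resampling scheme below are illustrative only.

```python
# Standard bootstrap particle filter on an illustrative scalar state-space model.
import numpy as np

def bootstrap_pf(observations, n_particles=500, process_std=1.0, obs_std=1.0,
                 rng=None):
    rng = np.random.default_rng() if rng is None else rng
    particles = rng.standard_normal(n_particles)
    estimates = []
    for y in observations:
        # Propagate particles through the (illustrative) transition model.
        particles = 0.9 * particles + process_std * rng.standard_normal(n_particles)
        # Weight by the observation likelihood and normalise.
        weights = np.exp(-0.5 * ((y - particles) / obs_std) ** 2)
        weights /= weights.sum()
        estimates.append(np.sum(weights * particles))
        # Multinomial resampling.
        particles = particles[rng.choice(n_particles, n_particles, p=weights)]
    return np.array(estimates)

obs = np.random.randn(100)
state_estimates = bootstrap_pf(obs)
```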
arXiv Detail & Related papers (2021-10-06T16:58:34Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
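A plausible minimal sketch of such a step: the per-sample weight is a within-minibatch softmax of the temperature-scaled loss, and the model is updated with the reweighted loss. The temperature value (and its sign, which controls whether high-loss samples are up- or down-weighted) is an illustrative choice.

```python
# Sketch of an attentional-weighted SGD step: per-example gradient contributions
# are scaled by a within-batch softmax of the (temperature-scaled) losses.
import torch
import torch.nn.functional as F

def absgd_style_step(model, optimizer, x, y, lam=1.0):
    per_example_loss = F.cross_entropy(model(x), y, reduction="none")
    # Attentional weights: softmax of scaled losses within the minibatch.
    weights = torch.softmax(per_example_loss.detach() / lam, dim=0)
    loss = (weights * per_example_loss).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = torch.nn.Linear(20, 5)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
absgd_style_step(model, opt, torch.randn(16, 20), torch.randint(0, 5, (16,)))
```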
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out-of-distribution data points at test time with a single forward pass.
We scale training of these RBF-based models with a novel loss function and centroid updating scheme, and match the accuracy of softmax models.
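In the spirit of this method, the sketch below scores inputs by their RBF similarity to learned class centroids and rejects inputs whose best similarity falls below a threshold, all in a single forward pass. The dimensions, length scale, and threshold are illustrative assumptions.

```python
# Sketch of centroid-based prediction with RBF similarities and a rejection
# threshold; sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn

class CentroidHead(nn.Module):
    def __init__(self, feat_dim, n_classes, sigma=0.1):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.sigma = sigma

    def forward(self, features):
        # RBF similarity of each feature vector to each class centroid.
        d2 = torch.cdist(features, self.centroids) ** 2
        return torch.exp(-d2 / (2 * self.sigma ** 2))

def predict_or_reject(head, features, threshold=0.5):
    scores = head(features)        # single forward pass
    conf, pred = scores.max(dim=1)
    pred[conf < threshold] = -1    # -1 flags a rejected (out-of-distribution) input
    return pred

head = CentroidHead(feat_dim=64, n_classes=10)
preds = predict_or_reject(head, torch.randn(8, 64))
```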
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
- Distributed Sketching Methods for Privacy Preserving Regression [54.51566432934556]
We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.
We derive novel approximation guarantees for classical sketching methods and analyze the accuracy of parameter averaging for distributed sketches.
We illustrate the performance of distributed sketches in a serverless computing platform with large scale experiments.
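A small sketch-and-solve illustration of the general idea: each worker solves a smaller, randomly sketched least-squares problem and the solutions are averaged. The Gaussian sketch and plain parameter averaging here are illustrative choices; the paper's privacy and straggler-resilience aspects are not reproduced.

```python
# Illustration of distributed sketch-and-solve regression with parameter averaging.
import numpy as np

def sketched_solution(A, b, sketch_rows, rng):
    # Gaussian sketch: project the tall problem down to sketch_rows equations.
    S = rng.standard_normal((sketch_rows, A.shape[0])) / np.sqrt(sketch_rows)
    return np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]

def distributed_sketch_regression(A, b, n_workers=8, sketch_rows=200, seed=0):
    rng = np.random.default_rng(seed)
    solutions = [sketched_solution(A, b, sketch_rows, rng) for _ in range(n_workers)]
    return np.mean(solutions, axis=0)   # parameter averaging across workers

A = np.random.randn(5000, 50)
x_true = np.random.randn(50)
b = A @ x_true + 0.01 * np.random.randn(5000)
x_hat = distributed_sketch_regression(A, b)
```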
arXiv Detail & Related papers (2020-02-16T08:35:48Z)