DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient
Estimation
- URL: http://arxiv.org/abs/2309.03974v1
- Date: Thu, 7 Sep 2023 19:15:40 GMT
- Title: DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient
Estimation
- Authors: Pau Mulet Arabi, Alec Flowers, Lukas Mauch, Fabien Cardinaux
- Abstract summary: We introduce DBsurf, a reinforce-based estimator for discrete distributions.
It uses a novel sampling procedure to reduce the discrepancy between the samples and the actual distribution.
DBsurf attains the lowest variance in a least squares problem commonly used in the literature for benchmarking.
- Score: 2.89784213091656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computing gradients of an expectation with respect to the distributional
parameters of a discrete distribution is a problem arising in many fields of
science and engineering. Typically, this problem is tackled using Reinforce,
which frames the problem of gradient estimation as a Monte Carlo simulation.
Unfortunately, the Reinforce estimator is especially sensitive to discrepancies
between the true probability distribution and the drawn samples, a common issue
in low sampling regimes that results in inaccurate gradient estimates. In this
paper, we introduce DBsurf, a reinforce-based estimator for discrete
distributions that uses a novel sampling procedure to reduce the discrepancy
between the samples and the actual distribution. To assess the performance of
our estimator, we subject it to a diverse set of tasks. Among existing
estimators, DBsurf attains the lowest variance in a least squares problem
commonly used in the literature for benchmarking. Furthermore, DBsurf achieves
the best results for training variational auto-encoders (VAE) across different
datasets and sampling setups. Finally, we apply DBsurf to build a simple and
efficient Neural Architecture Search (NAS) algorithm with state-of-the-art
performance.
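As background for the abstract above, here is a minimal sketch of the vanilla Reinforce (score-function) estimator that DBsurf builds on, applied to a toy categorical objective. The objective, sample sizes, and variable names are illustrative only; this is not the DBsurf sampling procedure itself.

```python
# Minimal sketch of the vanilla Reinforce (score-function) estimator for a
# categorical distribution parameterised by logits. Illustrative toy setup,
# not the DBsurf sampler.
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def reinforce_grad(logits, f, n_samples=100, rng=None):
    """Estimate d/d(logits) of E_{x ~ Cat(softmax(logits))}[f(x)]."""
    rng = np.random.default_rng() if rng is None else rng
    p = softmax(logits)
    k = len(p)
    grad = np.zeros(k)
    for _ in range(n_samples):
        x = rng.choice(k, p=p)
        # Score function for a categorical with logit parameters:
        # d log p(x) / d(logits) = onehot(x) - softmax(logits).
        score = -p.copy()
        score[x] += 1.0
        grad += f(x) * score
    return grad / n_samples

# Toy least-squares-style objective over one-hot encoded categories.
target = np.array([0.1, 0.7, 0.2])
f = lambda x: float((np.eye(3)[x] - target) @ (np.eye(3)[x] - target))

g = reinforce_grad(np.zeros(3), f, n_samples=1000)
print("Reinforce gradient estimate:", g)
```

The variance of this estimator grows when the empirical frequencies of the drawn samples deviate from the true probabilities, which is exactly the discrepancy the abstract says DBsurf's sampling procedure is designed to reduce.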
Related papers
- Improving Distribution Alignment with Diversity-based Sampling [0.0]
Domain shifts are ubiquitous in machine learning, and can substantially degrade a model's performance when deployed to real-world data.
This paper proposes to improve the estimates used for distribution alignment by inducing diversity in each sampled minibatch.
It simultaneously balances the data and reduces the variance of the gradients, thereby enhancing the model's generalisation ability.
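As a rough illustration of this idea, the sketch below induces minibatch diversity by stratifying samples over feature clusters. The clustering-based sampler is an assumption made for illustration; the paper's actual diversity-based sampling mechanism may differ.

```python
# Illustrative only: induce minibatch diversity by stratified sampling over
# k-means clusters of the input features.
import numpy as np
from sklearn.cluster import KMeans

def diverse_minibatch(X, batch_size, n_clusters=8, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    per_cluster = max(1, batch_size // n_clusters)
    idx = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if len(members):
            idx.extend(rng.choice(members, size=min(per_cluster, len(members)),
                                  replace=False))
    return np.array(idx[:batch_size])

X = np.random.randn(1000, 16)
batch_indices = diverse_minibatch(X, batch_size=64)
```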
arXiv Detail & Related papers (2024-10-05T17:26:03Z)
- Implicit Variational Inference for High-Dimensional Posteriors [7.924706533725115]
In variational inference, the benefits of Bayesian models rely on accurately capturing the true posterior distribution.
We propose using neural samplers that specify implicit distributions, which are well-suited for approximating complex multimodal and correlated posteriors.
Our approach introduces novel bounds for approximate inference using implicit distributions by locally linearising the neural sampler.
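A minimal sketch of an implicit neural sampler of the kind described here: a network maps Gaussian noise to parameter samples, so the resulting distribution has no tractable density. The architecture and dimensions are illustrative assumptions, and the local-linearisation bounds from the paper are not shown.

```python
# Sketch of an implicit neural sampler: g_phi maps Gaussian noise to parameter
# samples. Architecture and sizes are illustrative, not from the paper.
import torch
import torch.nn as nn

class NeuralSampler(nn.Module):
    def __init__(self, noise_dim, param_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, param_dim),
        )

    def forward(self, n_samples):
        eps = torch.randn(n_samples, self.net[0].in_features)
        # Samples from the implicit posterior approximation (no closed-form density).
        return self.net(eps)

sampler = NeuralSampler(noise_dim=32, param_dim=10)
theta_samples = sampler(64)  # 64 draws of a 10-dimensional parameter vector
```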
arXiv Detail & Related papers (2023-10-10T14:06:56Z)
- Entropy-MCMC: Sampling from Flat Basins with Ease [10.764160559530849]
We introduce an auxiliary guiding variable, the stationary distribution of which resembles a smoothed posterior free from sharp modes, to lead the MCMC sampler to flat basins.
By integrating this guiding variable with the model parameter, we create a simple joint distribution that enables efficient sampling with minimal computational overhead.
Empirical results demonstrate that our method can successfully sample from flat basins of the posterior, and outperforms all compared baselines on multiple benchmarks.
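The sketch below shows one natural way to realise such a coupling: Langevin updates on a joint density proportional to exp(-U(theta) - ||theta - theta_a||^2 / (2*eta)), so the guiding variable theta_a follows a Gaussian-smoothed version of the posterior. The specific update rule and hyperparameters are assumptions for illustration and may differ from the paper's method.

```python
# Sketch: coupled Langevin updates on (theta, theta_a) with joint density
# proportional to exp(-U(theta) - ||theta - theta_a||^2 / (2*eta)).
# Assumed instantiation for illustration only.
import numpy as np

def coupled_langevin(grad_U, dim, eta=0.1, step=1e-3, n_steps=5000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    theta = np.zeros(dim)
    theta_a = np.zeros(dim)
    samples = []
    for _ in range(n_steps):
        # Langevin step on theta: gradient of U plus the coupling term.
        g_theta = grad_U(theta) + (theta - theta_a) / eta
        theta = theta - step * g_theta + np.sqrt(2 * step) * rng.standard_normal(dim)
        # Langevin step on the guiding variable: only the coupling term.
        g_a = (theta_a - theta) / eta
        theta_a = theta_a - step * g_a + np.sqrt(2 * step) * rng.standard_normal(dim)
        samples.append(theta_a.copy())
    return np.array(samples)

# Example: sample when U(theta) = ||theta||^2 / 2 (standard normal target).
draws = coupled_langevin(grad_U=lambda t: t, dim=2)
```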
arXiv Detail & Related papers (2023-10-09T04:40:20Z)
- Bayesian Cramér-Rao Bound Estimation with Score-Based Models [3.4480437706804503]
The Bayesian Cramér-Rao bound (CRB) provides a lower bound on the mean square error of any Bayesian estimator under mild regularity conditions.
This work introduces a new data-driven estimator for the CRB using score matching.
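For context, the standard Bayesian Cramér-Rao bound referred to here states that, under mild regularity conditions, any estimator \hat{\theta}(x) of \theta drawn jointly with the data from p(x, \theta) satisfies

```latex
\mathbb{E}\!\left[(\hat{\theta}(x)-\theta)(\hat{\theta}(x)-\theta)^{\top}\right]
\succeq J_B^{-1},
\qquad
J_B \;=\; \mathbb{E}_{p(x,\theta)}\!\left[\nabla_{\theta}\log p(x,\theta)\,
\nabla_{\theta}\log p(x,\theta)^{\top}\right].
```

A data-driven estimate of the joint score \nabla_{\theta}\log p(x,\theta), for instance via score matching as in this paper, is therefore enough to form an estimate of J_B and hence of the bound.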
arXiv Detail & Related papers (2023-09-28T00:22:21Z)
- Collapsed Inference for Bayesian Deep Learning [36.1725075097107]
We introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples.
A collapsed sample represents uncountably many models drawn from the approximate posterior.
Our proposed use of collapsed samples achieves a balance between scalability and accuracy.
arXiv Detail & Related papers (2023-06-16T08:34:42Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
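As a schematic of DRO with a parametric likelihood-ratio adversary, the sketch below alternates updates between a task model that minimises a reweighted loss and an adversary network that produces the per-example weights. The adversary parameterisation, batch-level self-normalisation, and update schedule are illustrative assumptions rather than the paper's exact recipe.

```python
# Schematic min-max training with a parametric likelihood-ratio adversary.
# Illustrative architecture, normalisation, and schedule.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)       # task model (illustrative)
adversary = nn.Linear(10, 1)   # produces a log-ratio per example
opt_model = torch.optim.SGD(model.parameters(), lr=1e-2)
opt_adv = torch.optim.SGD(adversary.parameters(), lr=1e-2)

def dro_step(x, y):
    per_example_loss = F.cross_entropy(model(x), y, reduction="none")
    # Likelihood ratios from the adversary, self-normalised over the batch.
    ratios = torch.softmax(adversary(x).squeeze(-1), dim=0) * len(y)

    # Model minimises the reweighted loss (ratios treated as constants).
    model_loss = (ratios.detach() * per_example_loss).mean()
    opt_model.zero_grad(); model_loss.backward(); opt_model.step()

    # Adversary maximises the same objective (loss treated as constant).
    adv_loss = -(ratios * per_example_loss.detach()).mean()
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

x = torch.randn(32, 10)
y = torch.randint(0, 2, (32,))
dro_step(x, y)
```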
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
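For reference, here is the standard bootstrap particle filter that learned-proposal variants such as this one build on; the scalar state-space model, noise levels, and resampling scheme below are illustrative only.

```python
# Standard bootstrap particle filter on an illustrative scalar state-space model.
import numpy as np

def bootstrap_pf(observations, n_particles=500, process_std=1.0, obs_std=1.0,
                 rng=None):
    rng = np.random.default_rng() if rng is None else rng
    particles = rng.standard_normal(n_particles)
    estimates = []
    for y in observations:
        # Propagate particles through the (illustrative) transition model.
        particles = 0.9 * particles + process_std * rng.standard_normal(n_particles)
        # Weight by the observation likelihood and normalise.
        weights = np.exp(-0.5 * ((y - particles) / obs_std) ** 2)
        weights /= weights.sum()
        estimates.append(np.sum(weights * particles))
        # Multinomial resampling.
        particles = particles[rng.choice(n_particles, n_particles, p=weights)]
    return np.array(estimates)

obs = np.random.randn(100)
state_estimates = bootstrap_pf(obs)
```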
arXiv Detail & Related papers (2021-10-06T16:58:34Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
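A plausible minimal sketch of such a step: the per-sample weight is a within-minibatch softmax of the temperature-scaled loss, and the model is updated with the reweighted loss. The temperature value (and its sign, which controls whether high-loss samples are up- or down-weighted) is an illustrative choice.

```python
# Sketch of an attentional-weighted SGD step: per-example gradient contributions
# are scaled by a within-batch softmax of the (temperature-scaled) losses.
import torch
import torch.nn.functional as F

def absgd_style_step(model, optimizer, x, y, lam=1.0):
    per_example_loss = F.cross_entropy(model(x), y, reduction="none")
    # Attentional weights: softmax of scaled losses within the minibatch.
    weights = torch.softmax(per_example_loss.detach() / lam, dim=0)
    loss = (weights * per_example_loss).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = torch.nn.Linear(20, 5)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
absgd_style_step(model, opt, torch.randn(16, 20), torch.randint(0, 5, (16,)))
```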
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out-of-distribution data points at test time with a single forward pass.
We scale training of these RBF-based models with a novel loss function and centroid updating scheme, and match the accuracy of softmax models.
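In the spirit of this method, the sketch below scores inputs by their RBF similarity to learned class centroids and rejects inputs whose best similarity falls below a threshold, all in a single forward pass. The dimensions, length scale, and threshold are illustrative assumptions.

```python
# Sketch of centroid-based prediction with RBF similarities and a rejection
# threshold; sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn

class CentroidHead(nn.Module):
    def __init__(self, feat_dim, n_classes, sigma=0.1):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.sigma = sigma

    def forward(self, features):
        # RBF similarity of each feature vector to each class centroid.
        d2 = torch.cdist(features, self.centroids) ** 2
        return torch.exp(-d2 / (2 * self.sigma ** 2))

def predict_or_reject(head, features, threshold=0.5):
    scores = head(features)        # single forward pass
    conf, pred = scores.max(dim=1)
    pred[conf < threshold] = -1    # -1 flags a rejected (out-of-distribution) input
    return pred

head = CentroidHead(feat_dim=64, n_classes=10)
preds = predict_or_reject(head, torch.randn(8, 64))
```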
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
- Distributed Sketching Methods for Privacy Preserving Regression [54.51566432934556]
We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.
We derive novel approximation guarantees for classical sketching methods and analyze the accuracy of parameter averaging for distributed sketches.
We illustrate the performance of distributed sketches in a serverless computing platform with large scale experiments.
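A small sketch-and-solve illustration of the general idea: each worker solves a smaller, randomly sketched least-squares problem and the solutions are averaged. The Gaussian sketch and plain parameter averaging here are illustrative choices; the paper's privacy and straggler-resilience aspects are not reproduced.

```python
# Illustration of distributed sketch-and-solve regression with parameter averaging.
import numpy as np

def sketched_solution(A, b, sketch_rows, rng):
    # Gaussian sketch: project the tall problem down to sketch_rows equations.
    S = rng.standard_normal((sketch_rows, A.shape[0])) / np.sqrt(sketch_rows)
    return np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]

def distributed_sketch_regression(A, b, n_workers=8, sketch_rows=200, seed=0):
    rng = np.random.default_rng(seed)
    solutions = [sketched_solution(A, b, sketch_rows, rng) for _ in range(n_workers)]
    return np.mean(solutions, axis=0)   # parameter averaging across workers

A = np.random.randn(5000, 50)
x_true = np.random.randn(50)
b = A @ x_true + 0.01 * np.random.randn(5000)
x_hat = distributed_sketch_regression(A, b)
```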
arXiv Detail & Related papers (2020-02-16T08:35:48Z)