Isotropic SGD: a Practical Approach to Bayesian Posterior Sampling
- URL: http://arxiv.org/abs/2006.05087v1
- Date: Tue, 9 Jun 2020 07:31:21 GMT
- Title: Isotropic SGD: a Practical Approach to Bayesian Posterior Sampling
- Authors: Giulio Franzese, Rosa Candela, Dimitrios Milios, Maurizio Filippone,
Pietro Michiardi
- Abstract summary: This work defines a unified mathematical framework to deepen our understanding of the role of stochastic gradient (SG) noise on the behavior of stochastic gradient Markov chain Monte Carlo (SGMCMC) algorithms.
Our formulation unlocks the design of a novel, practical approach to posterior sampling, which makes the SG noise isotropic using a fixed learning rate.
Our proposal is competitive with state-of-the-art SGMCMC algorithms, while being much more practical to use.
- Score: 18.64160180251004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we define a unified mathematical framework to deepen our
understanding of the role of stochastic gradient (SG) noise on the behavior of
stochastic gradient Markov chain Monte Carlo (SGMCMC) algorithms.
Our formulation unlocks the design of a novel, practical approach to
posterior sampling, which makes the SG noise isotropic using a fixed learning
rate that we determine analytically, and that requires weaker assumptions than
existing algorithms. In contrast, the common trait of existing SGMCMC
algorithms is to approximate the isotropy condition either by drowning the
gradients in additive noise (annealing the learning rate) or by making
restrictive assumptions on the SG noise covariance and the geometry of the
loss landscape.
Extensive experimental validation indicates that our proposal is competitive
with the state-of-the-art in SGMCMC, while being much more practical to use.
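For intuition about the isotropy condition, here is a minimal sketch of vanilla SGLD on a hypothetical toy posterior (all names and constants are illustrative; this is the baseline the abstract contrasts against, not the paper's Isotropic SGD). With a small fixed step, the injected isotropic noise (variance proportional to the learning rate) dominates the anisotropic SG noise (variance proportional to its square), which is exactly the "drowning the gradients in additive noise" strategy described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: infer the mean `theta` of a unit-variance
# Gaussian likelihood under a N(0, 10) prior. Not the paper's setup.
N = 1000
data = rng.normal(1.5, 1.0, size=N)

def grad_log_post(theta, batch):
    # Unbiased minibatch estimate of the gradient of the log-posterior.
    grad_prior = -theta / 10.0
    grad_lik = (N / len(batch)) * np.sum(batch - theta)
    return grad_prior + grad_lik

lr, n_steps, batch_size = 1e-4, 5000, 32
theta, samples = 0.0, []
for _ in range(n_steps):
    batch = rng.choice(data, size=batch_size, replace=False)
    # SGLD update: half-step along the stochastic gradient plus
    # isotropic Gaussian noise of variance lr.
    theta += 0.5 * lr * grad_log_post(theta, batch) + rng.normal(0.0, np.sqrt(lr))
    samples.append(theta)

print("posterior mean estimate:", np.mean(samples[1000:]))
```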
Related papers
- Robust Approximate Sampling via Stochastic Gradient Barker Dynamics [0.0]
We introduce the stochastic gradient Barker dynamics (SGBD) algorithm, a robust alternative to Langevin-based sampling algorithms, extending the Barker proposal to the stochastic gradient framework.
We characterize the impact of stochastic gradients on the Barker transition mechanism and develop a bias-corrected version that eliminates the error due to gradient noise in the proposal; the underlying transition mechanism is sketched below.
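For orientation, here is a minimal sketch of the coordinate-wise Barker transition mechanism the summary refers to, in its exact-gradient, unadjusted form on a hypothetical Gaussian target (the stochastic gradients and bias correction of SGBD are omitted):

```python
import numpy as np

rng = np.random.default_rng(1)

def barker_step(theta, grad_log_post, sigma=0.1):
    # Coordinate-wise Barker move: each Gaussian increment z keeps its
    # sign with probability 1 / (1 + exp(-z * g)), so increments aligned
    # with the gradient of the log-target are favored.
    g = grad_log_post(theta)
    z = rng.normal(0.0, sigma, size=theta.shape)
    keep = rng.random(theta.shape) < 1.0 / (1.0 + np.exp(-z * g))
    return theta + np.where(keep, z, -z)

# Hypothetical target: standard 2-D Gaussian, whose score is -x.
theta, samples = np.zeros(2), []
for _ in range(20000):
    theta = barker_step(theta, lambda x: -x)
    samples.append(theta)
print(np.cov(np.array(samples[5000:]).T))  # approximately the identity
```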
arXiv Detail & Related papers (2024-05-14T23:47:02Z)
- First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities [91.46841922915418]
We present a unified approach for the theoretical analysis of first-order gradient methods for stochastic optimization and variational inequalities.
Our approach covers scenarios for both non-convex and strongly convex minimization problems.
We provide bounds that match the oracle complexity in the case of strongly convex optimization problems.
arXiv Detail & Related papers (2023-05-25T11:11:31Z)
- Conjugate Gradient Adaptive Learning with Tukey's Biweight M-Estimate [35.60818658948953]
We propose a novel M-estimate conjugate gradient (CG) algorithm, termed Tukey's biweight M-estimate CG (TbMCG).
In particular, the TbMCG algorithm achieves faster convergence while retaining reduced computational complexity.
Simulation results confirm the excellent performance of the proposed TbMCG algorithm for system identification and active noise control applications; the biweight influence function is sketched below.
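The CG recursion itself is not given in the summary; as a reference point, this is a sketch of Tukey's biweight influence function, which supplies the robustness to impulsive noise (c = 4.685 is a conventional tuning constant, an assumption rather than a value from the paper):

```python
import numpy as np

def tukey_biweight_psi(e, c=4.685):
    # Influence function of Tukey's biweight M-estimate: approximately
    # linear for small errors, exactly zero for outliers with |e| > c.
    e = np.asarray(e, dtype=float)
    return np.where(np.abs(e) <= c, e * (1.0 - (e / c) ** 2) ** 2, 0.0)

# Impulsive errors contribute nothing to the adaptation:
print(tukey_biweight_psi([0.5, 2.0, 10.0]))  # [~0.49, ~1.34, 0.0]
```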
arXiv Detail & Related papers (2022-03-19T01:02:43Z)
- Study of Proximal Normalized Subband Adaptive Algorithm for Acoustic Echo Cancellation [23.889870461547105]
We propose a novel normalized subband adaptive filter algorithm suited for sparse scenarios.
The proposed algorithm is derived based on the proximal forward-backward splitting and the soft-thresholding methods.
We analyze the mean and mean-square behaviors of the algorithm, and the analysis is supported by simulations; the soft-thresholding step is sketched below.
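For reference, this is the standard soft-thresholding operator used in proximal forward-backward splitting (the exact variant in the paper may differ):

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of t * ||w||_1: shrinks coefficients toward zero
    # and zeroes those below the threshold, promoting sparse filters.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

print(soft_threshold(np.array([-0.3, 0.05, 0.8]), 0.1))  # [-0.2  0.  0.7]
```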
arXiv Detail & Related papers (2021-08-14T22:20:09Z)
- SNIPS: Solving Noisy Inverse Problems Stochastically [25.567566997688044]
We introduce a novel algorithm dubbed SNIPS, which draws samples from the posterior distribution of any linear inverse problem.
Our solution incorporates ideas from Langevin dynamics and Newton's method, and exploits a pre-trained minimum mean squared error (MMSE) Gaussian denoiser.
We show that the samples produced are sharp, detailed and consistent with the given measurements, and that their diversity exposes the inherent uncertainty in the inverse problem being solved; a toy version of the Langevin-plus-denoiser recipe is sketched below.
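A minimal sketch under strong simplifying assumptions: a linear inverse problem with a N(0, I) prior, whose MMSE denoiser has a closed form, plugged into plain (non-annealed) Langevin dynamics via Tweedie's formula. SNIPS itself adds annealing and an SVD-based treatment of the degradation operator, both omitted here; all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical linear inverse problem y = H x + noise with a N(0, I) prior,
# for which the MMSE denoiser has the closed form D(x) = x / (1 + s^2).
d = 2
H = np.array([[1.0, 0.5], [0.0, 1.0]])
x_true = rng.normal(size=d)
sigma_y = 0.1
y = H @ x_true + sigma_y * rng.normal(size=d)

def prior_score(x, s=0.05):
    # Tweedie's formula: score = (D(x) - x) / s^2, with the closed-form
    # MMSE denoiser standing in for a pre-trained one.
    return (x / (1.0 + s**2) - x) / s**2

lr, x = 1e-4, np.zeros(d)
for _ in range(20000):
    # Posterior score = likelihood score + (denoiser-based) prior score.
    score = H.T @ (y - H @ x) / sigma_y**2 + prior_score(x)
    x += 0.5 * lr * score + np.sqrt(lr) * rng.normal(size=d)
print("posterior sample:", x, "truth:", x_true)
```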
arXiv Detail & Related papers (2021-05-31T13:33:21Z)
- Sampling in Combinatorial Spaces with SurVAE Flow Augmented MCMC [83.48593305367523]
Hybrid Monte Carlo is a powerful Markov Chain Monte Carlo method for sampling from complex continuous distributions.
We introduce a new approach based on augmenting Monte Carlo methods with SurVAE Flows to sample from discrete distributions.
We demonstrate the efficacy of our algorithm on a range of examples from statistics, computational physics and machine learning, and observe improvements compared to alternative algorithms.
arXiv Detail & Related papers (2021-02-04T02:21:08Z)
- Asymptotic study of stochastic adaptive algorithm in non-convex landscape [2.1320960069210484]
This paper studies the asymptotic properties of adaptive stochastic algorithms widely used in optimization and machine learning.
Among them are Adagrad and RMSProp, which are involved in most blackbox deep learning algorithms; a minimal RMSProp update is sketched below.
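For reference, a minimal sketch of the standard RMSProp update (hyperparameter values are common defaults, not taken from the paper):

```python
import numpy as np

def rmsprop_step(w, grad, v, lr=1e-3, beta=0.9, eps=1e-8):
    # RMSProp: scale each coordinate's step by a running average of
    # squared gradients, one of the adaptive schemes studied above.
    v = beta * v + (1.0 - beta) * grad ** 2
    return w - lr * grad / (np.sqrt(v) + eps), v

# Usage on a hypothetical quadratic loss 0.5 * ||w||^2 (gradient is w):
w, v = np.ones(3), np.zeros(3)
for _ in range(1000):
    w, v = rmsprop_step(w, w, v)
print(w)  # driven toward zero
```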
arXiv Detail & Related papers (2020-12-10T12:54:45Z)
- Plug-And-Play Learned Gaussian-mixture Approximate Message Passing [71.74028918819046]
We propose a plug-and-play compressed sensing (CS) recovery algorithm suitable for any i.i.d. source prior.
Our algorithm builds upon Borgerding's learned AMP (LAMP), yet significantly improves it by adopting a universal denoising function within the algorithm.
Numerical evaluation shows that the L-GM-AMP algorithm achieves state-of-the-art performance without any knowledge of the source prior.
arXiv Detail & Related papers (2020-11-18T16:40:45Z)
- Sinkhorn Natural Gradient for Generative Models [125.89871274202439]
We propose a novel Sinkhorn Natural Gradient (SiNG) algorithm which acts as a steepest descent method on the probability space endowed with the Sinkhorn divergence.
We show that the Sinkhorn information matrix (SIM), a key component of SiNG, has an explicit expression and can be evaluated accurately with a complexity that scales logarithmically with respect to the desired accuracy.
In our experiments, we quantitatively compare SiNG with state-of-the-art SGD-type solvers on generative tasks to demonstrate the efficiency and efficacy of our method.
arXiv Detail & Related papers (2020-11-09T02:51:17Z)
- Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in stochastic gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise, induced by mini-batches or label perturbation, is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not; a small demonstration of SG-noise anisotropy follows.
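To make the contrast concrete, here is a small hypothetical demonstration (illustrative model and numbers, not the paper's experiments) that minibatch-gradient noise is anisotropic and parameter-dependent, unlike spherical Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(2)

# Least squares with ill-conditioned features: the covariance of the
# minibatch-gradient noise inherits the feature geometry.
n, d, b = 2000, 2, 8
X = rng.normal(size=(n, d)) * np.array([3.0, 0.3])
y = X @ np.array([1.0, -1.0]) + 0.1 * rng.normal(size=n)

def minibatch_grad(w):
    idx = rng.integers(0, n, size=b)
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / b

grads = np.array([minibatch_grad(np.zeros(d)) for _ in range(5000)])
print(np.cov(grads.T))  # far from a multiple of the identity
```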
arXiv Detail & Related papers (2020-06-15T18:31:02Z)
- Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP).
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum-entropy algorithm in the small-sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.