Can Microcanonical Langevin Dynamics Leverage Mini-Batch Gradient Noise?
- URL: http://arxiv.org/abs/2602.06500v1
- Date: Fri, 06 Feb 2026 08:52:19 GMT
- Title: Can Microcanonical Langevin Dynamics Leverage Mini-Batch Gradient Noise?
- Authors: Emanuel Sommer, Kangning Diao, Jakob Robnik, Uros Seljak, David Rügamer
- Abstract summary: Scaling inference methods such as Markov chain Monte Carlo to high-dimensional models remains a central challenge in deep learning. A promising recent proposal, microcanonical Langevin Monte Carlo, has shown state-of-the-art performance across a wide range of problems. This paper addresses a fundamental question: Can microcanonical dynamics effectively leverage mini-batch gradient noise?
- Score: 15.401115530780267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scaling inference methods such as Markov chain Monte Carlo to high-dimensional models remains a central challenge in Bayesian deep learning. A promising recent proposal, microcanonical Langevin Monte Carlo, has shown state-of-the-art performance across a wide range of problems. However, its reliance on full-dataset gradients makes it prohibitively expensive for large-scale problems. This paper addresses a fundamental question: Can microcanonical dynamics effectively leverage mini-batch gradient noise? We provide the first systematic study of this problem, establishing a novel continuous-time theoretical analysis of stochastic-gradient microcanonical dynamics. We reveal two critical failure modes: a theoretically derived bias due to anisotropic gradient noise and numerical instabilities in complex high-dimensional posteriors. To tackle these issues, we propose a principled gradient noise preconditioning scheme shown to significantly reduce this bias and develop a novel, energy-variance-based adaptive tuner that automates step size selection and dynamically informs numerical guardrails. The resulting algorithm is a robust and scalable microcanonical Monte Carlo sampler that achieves state-of-the-art performance on challenging high-dimensional inference tasks like Bayesian neural networks. Combined with recent ensemble techniques, our work unlocks a new class of stochastic microcanonical Langevin ensemble (SMILE) samplers for large-scale Bayesian inference.
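To make the moving parts concrete, below is a minimal Python sketch of one stochastic-gradient microcanonical Langevin step. The momentum update follows the standard closed-form microcanonical (MCHMC/MCLMC) update from the literature; the diagonal gradient-noise preconditioner and all names here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def smile_step(x, u, minibatch_grad, eps, nu, precond, rng):
    """One illustrative stochastic-gradient microcanonical Langevin step.

    x: position (d,); u: unit-norm momentum (d,);
    minibatch_grad: noisy gradient of the negative log-posterior;
    eps: step size; nu: momentum-refresh scale (the Langevin part);
    precond: per-coordinate scaling intended to whiten anisotropic
             mini-batch gradient noise (an assumed, illustrative rule).
    """
    d = x.size
    g = precond * minibatch_grad(x)
    # Exact momentum update on the unit sphere (standard MCHMC form).
    gnorm = np.linalg.norm(g)
    e = -g / gnorm
    delta = eps * gnorm / (d - 1)
    ue = u @ e
    u = (u + e * (np.sinh(delta) + ue * (np.cosh(delta) - 1.0))) \
        / (np.cosh(delta) + ue * np.sinh(delta))
    x = x + eps * u                      # position drift along momentum
    # Partial momentum refresh: injects the stochastic (Langevin) noise.
    u = u + nu * rng.standard_normal(d)
    return x, u / np.linalg.norm(u)
```

The energy-variance-based tuner described in the abstract would adapt eps online from the variance of the per-step energy error; that logic, and the numerical guardrails, are omitted from this sketch.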
Related papers
- Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models [57.49136894315871]
The new paradigm of test-time scaling has yielded remarkable breakthroughs in reasoning models and generative vision models. We propose one solution to the problem of integrating test-time scaling knowledge into a model during post-training. We replace reward-guided test-time noise optimization in diffusion models with a Noise Hypernetwork that modulates initial input noise.
arXiv Detail & Related papers (2025-08-13T17:33:37Z)
- Estimating Multi-chirp Parameters using Curvature-guided Langevin Monte Carlo [7.832209959041259]
This paper considers the problem of estimating parameters from a noisy mixture of chirps. We propose a modified Langevin Monte Carlo (LMC) sampler that exploits the average curvature of the objective function.
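As a rough illustration (not the paper's exact sampler), a Langevin Monte Carlo step can be preconditioned by an average-curvature estimate so that flat directions take larger moves; `avg_curvature` here is an assumed, user-supplied per-coordinate estimate:

```python
import numpy as np

def curvature_lmc_step(theta, grad, avg_curvature, eps, rng):
    """Illustrative curvature-preconditioned Langevin step.

    grad: gradient of the negative log-density at theta;
    avg_curvature: per-coordinate average-curvature estimate (assumed
                   positive); acts as a diagonal preconditioner M^{-1}.
    """
    minv = 1.0 / avg_curvature            # larger steps in flat directions
    noise = rng.standard_normal(theta.size)
    return theta - eps * minv * grad(theta) \
                 + np.sqrt(2.0 * eps * minv) * noise
```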
arXiv Detail & Related papers (2025-01-30T07:15:38Z)
- Towards Scalable and Deep Graph Neural Networks via Noise Masking [59.058558158296265]
Graph Neural Networks (GNNs) have achieved remarkable success in many graph mining tasks. However, scaling them to large graphs is challenging due to the high computational and storage costs. We present random walk with noise masking (RMask), a plug-and-play module compatible with existing model-simplification works.
arXiv Detail & Related papers (2024-12-19T07:48:14Z)
- Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo [4.656426393230839]
The rise of artificial intelligence (AI) hinges on the efficient training of modern deep neural networks (DNNs) for non-convex optimization and uncertainty quantification.
In this thesis we propose tools to handle the exploration-exploitation problem in Monte Carlo methods.
We also propose two dynamic importance sampling algorithms for the underlying ordinary differential equation (ODE) system.
arXiv Detail & Related papers (2023-05-30T18:25:11Z)
- Unbiasing time-dependent Variational Monte Carlo by projected quantum evolution [44.99833362998488]
We analyze the accuracy and sample complexity of variational Monte Carlo approaches to simulate quantum systems classically.
We prove that the most used scheme, the time-dependent Variational Monte Carlo (tVMC), is affected by a systematic statistical bias.
We show that a different scheme based on the solution of an optimization problem at each time step is free from such problems.
arXiv Detail & Related papers (2023-05-23T17:38:10Z)
- High Accuracy Uncertainty-Aware Interatomic Force Modeling with Equivariant Bayesian Neural Networks [3.028098724882708]
We introduce a new Monte Carlo Markov chain sampling algorithm for learning interatomic forces.
In addition, we introduce a new neural network model based on the NequIP architecture and demonstrate that, when combined with our novel sampling algorithm, we obtain predictions with state-of-the-art accuracy as well as a good measure of uncertainty.
arXiv Detail & Related papers (2023-04-05T10:39:38Z)
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards a stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples found in previous work.
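For intuition, one Gibbs step for the latent true labels might look like the following sketch, with an assumed noise-transition matrix `T[z, y]` = p(noisy label y | true label z); this is illustrative, not the paper's exact sampler:

```python
import numpy as np

def gibbs_true_labels(probs, noisy_labels, T, rng):
    """Sample latent true labels z ~ p(z | x, y~), proportional to
    p(z|x) * p(y~|z).

    probs: (n, K) classifier probabilities p(z|x);
    noisy_labels: (n,) observed noisy labels y~;
    T: (K, K) assumed noise transition matrix, T[z, y] = p(y~=y | z).
    """
    post = probs * T[:, noisy_labels].T   # (n, K) unnormalized posterior
    post /= post.sum(axis=1, keepdims=True)
    K = T.shape[0]
    return np.array([rng.choice(K, p=row) for row in post])
```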
arXiv Detail & Related papers (2023-02-19T15:24:37Z)
- Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for clipped stochastic methods for solving monotone and structured non-monotone variational inequality problems (VIPs).
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
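As a sketch of the core idea (clipping the stochastic estimator before the update), here is a generic clipped stochastic gradient descent-ascent step for a min-max problem; the exact SEG/SGDA variants analyzed in the paper differ in detail:

```python
import numpy as np

def clip_to_norm(g, c):
    """Rescale g so its Euclidean norm is at most c (tames heavy tails)."""
    n = np.linalg.norm(g)
    return g if n <= c else g * (c / n)

def clipped_sgda_step(x, y, stoch_grad_x, stoch_grad_y, lr, c, rng):
    """One clipped SGDA step for min_x max_y f(x, y); the stochastic
    gradients may have heavy-tailed noise."""
    x_new = x - lr * clip_to_norm(stoch_grad_x(x, y, rng), c)
    y_new = y + lr * clip_to_norm(stoch_grad_y(x, y, rng), c)
    return x_new, y_new
```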
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
- Removing the mini-batching error in Bayesian inference using Adaptive Langevin dynamics [0.0]
We advocate using the so-called Adaptive Langevin dynamics, which is a modification of standard inertial Langevin dynamics with a dynamical friction.
We investigate the practical relevance of the assumptions underpinning Adaptive Langevin.
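For intuition, here is an Euler-style sketch of Adaptive Langevin: alongside position q and momentum p, a scalar friction xi evolves so the kinetic temperature stays on target, automatically absorbing the unknown mini-batch noise amplitude. This is an illustrative discretization (units with kT = 1 and unit masses), not the paper's exact integrator:

```python
import numpy as np

def adaptive_langevin_step(q, p, xi, minibatch_grad, dt, coupling):
    """One illustrative Adaptive Langevin (AdL) step.

    The mini-batch gradient noise itself plays the role of the
    fluctuation; the dynamical friction xi adjusts so the kinetic
    temperature matches its target (kT = 1 here).
    """
    g = minibatch_grad(q)                  # noisy gradient of potential U
    p = p - dt * (g + xi * p)              # dissipate with adaptive friction
    q = q + dt * p
    xi = xi + dt * coupling * (p @ p / q.size - 1.0)  # Nose-Hoover-style
    return q, p, xi
```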
arXiv Detail & Related papers (2021-05-21T13:39:39Z)
- Robust Learning under Strong Noise via SQs [5.9256596453465225]
We show that every SQ learnable class admits an efficient learning algorithm with OPT + $\epsilon$ misclassification error for a broad class of noise models.
This setting substantially generalizes the widely-studied problem of classification under random classification noise (RCN) with known noise probabilities.
arXiv Detail & Related papers (2020-10-18T21:02:26Z)
- Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
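To see the distinction concretely, consider a quadratically parameterized model f(x) = (w * w) . x: label-perturbation noise induces gradient noise whose covariance depends on w, while added spherical Gaussian noise does not. This is an illustrative toy, not the paper's exact construction:

```python
import numpy as np

def grad_label_noise(w, X, y, sigma, rng):
    """Gradient of 0.5*mean((X @ (w*w) - y~)^2) with perturbed labels
    y~ = y + sigma*z; the elementwise 2*w factor makes the induced
    noise covariance parameter-dependent."""
    y_t = y + sigma * rng.standard_normal(y.shape)
    resid = X @ (w * w) - y_t
    return (X.T @ resid / len(y_t)) * (2.0 * w)

def grad_spherical_noise(w, X, y, sigma, rng):
    """Same gradient with isotropic Gaussian noise added directly;
    the noise covariance is identical at every w."""
    resid = X @ (w * w) - y
    g = (X.T @ resid / len(y)) * (2.0 * w)
    return g + sigma * rng.standard_normal(w.shape)
```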
arXiv Detail & Related papers (2020-06-15T18:31:02Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)