Related papers: Investigating Batch Inference in a Sequential Monte Carlo Framework for Neural Networks

Investigating Batch Inference in a Sequential Monte Carlo Framework for Neural Networks

URL: http://arxiv.org/abs/2601.21983v1
Date: Thu, 29 Jan 2026 16:59:31 GMT
Title: Investigating Batch Inference in a Sequential Monte Carlo Framework for Neural Networks
Authors: Andrew Millard, Joshua Murphy, Peter Green, Simon Maskell,
Abstract summary: Bayesian inference allows us to define a posterior distribution over the weights of a generic neural network (NN)<n>One such approximation - variational inference - is computationally efficient when using mini-batch gradient descent.<n>We find that we can achieve up to $6times$ faster training with minimal loss in accuracy on benchmark image classification problems using NNs.
Score: 1.8129328638036128
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Bayesian inference allows us to define a posterior distribution over the weights of a generic neural network (NN). Exact posteriors are usually intractable, in which case approximations can be employed. One such approximation - variational inference - is computationally efficient when using mini-batch stochastic gradient descent as subsets of the data are used for likelihood and gradient evaluations, though the approach relies on the selection of a variational distribution which sufficiently matches the form of the posterior. Particle-based methods such as Markov chain Monte Carlo and Sequential Monte Carlo (SMC) do not assume a parametric family for the posterior by typically require higher computational cost. These sampling methods typically use the full-batch of data for likelihood and gradient evaluations, which contributes to this computational expense. We explore several methods of gradually introducing more mini-batches of data (data annealing) into likelihood and gradient evaluations of an SMC sampler. We find that we can achieve up to $6\times$ faster training with minimal loss in accuracy on benchmark image classification problems using NNs.

Related papers

Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts [64.34482582690927]
We provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models.<n>We propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality.
arXiv Detail & Related papers (2025-03-04T17:46:51Z)
Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods [0.38233569758620045]
Methods aim to learn the posterior from adaptively proposed simulations using neural network-based conditional density estimators.<n>The automatic posterior transformation (APT) method proposed by Greenberg et al. performs well and scales to high-level runtime data.<n>In this paper, we reformulate APT as a nested estimation problem.<n>We construct several multi- Monte Carlo (MLMC) estimators for the loss function and its gradients to accommodate different scenarios.
arXiv Detail & Related papers (2024-01-30T06:29:41Z)
Unbiased Kinetic Langevin Monte Carlo with Inexact Gradients [0.8749675983608172]
We present an unbiased method for posterior means based on kinetic Langevin dynamics. Our proposed estimator is unbiased, attains finite variance, and satisfies a central limit theorem. Our results demonstrate that in large-scale applications, the unbiased algorithm we present can be 2-3 orders of magnitude more efficient than the gold-standard" randomized Hamiltonian Monte Carlo.
arXiv Detail & Related papers (2023-11-08T21:19:52Z)
Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent [43.097493761380186]
gradient algorithms are an efficient method of approximately solving linear systems. We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum. Experimentally, gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
arXiv Detail & Related papers (2023-06-20T15:07:37Z)
Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture. It can model the feature space more comprehensively and reduce the dominance of head classes. The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs) PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors. We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
Efficient Bayesian Sampling Using Normalizing Flows to Assist Markov Chain Monte Carlo Methods [13.649384403827359]
Normalizing flows can generate complex target distributions and show promise in many applications in Bayesian statistics. Since no data set from the target posterior distribution is available beforehand, the flow is typically trained using the reverse Kullback-Leibler (KL) divergence that only requires samples from a base distribution. Here we explore a distinct training strategy, using the direct KL divergence as loss, in which samples from the posterior are generated by (i) assisting a local MCMC algorithm on the posterior with a normalizing flow to accelerate its mixing rate and (ii) using the data generated this way to train the flow.
arXiv Detail & Related papers (2021-07-16T16:40:36Z)
What Are Bayesian Neural Network Posteriors Really Like? [63.950151520585024]
We show that Hamiltonian Monte Carlo can achieve significant performance gains over standard and deep ensembles. We also show that deep distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
arXiv Detail & Related papers (2021-04-29T15:38:46Z)
Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties. Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
An adaptive Hessian approximated stochastic gradient MCMC method [12.93317525451798]
We present an adaptive Hessian approximated gradient MCMC method to incorporate local geometric information while sampling from the posterior. We adopt a magnitude-based weight pruning method to enforce the sparsity of the network.
arXiv Detail & Related papers (2020-10-03T16:22:15Z)
Carath\'eodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carath'eodory about measure reduction. We adaptively select the descent steps where the measure reduction is carried out. We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.