Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles
- URL: http://arxiv.org/abs/2505.11671v1
- Date: Fri, 16 May 2025 20:10:04 GMT
- Title: Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles
- Authors: Andrew Millard, Zheng Zhao, Joshua Murphy, Simon Maskell
- Abstract summary: We introduce a scalable variant by incorporating Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) proposals into Sequential Monte Carlo (SMC) methods. Our resulting SMCSGHMC algorithm outperforms standard stochastic gradient descent (SGD) and deep ensembles across image classification, out-of-distribution detection, and transfer learning tasks.
- Score: 3.2254941904559917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential Monte Carlo (SMC) methods offer a principled approach to Bayesian uncertainty quantification but are traditionally limited by the need for full-batch gradient evaluations. We introduce a scalable variant by incorporating Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) proposals into SMC, enabling efficient mini-batch based sampling. Our resulting SMCSGHMC algorithm outperforms standard stochastic gradient descent (SGD) and deep ensembles across image classification, out-of-distribution (OOD) detection, and transfer learning tasks. We further show that SMCSGHMC mitigates overfitting and improves calibration, providing a flexible, scalable pathway for converting pretrained neural networks into well-calibrated Bayesian models.
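To make the core idea concrete, below is a minimal, self-contained sketch of tempered SMC in which each particle is moved with a mini-batch SGHMC kernel, run on a toy one-dimensional Gaussian posterior. All names, step sizes, and the temperature schedule (sghmc_step, minibatch_grad_log_lik, etc.) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=200)          # toy dataset, N(x | theta, 1)

def log_lik(theta):
    # Full-data log-likelihood up to a constant: -0.5 * N * (theta - mean)^2
    return -0.5 * len(data) * (theta - data.mean()) ** 2

def minibatch_grad_log_lik(theta, batch_size=50):
    # Unbiased stochastic gradient of the log-likelihood from a random mini-batch
    batch = rng.choice(data, size=batch_size, replace=False)
    return len(data) / batch_size * np.sum(batch - theta)

def sghmc_step(theta, v, beta, step=1e-4, friction=0.1, n_steps=5):
    # Mini-batch SGHMC move targeting the tempered posterior prior * likelihood^beta
    for _ in range(n_steps):
        grad = beta * minibatch_grad_log_lik(theta) - theta   # + grad log N(0,1) prior
        v = (1.0 - friction) * v + step * grad \
            + np.sqrt(2.0 * friction * step) * rng.normal()
        theta = theta + v
    return theta, v

# Tempered SMC loop: reweight -> resample -> SGHMC move, for beta going 0 -> 1
n_particles = 200
thetas = rng.normal(size=n_particles)                     # particles from the N(0,1) prior
vels = np.zeros(n_particles)
betas = np.linspace(0.0, 1.0, 11)

for b_prev, b_next in zip(betas[:-1], betas[1:]):
    log_w = (b_next - b_prev) * np.array([log_lik(t) for t in thetas])   # reweight
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    idx = rng.choice(n_particles, size=n_particles, p=w)                  # resample
    thetas, vels = thetas[idx], vels[idx]
    for i in range(n_particles):                                          # move
        thetas[i], vels[i] = sghmc_step(thetas[i], vels[i], b_next)

print("SMC posterior mean ≈", thetas.mean(), "| data mean:", data.mean())
```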
Related papers
- Utilising Gradient-Based Proposals Within Sequential Monte Carlo Samplers for Training of Partial Bayesian Neural Networks [3.2254941904559917]
Partial Bayesian neural networks (pBNNs) have been shown to perform competitively with fully Bayesian neural networks. We introduce a new SMC-based training method for pBNNs by utilising a guided proposal and incorporating gradient-based Markov kernels. We show that our new method outperforms the state-of-the-art in terms of predictive performance and optimal loss.
arXiv Detail & Related papers (2025-05-01T20:05:38Z) - Tuning Sequential Monte Carlo Samplers via Greedy Incremental Divergence Minimization [21.206714676842317]
We propose a general adaptation framework for tuning the Markov kernels in SMC samplers. We provide a gradient- and tuning-free algorithm that is generally applicable to kernels such as Langevin Monte Carlo (LMC). Our implementations are able to obtain a full schedule of tuned parameters at the cost of a few vanilla SMC runs.
arXiv Detail & Related papers (2025-03-19T21:35:02Z) - Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo [32.46884330460211]
We propose a simple yet effective approach to enhance sample diversity in Stochastic Gradient Markov Chain Monte Carlo (SGMCMC). This approach produces a more diverse set of samples, allowing faster mixing within the same computational budget. Our experiments on image classification tasks, including OOD robustness, diversity, loss surface analyses, and a comparative study with Hamiltonian Monte Carlo, demonstrate the superiority of the proposed approach.
arXiv Detail & Related papers (2025-03-02T02:42:50Z) - A new perspective on probabilistic image modeling [92.89846887298852]
We present a new probabilistic approach for image modeling capable of density estimation, sampling and tractable inference.
Deep Convolutional Gaussian Mixture Models (DCGMMs) can be trained end-to-end by SGD from random initial conditions, much like CNNs.
We show that DCGMMs compare favorably to several recent PC and SPN models in terms of inference, classification and sampling.
arXiv Detail & Related papers (2022-03-21T14:53:57Z) - What Are Bayesian Neural Network Posteriors Really Like? [63.950151520585024]
We show that Hamiltonian Monte Carlo can achieve significant performance gains over standard training and deep ensembles.
We also show that deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
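For context, these comparisons come down to how the posterior predictive is formed: class probabilities are averaged over posterior samples, whether those are HMC draws, SGLD iterates, or ensemble members. A minimal sketch of that model-averaging step, with illustrative names only, is:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def predictive(logits_per_sample):
    """logits_per_sample: (n_samples, n_inputs, n_classes) logits,
    one slice per posterior sample / ensemble member."""
    probs = softmax(logits_per_sample, axis=-1)
    return probs.mean(axis=0)            # average probabilities, not logits

# Toy usage: 4 posterior samples, 2 inputs, 3 classes
rng = np.random.default_rng(0)
p = predictive(rng.normal(size=(4, 2, 3)))
print(p, p.sum(axis=-1))                 # each row sums to 1
```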
arXiv Detail & Related papers (2021-04-29T15:38:46Z) - Sampling in Combinatorial Spaces with SurVAE Flow Augmented MCMC [83.48593305367523]
Hybrid Monte Carlo is a powerful Markov Chain Monte Carlo method for sampling from complex continuous distributions.
We introduce a new approach based on augmenting Monte Carlo methods with SurVAE Flows to sample from discrete distributions.
We demonstrate the efficacy of our algorithm on a range of examples from statistics, computational physics and machine learning, and observe improvements compared to alternative algorithms.
arXiv Detail & Related papers (2021-02-04T02:21:08Z) - Plug-And-Play Learned Gaussian-mixture Approximate Message Passing [71.74028918819046]
We propose a plug-and-play compressed sensing (CS) recovery algorithm suitable for any i.i.d. source prior.
Our algorithm builds upon Borgerding's learned AMP (LAMP), yet significantly improves it by adopting a universal denoising function within the algorithm.
Numerical evaluation shows that the L-GM-AMP algorithm achieves state-of-the-art performance without any knowledge of the source prior.
arXiv Detail & Related papers (2020-11-18T16:40:45Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - Stochastic Gradient Langevin Dynamics Algorithms with Adaptive Drifts [8.36840154574354]
We propose a class of adaptive stochastic gradient Markov chain Monte Carlo (SGMCMC) algorithms, where the drift function is biased to enhance escape from saddle points and the bias is adaptively adjusted according to the gradient of past samples.
We demonstrate via numerical examples that the proposed algorithms can significantly outperform the existing SGMCMC algorithms.
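As a rough illustration of the adaptive-drift idea, the sketch below runs SGLD on a toy posterior and biases the drift with an exponential moving average of past stochastic gradients; the constants and the specific form of the bias are assumptions for illustration, not the algorithms proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=-1.0, scale=1.0, size=2_000)        # toy dataset, N(x | theta, 1)

def stoch_grad(theta, batch_size=64):
    # Unbiased mini-batch gradient of the log-likelihood (flat prior)
    batch = rng.choice(data, size=batch_size, replace=False)
    return len(data) / batch_size * np.sum(batch - theta)

theta, momentum = 0.0, 0.0
step, ema, bias_scale = 1e-5, 0.9, 0.5
samples = []
for it in range(5_000):
    g = stoch_grad(theta)
    momentum = ema * momentum + (1 - ema) * g              # running drift estimate
    drift = g + bias_scale * momentum                      # adaptively biased drift
    theta += 0.5 * step * drift + np.sqrt(step) * rng.normal()   # SGLD update
    if it > 1_000:                                          # discard burn-in
        samples.append(theta)

print("posterior mean ≈", np.mean(samples), "| data mean:", data.mean())
```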
arXiv Detail & Related papers (2020-09-20T22:03:39Z) - Non-convex Learning via Replica Exchange Stochastic Gradient MCMC [25.47669573608621]
We propose an adaptive replica exchange SGMCMC (reSGMCMC) to automatically correct the bias and study the corresponding properties.
Empirically, we test the algorithm through extensive experiments on various setups and obtain state-of-the-art results.
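The replica exchange mechanism itself is simple to sketch: two Langevin chains run at different temperatures and periodically attempt to swap states via a Metropolis-style test. The toy below uses full-data energies, so the stochastic-energy bias that reSGMCMC corrects for does not arise; all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(theta):                        # toy double-well potential U(theta)
    return (theta**2 - 1.0)**2

def grad_energy(theta):
    return 4.0 * theta * (theta**2 - 1.0)

def langevin_step(theta, temp, step=1e-3):
    # Langevin update targeting exp(-U(theta) / temp)
    return theta - step * grad_energy(theta) + np.sqrt(2.0 * step * temp) * rng.normal()

t_low, t_high = 1.0, 5.0                  # exploiting vs. exploring replica
theta_low, theta_high = -1.0, 1.0
samples = []
for it in range(20_000):
    theta_low = langevin_step(theta_low, t_low)
    theta_high = langevin_step(theta_high, t_high)
    if it % 50 == 0:                      # periodically attempt a replica swap
        log_acc = (1.0 / t_low - 1.0 / t_high) * (energy(theta_low) - energy(theta_high))
        if np.log(rng.uniform()) < log_acc:
            theta_low, theta_high = theta_high, theta_low
    samples.append(theta_low)

print("fraction of low-temp samples in the right mode:", np.mean(np.array(samples) > 0))
```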
arXiv Detail & Related papers (2020-08-12T15:02:59Z) - Improving Sampling Accuracy of Stochastic Gradient MCMC Methods via
Non-uniform Subsampling of Gradients [54.90670513852325]
We propose a non-uniform subsampling scheme to improve the sampling accuracy.
The resulting exponentially weighted stochastic gradient (EWSG) scheme is designed so that a non-uniform stochastic-gradient MCMC method mimics the statistical behavior of a batch-gradient MCMC method.
In our practical implementation of EWSG, the non-uniform subsampling is performed efficiently via a Metropolis-Hastings chain on the data index.
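A toy illustration of running a Metropolis-Hastings chain on the data index is sketched below: the index chain targets a non-uniform distribution over data points (here proportional to per-datum gradient magnitude, an assumption for illustration only; EWSG's own weighting differs), and the single-sample gradient is importance-weighted to remain unbiased. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.5, scale=2.0, size=500)

def per_datum_grad(theta, i):
    return data[i] - theta                        # grad of log N(x_i | theta, 1)

def index_weight(theta, i):
    return abs(per_datum_grad(theta, i)) + 1e-8   # unnormalised target over indices

def mh_index_step(theta, i):
    """One MH move on the data index, targeting p(i) ∝ index_weight(theta, i)."""
    j = rng.integers(len(data))                   # symmetric (uniform) proposal
    if rng.uniform() < index_weight(theta, j) / index_weight(theta, i):
        return j
    return i

theta, i = 0.0, 0
for _ in range(200):                              # let the index chain mix
    i = mh_index_step(theta, i)

# Importance-weighted single-sample gradient: unbiased for the full-data gradient.
# The full normalisation is computed here only to demonstrate unbiasedness.
w_all = np.array([index_weight(theta, k) for k in range(len(data))])
p_i = w_all[i] / w_all.sum()
grad_est = per_datum_grad(theta, i) / p_i
print("weighted single-sample grad:", grad_est, "| full grad:", np.sum(data - theta))
```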
arXiv Detail & Related papers (2020-02-20T18:56:18Z)