Entropy-MCMC: Sampling from Flat Basins with Ease
- URL: http://arxiv.org/abs/2310.05401v5
- Date: Mon, 25 Mar 2024 18:07:22 GMT
- Title: Entropy-MCMC: Sampling from Flat Basins with Ease
- Authors: Bolian Li, Ruqi Zhang
- Abstract summary: We introduce an auxiliary guiding variable, the stationary distribution of which resembles a smoothed posterior free from sharp modes, to lead the MCMC sampler to flat basins.
By integrating this guiding variable with the model parameter, we create a simple joint distribution that enables efficient sampling with minimal computational overhead.
Empirical results demonstrate that our method can successfully sample from flat basins of the posterior, and outperforms all compared baselines on multiple benchmarks.
- Score: 10.764160559530849
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Bayesian deep learning relies on the quality of posterior distribution estimation. However, the posterior of deep neural networks is highly multi-modal in nature, with local modes exhibiting varying generalization performance. Given a practical budget, targeting the original posterior can lead to suboptimal performance, as some samples may become trapped in "bad" modes and suffer from overfitting. Leveraging the observation that "good" modes with low generalization error often reside in flat basins of the energy landscape, we propose to bias sampling on the posterior toward these flat regions. Specifically, we introduce an auxiliary guiding variable, the stationary distribution of which resembles a smoothed posterior free from sharp modes, to lead the MCMC sampler to flat basins. By integrating this guiding variable with the model parameter, we create a simple joint distribution that enables efficient sampling with minimal computational overhead. We prove the convergence of our method and further show that it converges faster than several existing flatness-aware methods in the strongly convex setting. Empirical results demonstrate that our method can successfully sample from flat basins of the posterior and outperforms all compared baselines on multiple benchmarks, including classification, calibration, and out-of-distribution detection.
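To make the joint construction concrete, the following minimal NumPy sketch runs Langevin dynamics on a coupled density of the form p(θ, θ_a) ∝ exp(−U(θ) − ‖θ − θ_a‖²/(2η)). It is a hedged illustration, not the authors' reference implementation: the function names (entropy_sgld, grad_U), the hyperparameter values, and the toy double-well energy are all assumptions made here for readability.

```python
import numpy as np

def entropy_sgld(grad_U, theta0, eta=0.1, lr=1e-3, n_steps=5000, seed=0):
    """SGLD sketch on the joint density
    p(theta, theta_a) ~ exp(-U(theta) - ||theta - theta_a||^2 / (2 * eta)),
    whose theta_a-marginal resembles a Gaussian-smoothed posterior."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    theta_a = theta.copy()                        # auxiliary guiding variable
    samples = []
    for _ in range(n_steps):
        # Gradients of the joint negative log-density for each block.
        g_theta = grad_U(theta) + (theta - theta_a) / eta
        g_aux = (theta_a - theta) / eta           # no extra grad_U call needed
        # Langevin updates: gradient step plus injected Gaussian noise.
        theta = theta - lr * g_theta + np.sqrt(2 * lr) * rng.standard_normal(theta.shape)
        theta_a = theta_a - lr * g_aux + np.sqrt(2 * lr) * rng.standard_normal(theta_a.shape)
        samples.append(theta.copy())
    return np.array(samples)

# Toy usage: double-well energy U(x) = (x^2 - 1)^2 with gradient 4x(x^2 - 1).
draws = entropy_sgld(lambda th: 4 * th * (th**2 - 1), theta0=np.zeros(1))
```

Note that the guiding-variable update needs no additional gradient evaluation of U, which matches the abstract's claim of minimal computational overhead.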
Related papers
- Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements [45.70011319850862]
Diffusion models have emerged as powerful foundation models for visual generation.
Current posterior-sampling-based methods incorporate the measurement into the posterior sampling process to infer the distribution of the target data.
We show that high-frequency information can be prematurely introduced during the early stages of sampling, which can induce larger posterior estimation errors.
We propose a novel diffusion posterior sampling method DPS-CM, which incorporates a Crafted Measurement.
arXiv Detail & Related papers (2024-11-15T00:06:57Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected stochastic differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Deblurring via Stochastic Refinement [85.42730934561101]
We present an alternative framework for blind deblurring based on conditional diffusion models.
Our method is competitive in terms of distortion metrics such as PSNR.
arXiv Detail & Related papers (2021-12-05T04:36:09Z)
- A New Robust Multivariate Mode Estimator for Eye-tracking Calibration [0.0]
We propose a new method for estimating the main mode of multivariate distributions, with application to eye-tracking calibrations.
In this type of multimodal distribution, most central tendency measures fail to estimate the principal fixation coordinates.
Here, we developed a new algorithm, named BRIL, to identify the first mode of multivariate distributions.
We obtained outstanding performance, even for distributions containing very high proportions of outliers, whether grouped in clusters or randomly distributed.
arXiv Detail & Related papers (2021-07-16T17:45:19Z)
- Efficient Bayesian Sampling Using Normalizing Flows to Assist Markov Chain Monte Carlo Methods [13.649384403827359]
Normalizing flows can generate complex target distributions and show promise in many applications in Bayesian statistics.
Since no data set from the target posterior distribution is available beforehand, the flow is typically trained using the reverse Kullback-Leibler (KL) divergence, which only requires samples from a base distribution.
Here we explore a distinct training strategy that uses the direct KL divergence as the loss: samples from the posterior are generated by (i) assisting a local MCMC algorithm on the posterior with a normalizing flow to accelerate its mixing rate, and (ii) using the data generated this way to train the flow (a toy sketch follows this entry).
arXiv Detail & Related papers (2021-07-16T16:40:36Z)
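As a toy illustration of steps (i) and (ii) in the entry above, the sketch below alternates local random-walk moves with independence proposals drawn from a "flow", and periodically refits the flow to recent chain samples by maximum likelihood, i.e. the direct (forward) KL objective. A single affine transform of a standard normal stands in for a real normalizing flow; every name, schedule, and constant here is an assumed simplification rather than the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    """Unnormalized log-density of a 1-D bimodal target (modes at +/-3)."""
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

# The simplest possible "flow": an affine map z -> mu + exp(log_sigma) * z
# of a standard normal base, standing in for a real normalizing flow.
mu, log_sigma = 0.0, np.log(5.0)

def flow_log_prob(x):
    z = (x - mu) / np.exp(log_sigma)
    return -0.5 * z**2 - log_sigma - 0.5 * np.log(2 * np.pi)

x, chain = 0.0, []
for step in range(5000):
    if step % 2 == 0:                 # (i) local random-walk move ...
        prop = x + 0.5 * rng.standard_normal()
        log_acc = log_target(prop) - log_target(x)
    else:                             # ... alternated with global flow moves
        prop = mu + np.exp(log_sigma) * rng.standard_normal()
        log_acc = (log_target(prop) - log_target(x)
                   + flow_log_prob(x) - flow_log_prob(prop))
    if np.log(rng.uniform()) < log_acc:   # Metropolis-Hastings accept/reject
        x = prop
    chain.append(x)
    # (ii) Forward-KL training: fit the flow to recent chain samples by
    # maximum likelihood (closed form for this affine stand-in).
    if step % 500 == 499:
        recent = np.array(chain[-500:])
        mu, log_sigma = recent.mean(), np.log(recent.std() + 1e-6)
```

Once the flow roughly covers both modes, the global proposals let the chain hop between them, and the improved mixing in turn yields better training data for the flow, which is the mutual acceleration the entry describes.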
- Leverage Score Sampling for Complete Mode Coverage in Generative Adversarial Networks [11.595070613477548]
A generative model may overlook underrepresented modes that are less frequent in the empirical data distribution.
We propose a sampling procedure based on ridge leverage scores which significantly improves mode coverage when compared to standard methods.
arXiv Detail & Related papers (2021-04-06T09:00:38Z)
- Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence.
We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z)
- Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors [8.11978827493967]
We propose an approach using parallel runs of MCMC, variational, or mode-based inference to hit as many modes as possible.
We present a theoretical consistency result, with an example in which the stacked inference process approximates the true data-generating process.
We demonstrate practical implementation in several model families.
arXiv Detail & Related papers (2020-06-22T15:26:59Z)
- Efficiently Sampling Functions from Gaussian Process Posteriors [76.94808614373609]
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost (a toy sketch follows this entry).
arXiv Detail & Related papers (2020-02-21T14:03:16Z)
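On the last entry: "decoupled sample paths" can be sketched with Matheron's rule, combining a cheap weight-space prior draw (random Fourier features below) with a kernel-based correction conditioned on the data. This is a hedged toy, not the paper's code; the RBF kernel, the feature count, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 20)[:, None]                 # training inputs
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(20)
sigma2, n_feat = 0.01, 500                          # noise variance, feature count

# Random Fourier features approximating the RBF kernel k(x,x')=exp(-|x-x'|^2/2).
W = rng.standard_normal((n_feat, 1))
b = rng.uniform(0, 2 * np.pi, n_feat)
phi = lambda Z: np.sqrt(2.0 / n_feat) * np.cos(Z @ W.T + b)

def kern(A, B):
    return np.exp(-0.5 * (A - B.T) ** 2)            # exact RBF kernel, 1-D inputs

# One decoupled posterior path: weight-space prior draw plus Matheron update.
w = rng.standard_normal(n_feat)                     # prior sample in weight space
f_prior = lambda Z: phi(Z) @ w                      # cheap prior function draw
eps = np.sqrt(sigma2) * rng.standard_normal(len(X)) # simulated observation noise
K = kern(X, X) + sigma2 * np.eye(len(X))
v = np.linalg.solve(K, y - f_prior(X) - eps)        # Matheron correction weights

def f_post(Z):                                      # pathwise posterior sample
    return f_prior(Z) + kern(Z, X) @ v

Xs = np.linspace(-3, 3, 100)[:, None]
print(f_post(Xs)[:5])                               # evaluate the sampled function
```

The payoff is that f_post is an actual function: it can be evaluated at any new inputs for the cost of the two terms above, instead of drawing a joint Gaussian sample over every test location.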
This list is automatically generated from the titles and abstracts of the papers on this site.