Optimal Preconditioning and Fisher Adaptive Langevin Sampling
- URL: http://arxiv.org/abs/2305.14442v3
- Date: Sat, 28 Oct 2023 13:08:19 GMT
- Title: Optimal Preconditioning and Fisher Adaptive Langevin Sampling
- Authors: Michalis K. Titsias
- Abstract summary: We derive a computationally efficient adaptive MCMC scheme that learns the preconditioning from the history of gradients produced as the algorithm runs.
We show in several experiments that the proposed algorithm is very robust in high dimensions and significantly outperforms other methods.
- Score: 8.122270502556374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We define an optimal preconditioning for the Langevin diffusion by
analytically optimizing the expected squared jumped distance. This yields as
the optimal preconditioning an inverse Fisher information covariance matrix,
where the covariance matrix is computed as the outer product of log target
gradients averaged under the target. We apply this result to the Metropolis
adjusted Langevin algorithm (MALA) and derive a computationally efficient
adaptive MCMC scheme that learns the preconditioning from the history of
gradients produced as the algorithm runs. We show in several experiments that
the proposed algorithm is very robust in high dimensions and significantly
outperforms other methods, including a closely related adaptive MALA scheme
that learns the preconditioning with standard adaptive MCMC as well as the
position-dependent Riemannian manifold MALA sampler.
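The adaptive scheme described in the abstract can be illustrated with a minimal sketch: precondition MALA with the inverse of a running empirical Fisher matrix built from the history of log-target gradients. The function name, step size, and regularization below are illustrative assumptions, not the paper's exact update rule (and non-diminishing adaptation is not exactly invariant):

```python
import numpy as np

def fisher_adaptive_mala(log_prob, grad_log_prob, x0, n_iters=4000,
                         step=0.5, reg=1e-3, rng=None):
    """MALA preconditioned by the inverse of a running empirical Fisher
    matrix (average of log-target gradient outer products).
    Illustrative sketch only; the paper's scheme differs in detail."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, float)
    d = x.size
    gx = grad_log_prob(x)
    fisher = np.eye(d)                       # running estimate of E[g g^T]
    samples = []
    for t in range(1, n_iters + 1):
        fisher += (np.outer(gx, gx) - fisher) / t
        P = fisher + reg * np.eye(d)         # regularized empirical Fisher
        A = np.linalg.inv(P)                 # preconditioner A ~ F^{-1}
        A = 0.5 * (A + A.T)                  # symmetrize for Cholesky
        L = np.linalg.cholesky(A)
        # preconditioned Langevin proposal: y ~ N(x + (step/2) A g(x), step A)
        y = x + 0.5 * step * (A @ gx) + np.sqrt(step) * (L @ rng.standard_normal(d))
        gy = grad_log_prob(y)

        def log_q(b, a, ga):                 # proposal log-density, up to a constant
            diff = b - (a + 0.5 * step * (A @ ga))
            return -0.5 / step * float(diff @ (P @ diff))

        log_alpha = log_prob(y) - log_prob(x) + log_q(x, y, gy) - log_q(y, x, gx)
        if np.log(rng.random()) < log_alpha:
            x, gx = y, gy
        samples.append(x.copy())
    return np.array(samples)
```

On a Gaussian target the empirical Fisher approaches the inverse covariance, so the learned preconditioner approaches the target covariance and rescales the chain toward an isotropic geometry.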
Related papers
- PADAM: Parallel averaged Adam reduces the error for stochastic optimization in scientific machine learning [5.052293146674794]
Averaging techniques such as Ruppert--Polyak averaging and exponential moving averaging (EMA) are powerful approaches to accelerate stochastic gradient descent (SGD) optimization methods such as the popular ADAM. In this work we propose an averaging approach, which we refer to as parallel averaged ADAM (PADAM), in which we compute in parallel different averaged variants of ADAM and, during the training process, dynamically select the variant with the smallest optimization error.
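The PADAM idea summarized above can be sketched as follows: run one ADAM trajectory, maintain several EMA copies of its iterates with different decay rates, and select the copy with the smallest loss. The decay rates, selection criterion, and function name are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def adam_with_parallel_averages(grad, x0, loss, decays=(0.9, 0.99, 0.999),
                                lr=0.1, n_steps=500):
    """Sketch of 'parallel averaging': one ADAM run, several EMA copies
    of its iterates, then pick the averaged variant with the lowest loss.
    Hypothetical simplification of PADAM for illustration."""
    x = np.asarray(x0, float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    b1, b2, eps = 0.9, 0.999, 1e-8
    avgs = {d: x.copy() for d in decays}     # one EMA iterate per decay rate
    for t in range(1, n_steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g            # first-moment estimate
        v = b2 * v + (1 - b2) * g * g        # second-moment estimate
        mhat = m / (1 - b1 ** t)
        vhat = v / (1 - b2 ** t)
        x -= lr * mhat / (np.sqrt(vhat) + eps)
        for d in decays:                     # update each averaged variant
            avgs[d] = d * avgs[d] + (1 - d) * x
    return min(avgs.values(), key=loss)      # smallest-loss average
```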
arXiv Detail & Related papers (2025-05-28T08:07:34Z) - Adaptive Bayesian Optimization for Robust Identification of Stochastic Dynamical Systems [4.0148499400442095]
This paper deals with the identification of linear stochastic dynamical systems, where the unknowns include system coefficients and noise variances. A sample-efficient global optimization method based on Bayesian optimization is proposed. Experiments show that the EGP-based BO consistently outperforms MLE via steady-state filtering and expectation-maximization.
arXiv Detail & Related papers (2025-03-09T01:38:21Z) - Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes algorithms for conditional stochastic optimization in the federated learning setting.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - Towards Practical Preferential Bayesian Optimization with Skew Gaussian
Processes [8.198195852439946]
We study preferential Bayesian optimization (BO) where reliable feedback is limited to pairwise comparison called duels.
An important challenge in preferential BO, which uses the preferential Gaussian process (GP) model to represent flexible preference structure, is that the posterior distribution is a computationally intractable skew GP.
We develop a new method that achieves both high computational efficiency and low sample complexity, and then demonstrate its effectiveness through extensive numerical experiments.
arXiv Detail & Related papers (2023-02-03T03:02:38Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
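For context, plain AIS with a fixed geometric bridge and a linear temperature schedule looks like the sketch below; the cited paper's contribution is to parameterize and optimize the bridging distributions, which this sketch deliberately does not do:

```python
import numpy as np

def ais_log_z(log_p0, log_p1, sample_p0, n_chains=200, n_temps=50,
              step=0.3, rng=None):
    """Annealed importance sampling with geometric bridging distributions
    pi_b proportional to p0^(1-b) * p1^b and one random-walk Metropolis
    move per temperature. Returns an estimate of log(Z1/Z0)."""
    rng = np.random.default_rng(0) if rng is None else rng
    betas = np.linspace(0.0, 1.0, n_temps)   # fixed linear schedule
    x = sample_p0(n_chains)                  # exact draws from the base p0
    logw = np.zeros(n_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # importance-weight increment between consecutive temperatures
        logw += (b - b_prev) * (log_p1(x) - log_p0(x))
        # one Metropolis step targeting pi_b, per chain
        log_pi = lambda z: (1.0 - b) * log_p0(z) + b * log_p1(z)
        prop = x + step * rng.standard_normal(x.shape)
        acc = np.log(rng.random(n_chains)) < log_pi(prop) - log_pi(x)
        x = np.where(acc, prop, x)
    # log-mean-exp of the weights estimates log(Z1/Z0)
    m = logw.max()
    return m + np.log(np.mean(np.exp(logw - m)))
```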
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex
Optimization [15.731908248435348]
We develop gradient descent averaging and primal-dual averaging algorithms for strongly convex cases.
We prove that primal-dual averaging yields the optimal convergence rate in terms of output averaging, while SC-PDA derives the optimal individual convergence.
Several experiments on SVMs and deep learning models validate the correctness of theoretical analysis and effectiveness of algorithms.
arXiv Detail & Related papers (2020-12-29T01:40:30Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - An adaptive Hessian approximated stochastic gradient MCMC method [12.93317525451798]
We present an adaptive Hessian approximated gradient MCMC method to incorporate local geometric information while sampling from the posterior.
We adopt a magnitude-based weight pruning method to enforce the sparsity of the network.
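A common concrete instance of this family is RMSprop-style diagonally preconditioned SGLD; the sketch below uses that simpler diagonal curvature proxy and omits the drift correction term for a changing preconditioner, so it illustrates the idea rather than the paper's adaptive Hessian scheme:

```python
import numpy as np

def precond_sgld(grad_log_post, x0, n_iters=5000, eps=0.01,
                 alpha=0.99, lam=0.1, rng=None):
    """Stochastic gradient Langevin dynamics with an RMSprop-style
    diagonal preconditioner G = 1/(lam + sqrt(v)), where v is a running
    second moment of the gradients. The Gamma correction term for the
    position-dependent preconditioner is omitted for brevity."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x0, float).copy()
    v = np.zeros_like(x)                     # running gradient second moment
    out = []
    for _ in range(n_iters):
        g = grad_log_post(x)                 # full or minibatch gradient
        v = alpha * v + (1 - alpha) * g * g
        G = 1.0 / (lam + np.sqrt(v))         # diagonal preconditioner
        # preconditioned Langevin update with matched injected noise
        x = x + 0.5 * eps * G * g + np.sqrt(eps * G) * rng.standard_normal(x.shape)
        out.append(x.copy())
    return np.array(out)
```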
arXiv Detail & Related papers (2020-10-03T16:22:15Z) - Bayesian Sparse learning with preconditioned stochastic gradient MCMC
and its applications [5.660384137948734]
We show that the proposed algorithm converges to the correct distribution with a controllable bias under mild conditions.
arXiv Detail & Related papers (2020-06-29T20:57:20Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Unbiased MLMC stochastic gradient-based optimization of Bayesian
experimental designs [4.112293524466434]
The gradient of the expected information gain with respect to experimental design parameters is given by a nested expectation.
We introduce an unbiased Monte Carlo estimator for the gradient of the expected information gain with finite expected squared $\ell_2$-norm and finite expected computational cost per sample.
arXiv Detail & Related papers (2020-05-18T01:02:31Z)
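The nested expectation mentioned above can be made concrete on a toy linear-Gaussian model, where the plain (biased) nested Monte Carlo estimator of the expected information gain takes a few lines. The model, sample sizes, and function name are illustrative assumptions; the cited paper's unbiased MLMC gradient estimator is not reproduced here:

```python
import numpy as np

def eig_nested_mc(design, n_outer=500, n_inner=100, rng=None):
    """Nested Monte Carlo estimate of expected information gain for a toy
    model y = design * theta + noise, theta ~ N(0,1), noise ~ N(0,1).
    Outer loop: E_{theta,y}[log p(y|theta)]; inner loop: marginal p(y)."""
    rng = np.random.default_rng(0) if rng is None else rng
    sigma = 1.0
    theta = rng.standard_normal(n_outer)
    y = design * theta + sigma * rng.standard_normal(n_outer)
    # log-likelihood of each outcome under its own generating theta
    log_lik = -0.5 * ((y - design * theta) / sigma) ** 2 \
        - np.log(sigma * np.sqrt(2 * np.pi))
    # inner expectation: marginal likelihood via fresh prior samples
    theta_in = rng.standard_normal((n_outer, n_inner))
    log_terms = -0.5 * ((y[:, None] - design * theta_in) / sigma) ** 2 \
        - np.log(sigma * np.sqrt(2 * np.pi))
    log_marg = np.log(np.mean(np.exp(log_terms), axis=1))
    return np.mean(log_lik - log_marg)       # estimate of I(theta; y)
```

For this conjugate model the exact answer is $\tfrac{1}{2}\log(1 + \text{design}^2/\sigma^2)$, which makes it a convenient check for the estimator.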
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.