Boosting Variational Inference With Locally Adaptive Step-Sizes
- URL: http://arxiv.org/abs/2105.09240v1
- Date: Wed, 19 May 2021 16:41:33 GMT
- Title: Boosting Variational Inference With Locally Adaptive Step-Sizes
- Authors: Gideon Dresdner, Saurav Shekhar, Fabian Pedregosa, Francesco Locatello, Gunnar Rätsch
- Abstract summary: Boosting Variational Inference allows practitioners to obtain increasingly good posterior approximations by spending more compute.
The main obstacle to widespread adoption of Boosting Variational Inference is the amount of resources necessary to improve over a strong Variational Inference baseline.
We describe how the global curvature impacts time and memory consumption, address the problem with the notion of local curvature, and provide a novel approximate backtracking algorithm for estimating local curvature.
- Score: 27.122745595473383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Variational Inference makes a trade-off between the capacity of the
variational family and the tractability of finding an approximate posterior
distribution. Instead, Boosting Variational Inference allows practitioners to
obtain increasingly good posterior approximations by spending more compute. The
main obstacle to widespread adoption of Boosting Variational Inference is the
amount of resources necessary to improve over a strong Variational Inference
baseline. In our work, we trace this limitation back to the global curvature of
the KL-divergence. We characterize how the global curvature impacts time and
memory consumption, address the problem with the notion of local curvature, and
provide a novel approximate backtracking algorithm for estimating local
curvature. We give new theoretical convergence rates for our algorithms and
provide experimental validation on synthetic and real-world datasets.
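To make the locally adaptive step-size idea concrete, below is a minimal Python sketch of a generic Frank-Wolfe-style backtracking rule that estimates a local curvature constant and the corresponding step size. It is a sketch under assumptions, not the authors' exact routine: the objective `f` is a toy quadratic standing in for a (Monte-Carlo estimated) KL divergence, and the names `tau`, `eta`, and the sufficient-decrease test follow the standard adaptive Frank-Wolfe recipe.

```python
import numpy as np

def backtracking_step(f, x, direction, fw_gap, L_prev, tau=2.0, eta=0.9, max_iter=50):
    """Estimate a local curvature constant L and a step size gamma by backtracking.

    x         : current iterate (in boosting VI this would parameterize the mixture q_t)
    direction : Frank-Wolfe direction d_t = s_t - x_t
    fw_gap    : Frank-Wolfe gap g_t = <-grad f(x_t), d_t>, assumed nonnegative
    L_prev    : previous local curvature estimate, used as a warm start
    """
    L = eta * L_prev                      # optimistically shrink the previous estimate
    d_sq = float(direction @ direction)
    f_x = f(x)
    gamma, x_new = 0.0, x
    for _ in range(max_iter):
        gamma = min(fw_gap / (L * d_sq), 1.0)
        x_new = x + gamma * direction
        # Accept L once the quadratic model with curvature L upper-bounds the new value.
        if f(x_new) <= f_x - gamma * fw_gap + 0.5 * L * gamma ** 2 * d_sq:
            break
        L *= tau                          # local curvature was underestimated; increase it
    return gamma, L, x_new

# Toy usage: f(x) = 0.5 ||x||^2, with a direction pointing at the origin
# (as if the linear minimization oracle had returned s_t = 0).
f = lambda x: 0.5 * float(x @ x)
x = np.array([1.0, -2.0])
d = -x
gap = float(x @ x)                        # <-grad f(x), d> = <-x, -x> = ||x||^2
gamma, L, x_new = backtracking_step(f, x, d, gap, L_prev=1.0)
print(f"gamma={gamma:.3f}, local L={L:.2f}, new iterate={x_new}")
```

Warm-starting L from the previous iteration is what makes the estimate local: the step size adapts to the curvature near the current iterate instead of relying on a single global curvature bound.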
Related papers
- Inferring Change Points in High-Dimensional Regression via Approximate Message Passing [9.660892239615366]
We propose an Approximate Message Passing (AMP) algorithm for estimating both the signals and the change point locations.
We rigorously characterize its performance in the high-dimensional limit where the number of parameters $p$ is proportional to the number of samples $n$.
We show how our AMP iterates can be used to efficiently compute a Bayesian posterior distribution over the change point locations in the high-dimensional limit.
arXiv Detail & Related papers (2024-04-11T15:57:12Z)
- Aggregation Weighting of Federated Learning via Generalization Bound Estimation [65.8630966842025]
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions.
We replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model.
arXiv Detail & Related papers (2023-11-10T08:50:28Z)
- Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference [9.940560505044122]
We propose a method to improve the efficiency and accuracy of amortized Bayesian inference.
We estimate the marginal likelihood based on approximate representations of the joint model.
arXiv Detail & Related papers (2023-10-06T17:41:41Z)
- Federated Learning as Variational Inference: A Scalable Expectation Propagation Approach [66.9033666087719]
This paper extends the inference view and describes a variational inference formulation of federated learning.
We apply FedEP on standard federated learning benchmarks and find that it outperforms strong baselines in terms of both convergence speed and accuracy.
arXiv Detail & Related papers (2023-02-08T17:58:11Z)
- A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer [55.20627066525205]
We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models.
Our proposed VB approach yields consistent improvements on target devices and outperforms 13 state-of-the-art knowledge transfer algorithms.
arXiv Detail & Related papers (2021-10-16T15:54:01Z)
- Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z)
- Improving Bayesian Inference in Deep Neural Networks with Variational Structured Dropout [19.16094166903702]
We introduce a new variational structured approximation inspired by the interpretation of Dropout training as approximate inference in Bayesian networks.
We then propose a novel method called Variational Structured Dropout (VSD) to overcome this limitation.
We conduct experiments on standard benchmarks to demonstrate the effectiveness of VSD over state-of-the-art methods on both predictive accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-02-16T02:33:43Z)
- A Random Matrix Theory Approach to Damping in Deep Learning [0.7614628596146599]
We conjecture that the inherent difference in generalisation between adaptive and non-adaptive gradient methods in deep learning stems from the increased estimation noise.
We develop a novel random matrix theory based damping learner for second order optimiser inspired by linear shrinkage estimation.
arXiv Detail & Related papers (2020-11-15T18:19:42Z)
- Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
- GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)