Mean-field variational inference with the TAP free energy: Geometric and
statistical properties in linear models
- URL: http://arxiv.org/abs/2311.08442v1
- Date: Tue, 14 Nov 2023 17:35:01 GMT
- Authors: Michael Celentano, Zhou Fan, Licong Lin, Song Mei
- Abstract summary: We show that the landscape of the TAP free energy is strongly convex in an extensive neighborhood of a local minimizer.
We prove analogous properties for a local minimizer of the TAP free energy reachable by AMP, and show that posterior inference based on this minimizer remains correctly calibrated.
- Score: 20.311583934266903
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study mean-field variational inference in a Bayesian linear model when the
sample size n is comparable to the dimension p. In high dimensions, the common
approach of minimizing a Kullback-Leibler divergence from the posterior
distribution, or maximizing an evidence lower bound, may deviate from the true
posterior mean and underestimate posterior uncertainty. We study instead
minimization of the TAP free energy, showing in a high-dimensional asymptotic
framework that it has a local minimizer which provides a consistent estimate of
the posterior marginals and may be used for correctly calibrated posterior
inference. Geometrically, we show that the landscape of the TAP free energy is
strongly convex in an extensive neighborhood of this local minimizer, which
under certain general conditions can be found by an Approximate Message Passing
(AMP) algorithm. We then exhibit an efficient algorithm that linearly converges
to the minimizer within this local neighborhood. In settings where it is
conjectured that no efficient algorithm can find this local neighborhood, we
prove analogous geometric properties for a local minimizer of the TAP free
energy reachable by AMP, and show that posterior inference based on this
minimizer remains correctly calibrated.
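To make the AMP iteration referenced in the abstract concrete, here is a minimal Python sketch for a Bayesian linear model with an i.i.d. Rademacher (+/-1) prior on the coefficients. The prior, noise level, dimensions, and all names are illustrative assumptions, not taken from the paper; this is a sketch of the standard AMP recursion, not the authors' implementation.

```python
import numpy as np

def amp_linear_model(X, y, n_iter=25):
    """AMP for y = X @ theta + noise with an i.i.d. Rademacher (+/-1) prior.

    The posterior-mean denoiser for theta_i = +/-1 observed through
    s = theta_i + tau * z, z ~ N(0, 1), is eta(s) = tanh(s / tau^2).
    """
    n, p = X.shape
    delta = n / p
    theta = np.zeros(p)           # current estimate of the signal
    z = y.copy()                  # current corrected residual
    eta_prime_mean = 0.0          # average denoiser derivative (Onsager term)
    for _ in range(n_iter):
        # residual with the Onsager correction from the previous iterate
        z = y - X @ theta + (1.0 / delta) * eta_prime_mean * z
        tau2 = np.mean(z ** 2)    # state-evolution estimate of the noise level
        s = theta + X.T @ z       # effective scalar channel: theta + tau * z
        theta = np.tanh(s / tau2)
        eta_prime_mean = np.mean((1.0 - theta ** 2) / tau2)
    return theta

# toy experiment: n = 800 samples, p = 400 coefficients (delta = 2)
rng = np.random.default_rng(0)
n, p, sigma = 800, 400, 0.3
X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
theta_star = rng.choice([-1.0, 1.0], size=p)
y = X @ theta_star + sigma * rng.normal(size=n)
theta_hat = amp_linear_model(X, y)
corr = np.dot(theta_hat, theta_star) / (
    np.linalg.norm(theta_hat) * np.linalg.norm(theta_star))
```

The fixed point of this recursion is a stationary point of the corresponding TAP free energy; the Onsager term is what distinguishes AMP from naive iterative thresholding.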
Related papers
- Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities [0.5592394503914488]
We introduce a method for identifying and approximating a target measure $\pi$ as a perturbation of a given reference measure $\mu$.
Our main contribution unveils a connection between the dimensional logarithmic Sobolev inequality (LSI) and approximations with this ansatz.
arXiv Detail & Related papers (2024-06-18T20:02:44Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Sampling with Mollified Interaction Energy Descent [57.00583139477843]
We present a new optimization-based method for sampling called mollified interaction energy descent (MIED).
MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs).
We show experimentally that for unconstrained sampling problems our algorithm performs on par with existing particle-based algorithms like SVGD.
arXiv Detail & Related papers (2022-10-24T16:54:18Z)
- A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning [50.910152564914405]
Existing posterior sampling methods for reinforcement learning are limited by being model-based or by lacking worst-case theoretical guarantees beyond linear MDPs.
This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees.
arXiv Detail & Related papers (2022-08-23T12:21:01Z)
- A Dimensionality Reduction Method for Finding Least Favorable Priors with a Focus on Bregman Divergence [108.28566246421742]
This paper develops a dimensionality reduction method that allows us to move the optimization to a finite-dimensional setting with an explicit bound on the dimension.
In order to make progress on the problem, we restrict ourselves to Bayesian risks induced by a relatively large class of loss functions, namely Bregman divergences.
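The Bregman-divergence restriction mentioned above has a simple computational form. As a small illustrative sketch (the function names are ours, not the paper's): a Bregman divergence is generated by a convex potential $\phi$, and recovers squared Euclidean distance and KL divergence as special cases.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

x = np.array([0.2, 0.8])
y = np.array([0.5, 0.5])

# squared Euclidean distance: phi(u) = ||u||^2 gives D(x, y) = ||x - y||^2
sq = bregman(lambda u: np.dot(u, u), lambda u: 2 * u, x, y)

# negative entropy: phi(u) = sum u log u gives the KL divergence
# (for x, y on the probability simplex)
kl = bregman(lambda u: np.sum(u * np.log(u)),
             lambda u: np.log(u) + 1.0, x, y)
```

Restricting Bayes risks to losses of this form is what lets the paper reduce the search for least favorable priors to a finite-dimensional optimization.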
arXiv Detail & Related papers (2022-02-23T16:22:28Z)
- Neighborhood Region Smoothing Regularization for Finding Flat Minima in Deep Neural Networks [16.4654807047138]
We propose an effective regularization technique called Neighborhood Region Smoothing (NRS).
NRS regularizes a neighborhood region in weight space so that models within it yield approximately the same outputs.
We empirically show that the minima found by NRS would have relatively smaller Hessian eigenvalues compared to the conventional method.
arXiv Detail & Related papers (2022-01-16T15:11:00Z)
- Keep it Tighter -- A Story on Analytical Mean Embeddings [0.6445605125467574]
Kernel techniques are among the most popular and flexible approaches in data science.
Mean embedding gives rise to a divergence measure referred to as maximum mean discrepancy (MMD).
In this paper we focus on the problem of MMD estimation when the mean embedding of one of the underlying distributions is available analytically.
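To make this setting concrete, here is a small Python sketch of an MMD estimator in which one mean embedding is available analytically: for the Gaussian kernel, the mean embedding of a standard normal reference has a closed form, so only the empirical term needs samples. The kernel and reference measure are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def mmd2_vs_standard_normal(x, sigma=1.0):
    """Squared MMD between the empirical measure of x (n x d) and N(0, I_d),
    for the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    For this kernel the mean embedding of N(0, I_d) is analytic:
      E_{Y ~ N(0,I)}  k(x, Y)  = (s2 / (s2 + 1))^{d/2} * exp(-||x||^2 / (2 (s2 + 1)))
      E_{Y,Y'~N(0,I)} k(Y, Y') = (s2 / (s2 + 2))^{d/2}          with s2 = sigma^2.
    """
    n, d = x.shape
    s2 = sigma ** 2
    # empirical term: (1/n^2) sum_ij k(x_i, x_j)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    term_xx = np.mean(np.exp(-sq_dists / (2 * s2)))
    # analytic cross term: (2/n) sum_i E_Y k(x_i, Y)
    emb = (s2 / (s2 + 1)) ** (d / 2) * np.exp(
        -np.sum(x ** 2, axis=1) / (2 * (s2 + 1)))
    term_xy = np.mean(emb)
    # analytic reference self-term
    term_yy = (s2 / (s2 + 2)) ** (d / 2)
    return term_xx - 2 * term_xy + term_yy

rng = np.random.default_rng(1)
x_null = rng.normal(size=(500, 2))           # matches the reference measure
x_shift = rng.normal(size=(500, 2)) + 2.0    # mean-shifted alternative
mmd_null = mmd2_vs_standard_normal(x_null)
mmd_shift = mmd2_vs_standard_normal(x_shift)
```

Replacing two of the three terms with closed-form expressions removes their sampling error, which is the source of the tighter estimates the title alludes to.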
arXiv Detail & Related papers (2021-10-15T21:29:27Z)
- Non-asymptotic estimation lower bounds for LTI state space models with Cramér-Rao and van Trees [1.14219428942199]
We study the estimation problem for linear time-invariant (LTI) state-space models with Gaussian excitation of an unknown covariance.
We provide non-asymptotic lower bounds for the expected estimation error and the mean square estimation risk of the least squares estimator.
Our results extend and improve existing lower bounds, yielding lower bounds in expectation on the mean square estimation risk.
arXiv Detail & Related papers (2021-09-17T15:00:25Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Local convexity of the TAP free energy and AMP convergence for Z2-synchronization [17.940267599218082]
We study mean-field variational Bayesian inference using the TAP approach for Z2-synchronization.
We show that for any signal strength $\lambda > 1$, there exists a unique local minimizer of the TAP free energy functional near the mean of the Bayes posterior law.
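A minimal sketch of the AMP iteration for Z2-synchronization, whose fixed points satisfy the TAP equation m = tanh(lam * Y m - lam^2 (1 - q) m) with q = ||m||^2 / n. The initialization, problem size, and signal strength below are illustrative assumptions.

```python
import numpy as np

def amp_z2_sync(Y, lam, n_iter=40, seed=0):
    """AMP for the Z2-synchronization model Y = (lam/n) x x^T + W / sqrt(n),
    with x in {+-1}^n and W a symmetric Gaussian noise matrix.

    Iterates m^{t+1} = tanh(lam * Y m^t - lam^2 (1 - q_t) m^{t-1}),
    where the second term is the Onsager correction and q_t = ||m^t||^2 / n.
    """
    n = Y.shape[0]
    rng = np.random.default_rng(seed)
    m = rng.normal(scale=0.1, size=n)   # small random init to break symmetry
    m_prev = np.zeros(n)
    for _ in range(n_iter):
        q = np.mean(m ** 2)
        f = lam * (Y @ m) - lam ** 2 * (1.0 - q) * m_prev
        m_prev, m = m, np.tanh(f)
    return m

# toy instance above the threshold lam > 1
rng = np.random.default_rng(1)
n, lam = 2000, 2.0
x = rng.choice([-1.0, 1.0], size=n)
W = rng.normal(size=(n, n))
W = (W + W.T) / np.sqrt(2)              # GOE-like symmetric noise
Y = (lam / n) * np.outer(x, x) + W / np.sqrt(n)
m = amp_z2_sync(Y, lam)
overlap = abs(np.dot(m, x)) / n          # sign of x is unidentifiable
```

Above the threshold the iterates align with the planted signs up to a global sign flip, consistent with the local minimizer described in the abstract.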
arXiv Detail & Related papers (2021-06-21T22:08:17Z)
- Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization [94.18714844247766]
Wasserstein barycenters provide a geometric notion of the weighted average of probability measures based on optimal transport.
We present a scalable algorithm to compute Wasserstein-2 barycenters given sample access to the input measures.
arXiv Detail & Related papers (2021-02-02T21:01:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.