Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space
- URL: http://arxiv.org/abs/2312.02849v2
- Date: Sun, 9 Jun 2024 01:19:10 GMT
- Title: Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space
- Authors: Yiheng Jiang, Sinho Chewi, Aram-Alexandre Pooladian,
- Abstract summary: We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods.
Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $pi$ over $mathbbRd$ by a product measure $pistar$.
- Score: 10.292118864147097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods. Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^d$ by a product measure $\pi^\star$. When $\pi$ is strongly log-concave and log-smooth, we provide (1) approximation rates certifying that $\pi^\star$ is close to the minimizer $\pi^\star_\diamond$ of the KL divergence over a \emph{polyhedral} set $\mathcal{P}_\diamond$, and (2) an algorithm for minimizing $\text{KL}(\cdot\|\pi)$ over $\mathcal{P}_\diamond$ with accelerated complexity $O(\sqrt \kappa \log(\kappa d/\varepsilon^2))$, where $\kappa$ is the condition number of $\pi$.
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist into the finite-sum minimization, dubbed as continual finite-sum minimization.
Our approach significantly improves upon the $mathcalO(n/epsilon)$ FOs that $mathrmStochasticGradientDescent$ requires.
We also prove that there is no natural first-order method with $mathcalOleft(n/epsilonalpharight)$ complexity gradient for $alpha 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity [22.615156512223763]
We propose variance- optimistic sliding (SVOGS) method, which takes the advantage of the finite-sum structure in the objective.
We prove $mathcal O(delta D2/varepsilon)$, communication complexity of $mathcal O(n+sqrtndelta D2/varepsilon)$, and local calls of $tildemathcal O(n+sqrtndelta+L)D2/varepsilon)$.
arXiv Detail & Related papers (2024-05-25T08:34:49Z) - On the Complexity of Finite-Sum Smooth Optimization under the
Polyak-{\L}ojasiewicz Condition [14.781921087738967]
This paper considers the optimization problem of the form $min_bf xinmathbb Rd f(bf x)triangleq frac1nsum_i=1n f_i(bf x)$, where $f(cdot)$ satisfies the Polyak--Lojasiewicz (PL) condition with parameter $mu$ and $f_i(cdot)_i=1n$ is $L$-mean-squared smooth.
arXiv Detail & Related papers (2024-02-04T17:14:53Z) - Faster Sampling from Log-Concave Distributions over Polytopes via a
Soft-Threshold Dikin Walk [28.431572772564518]
We consider the problem of sampling from a $d$-dimensional log-concave distribution $pi(theta) propto e-f(theta)$ constrained to a polytope $K$ defined by $m$ inequalities.
Our main result is a "soft-warm'' variant of the Dikin walk Markov chain that requires at most $O((md + d L2 R2) times MDomega-1) log(fracwdelta)$ arithmetic operations to sample from $pi$
arXiv Detail & Related papers (2022-06-19T11:33:07Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We structured convex optimization problems with additive objective $r:=p + q$, where $r$ is $mu$-strong convex similarity.
We proposed a method to solve problems master to agents' communication and local calls.
The proposed method is much sharper than the $mathcalO(sqrtL_q/mu)$ method.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - Sampling from Log-Concave Distributions with Infinity-Distance
Guarantees and Applications to Differentially Private Optimization [33.38289436686841]
We present an algorithm that outputs a point from a distributionO(varepsilon)$close to $$ in infinity-distance.
We also present a "soft-pi" version of the Dikin walk which may be independent interest.
arXiv Detail & Related papers (2021-11-07T13:44:50Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $mathbbRd$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Infinite-Horizon Offline Reinforcement Learning with Linear Function
Approximation: Curse of Dimensionality and Algorithm [46.36534144138337]
In this paper, we investigate the sample complexity of policy evaluation in offline reinforcement learning.
Under the low distribution shift assumption, we show that there is an algorithm that needs at most $Oleft(maxleft fracleftVert thetapirightVert _24varepsilon4logfracddelta,frac1varepsilon2left(d+logfrac1deltaright)right right)$ samples to approximate the
arXiv Detail & Related papers (2021-03-17T18:18:57Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors the optimal excess population loss of any $(varepsilon,delta)$-differently private is $sqrtlog(d)/n + sqrtd/varepsilon n.$
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $sqrtlog(d)/n + (log(d)/varepsilon n)2/3.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy.
For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n2)$-upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both emphcompatible with oracles, and achieves complexity bounds of $tildeO(n2+nsqrtL/mu)log (1/epsilon)$ and $O(nsqrtL/epsilon)$, for $mu>0$ and $mu=0$
arXiv Detail & Related papers (2020-02-09T03:39:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.