Faster Sampling from Log-Concave Distributions over Polytopes via a
Soft-Threshold Dikin Walk
- URL: http://arxiv.org/abs/2206.09384v1
- Date: Sun, 19 Jun 2022 11:33:07 GMT
- Title: Faster Sampling from Log-Concave Distributions over Polytopes via a
Soft-Threshold Dikin Walk
- Authors: Oren Mangoubi, Nisheeth K. Vishnoi
- Abstract summary: We consider the problem of sampling from a $d$-dimensional log-concave distribution $\pi(\theta) \propto e^{-f(\theta)}$ constrained to a polytope $K$ defined by $m$ inequalities.
Our main result is a "soft-threshold" variant of the Dikin walk Markov chain that requires at most $O((md + d L^2 R^2) \times md^{\omega-1} \log(\frac{w}{\delta}))$ arithmetic operations to sample from $\pi$.
- Score: 28.431572772564518
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of sampling from a $d$-dimensional log-concave
distribution $\pi(\theta) \propto e^{-f(\theta)}$ constrained to a polytope $K$
defined by $m$ inequalities. Our main result is a "soft-threshold'' variant of
the Dikin walk Markov chain that requires at most $O((md + d L^2 R^2) \times
md^{\omega-1} \log(\frac{w}{\delta}))$ arithmetic operations to sample from
$\pi$ within error $\delta>0$ in the total variation distance from a $w$-warm
start, where $L$ is the Lipschitz constant of $f$, $K$ is contained in a ball
of radius $R$ and contains a ball of smaller radius $r$, and $\omega$ is the
matrix-multiplication constant. When a warm start is not available, it implies
an improvement of $\tilde{O}(d^{3.5-\omega})$ arithmetic operations on the
previous best bound for sampling from $\pi$ within total variation error
$\delta$, which was obtained with the hit-and-run algorithm, in the setting
where $K$ is a polytope given by $m=O(d)$ inequalities and $LR = O(\sqrt{d})$.
When a warm start is available, our algorithm improves by a factor of $d^2$
arithmetic operations on the best previous bound in this setting, which was
obtained for a different version of the Dikin walk algorithm. Plugging our
Dikin walk Markov chain into the post-processing algorithm of Mangoubi and
Vishnoi (2021), we achieve further improvements in the dependence of the
running time for the problem of generating samples from $\pi$ with infinity
distance bounds in the special case when $K$ is a polytope.
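For context, the sketch below shows one Metropolis-adjusted step of a *standard* Dikin walk on a polytope $K = \{x : Ax \le b\}$, targeting $\pi \propto e^{-f}$. This is an illustrative reconstruction of the classical scheme, not the paper's soft-threshold variant (which modifies the local metric $H$); the function names and the step-size parameter `radius` are choices made here for the example.

```python
import numpy as np

def dikin_walk_step(theta, A, b, f, rng, radius=0.5):
    """One Metropolis step of a standard Dikin walk for pi ∝ e^{-f} on
    K = {x : A x <= b}. Sketch only; the paper's soft-threshold variant
    uses a regularized local metric instead of the plain log-barrier Hessian."""
    d = theta.shape[0]

    def hessian(x):
        # Log-barrier Hessian: sum_i a_i a_i^T / (b_i - a_i^T x)^2
        s = b - A @ x                       # slacks, positive inside K
        return A.T @ (A / s[:, None] ** 2)

    def log_prop(x, y):
        # Log density (up to a shared constant) of proposing y from
        # x under N(x, (radius^2 / d) * H(x)^{-1})
        H = hessian(x)
        diff = y - x
        logdet = np.linalg.slogdet(H)[1]
        return 0.5 * logdet - 0.5 * (d / radius**2) * diff @ H @ diff

    # Sample z ~ N(0, H(theta)^{-1}) via a Cholesky factor of H(theta)
    H = hessian(theta)
    L = np.linalg.cholesky(H)
    z = np.linalg.solve(L.T, rng.standard_normal(d))
    prop = theta + (radius / np.sqrt(d)) * z

    if np.any(A @ prop >= b):               # reject proposals outside K
        return theta
    # Metropolis-Hastings correction for the state-dependent proposal
    log_acc = (-f(prop) + f(theta)
               + log_prop(prop, theta) - log_prop(theta, prop))
    return prop if np.log(rng.uniform()) < log_acc else theta
```

The ellipsoidal proposal adapts to the boundary: near a facet the slack shrinks, the Hessian grows, and steps automatically become smaller, which is what removes the need for an explicit projection.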
Related papers
- Outlier Robust Multivariate Polynomial Regression [27.03423421704806]
We are given a set of random samples $(\mathbf{x}_i, y_i) \in [-1,1]^n \times \mathbb{R}$ that are noisy versions of $(\mathbf{x}_i, p(\mathbf{x}_i))$.
The goal is to output a $\hat{p}$, within an $\ell_\infty$-distance of at most $O(\sigma)$ from $p$.
arXiv Detail & Related papers (2024-03-14T15:04:45Z) - Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space [10.292118864147097]
We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods.
Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^d$ by a product measure $\pi^\star$.
arXiv Detail & Related papers (2023-12-05T16:02:04Z) - $\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model.
For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits.
For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound.
arXiv Detail & Related papers (2023-07-11T08:51:53Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average-reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of the bias of any optimal policy, $\varepsilon$ is the accuracy, and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Reward-Mixing MDPs with a Few Latent Contexts are Learnable [75.17357040707347]
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs).
Our goal is to learn a near-optimal policy that nearly maximizes the $H$ time-step cumulative rewards in such a model.
arXiv Detail & Related papers (2022-10-05T22:52:00Z) - Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound $O(d^{\max\{1, p/2\}} \log^2 n)$ for loss functions with at most degree-$p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z) - Sampling from Log-Concave Distributions with Infinity-Distance
Guarantees and Applications to Differentially Private Optimization [33.38289436686841]
We present an algorithm that outputs a point from a distribution $O(\varepsilon)$-close to $\pi$ in infinity-distance.
We also present a "soft-$\pi$" version of the Dikin walk which may be of independent interest.
arXiv Detail & Related papers (2021-11-07T13:44:50Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private algorithm is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, a discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an improved algorithm with better sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Learning Mixtures of Spherical Gaussians via Fourier Analysis [0.5381004207943596]
We find that a bound on the sample and computational complexity was previously unknown when $\omega(1) \leq d \leq O(\log k)$.
These authors also bound the sample complexity of learning a random mixture of Gaussians in a ball of radius $\Theta(\sqrt{d})$ in $d$ dimensions, when the number of samples is at least $\mathrm{poly}(k, \frac{1}{\delta})$.
arXiv Detail & Related papers (2020-04-13T08:06:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.