Fast Convergence of Langevin Dynamics on Manifold: Geodesics meet
Log-Sobolev
- URL: http://arxiv.org/abs/2010.05263v2
- Date: Sun, 6 Dec 2020 18:06:06 GMT
- Title: Fast Convergence of Langevin Dynamics on Manifold: Geodesics meet
Log-Sobolev
- Authors: Xiao Wang, Qi Lei and Ioannis Panageas
- Abstract summary: One approach to sample from a high dimensional distribution $e^{-f}$ for some function $f$ is the Langevin Algorithm.
Our work generalizes the results of [53] where $f$ is defined on a manifold $M$ rather than $\mathbb{R}^n$.
- Score: 31.57723436316983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sampling is a fundamental and arguably very important task with numerous
applications in Machine Learning. One approach to sample from a high
dimensional distribution $e^{-f}$ for some function $f$ is the Langevin
Algorithm (LA). Recently, there has been a lot of progress in showing fast
convergence of LA even in cases where $f$ is non-convex, notably [53] and [39]:
the former focuses on functions $f$ defined on $\mathbb{R}^n$, and the latter on
functions with symmetries (like matrix completion type objectives) that carry
manifold structure. Our work generalizes the results of [53] to the setting
where $f$ is defined on a manifold $M$ rather than $\mathbb{R}^n$. From a
technical point of view, we show that the KL divergence decreases at a geometric
rate whenever the distribution $e^{-f}$ satisfies a log-Sobolev inequality on $M$.
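For intuition, here is a minimal sketch of the unadjusted Langevin Algorithm in the Euclidean setting of [53]; the step size, iteration count, and quadratic example $f$ are illustrative assumptions, and the paper's manifold version would replace the Euclidean update with a geodesic (exponential-map) step driven by the Riemannian gradient. Under a log-Sobolev inequality with constant $\alpha$, the continuous-time dynamics satisfy $\mathrm{KL}(\rho_t \,\|\, e^{-f}) \le e^{-2\alpha t}\,\mathrm{KL}(\rho_0 \,\|\, e^{-f})$; that is the Euclidean statement which the paper extends to $M$.

```python
import numpy as np

def langevin_algorithm(grad_f, x0, step=1e-2, n_iters=10_000, rng=None):
    """Unadjusted Langevin Algorithm (LA) targeting p(x) ∝ exp(-f(x)).

    Euclidean update: x_{k+1} = x_k - step * grad_f(x_k) + sqrt(2*step) * N(0, I).
    On a manifold M, this Euclidean step would be replaced by a step along
    a geodesic using the Riemannian gradient.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        noise = rng.standard_normal(x.size)
        x = x - step * grad_f(x) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

# Illustrative target: f(x) = |x|^2 / 2, so e^{-f} is a standard Gaussian.
samples = langevin_algorithm(grad_f=lambda x: x, x0=np.zeros(2))
print(samples[5000:].mean(axis=0), samples[5000:].var(axis=0))  # ≈ 0, ≈ 1
```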
Related papers
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with an arbitrary link function, with sample and runtime complexity $n \asymp T \asymp C(q) \cdot d$.
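As a toy illustration of this setup (the ReLU architecture, widths, learning rate, and degree-2 link below are assumptions for the sketch, not the paper's algorithm), online SGD on a two-layer network with one fresh Gaussian sample per step:

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, steps, lr = 16, 64, 20_000, 0.01

theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)           # unit index direction
sigma_star = lambda z: z ** 2 - 1        # assumed degree-2 polynomial link

W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

for _ in range(steps):                   # online SGD: fresh sample each step
    x = rng.standard_normal(d)           # isotropic Gaussian data
    y = sigma_star(x @ theta)
    pre = W @ x
    h = np.maximum(pre, 0.0)             # ReLU features
    err = a @ h - y                      # squared-loss residual
    grad_a = err * h
    grad_W = err * np.outer(a * (pre > 0), x)
    a -= lr * grad_a
    W -= lr * grad_W

X = rng.standard_normal((2000, d))       # crude test-error estimate
pred = np.maximum(X @ W.T, 0.0) @ a
print(np.mean((pred - sigma_star(X @ theta)) ** 2))
```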
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
- Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
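For reference, a minimal forward pass of the multi-head attention layer being learned (the shapes are illustrative; masking and biases are omitted for simplicity):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    """m-head attention: X is (n, d); Wq, Wk, Wv are (m, d, d_h); Wo is (m*d_h, d)."""
    heads = []
    for q, k, v in zip(Wq, Wk, Wv):
        scores = (X @ q) @ (X @ k).T / np.sqrt(q.shape[1])  # scaled dot-product
        heads.append(softmax(scores) @ (X @ v))             # per-head output
    return np.concatenate(heads, axis=-1) @ Wo              # combine heads

rng = np.random.default_rng(4)
n, d, m, dh = 6, 8, 2, 4
out = multi_head_attention(
    rng.standard_normal((n, d)),
    rng.standard_normal((m, d, dh)), rng.standard_normal((m, d, dh)),
    rng.standard_normal((m, d, dh)), rng.standard_normal((m * dh, d)))
print(out.shape)  # (6, 8)
```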
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
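A minimal sketch of the canonical 1-bit setting (the Gaussian sensing matrix and the simple back-projection estimator are illustrative assumptions, not the paper's recovery method): with Gaussian $A$ and $y = \mathrm{sign}(A\mathbf{x}^*)$, the normalized correlation $A^\top y / m$ aligns with the direction of $\mathbf{x}^*$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 50, 5000

x_star = rng.standard_normal(d)
x_star /= np.linalg.norm(x_star)   # 1-bit data only identifies the direction

A = rng.standard_normal((m, d))    # Gaussian sensing matrix
y = np.sign(A @ x_star)            # 1-bit (sign) observations

x_hat = A.T @ y / m                # simple correlation (back-projection) estimator
x_hat /= np.linalg.norm(x_hat)
print(x_hat @ x_star)              # close to 1: directions align
```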
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
- Accelerated Methods for Riemannian Min-Max Optimization Ensuring Bounded Geometric Penalties [21.141544548229774]
We study problems of the form $\min_{x \in \mathcal{M}} \max_{y \in \mathcal{N}} f(x, y)$, where $\mathcal{M}$ and $\mathcal{N}$ are Hadamard manifolds.
We show globally accelerated convergence with improved dependence on the convergence constants.
arXiv Detail & Related papers (2023-05-25T15:43:07Z)
- Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality [0.0]
We present a novel normal map-based proximal stochastic gradient algorithm ($\mathsf{norM}\text{-}\mathsf{SGD}$) and analyze its convergence under the Kurdyka-Łojasiewicz (KL) inequality.
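To illustrate the composite setting (the $\ell_1$ regularizer, step size, and toy objective are assumptions; this is plain proximal SGD, not the paper's normal map-based scheme):

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sgd(stoch_grad, x0, lam=0.1, step=1e-2, n_iters=5000, rng=None):
    """Proximal SGD for min_x E[f(x; xi)] + lam * ||x||_1.

    Update: x_{k+1} = prox_{step*lam*||.||_1}(x_k - step * g_k),
    where g_k is a stochastic gradient of the smooth part.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = soft_threshold(x - step * stoch_grad(x, rng), step * lam)
    return x

# Toy smooth part: f(x) = ||x - 1||^2 / 2 with additive gradient noise;
# the minimizer of the composite objective is x_i = 1 - lam = 0.9.
x = prox_sgd(lambda x, rng: (x - 1.0) + 0.1 * rng.standard_normal(x.size),
             x0=np.zeros(5))
print(x)
```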
arXiv Detail & Related papers (2023-05-10T01:12:11Z)
- Strong uniform convergence of Laplacians of random geometric and directed kNN graphs on compact manifolds [0.0]
We study the almost sure uniform convergence of this operator to the diffusive Laplace-Beltrami operator when $n$ tends to infinity.
This work extends known results of the past 15 years.
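A minimal construction of the graph Laplacian in question (the choice of $k$, the circle dataset, and the unnormalized variant are illustrative; the paper's results concern the precise scaling under which such operators converge to the Laplace-Beltrami operator):

```python
import numpy as np

def knn_graph_laplacian(X, k=10):
    """Unnormalized Laplacian L = D - W of a symmetrized kNN graph.

    X: (n, d) array of points sampled from a manifold. Up to a density- and
    bandwidth-dependent scaling, L f approximates -Δf (minus the
    Laplace-Beltrami operator) at the sample points.
    """
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                # skip self (index 0)
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                               # symmetrize
    return np.diag(W.sum(1)) - W

# Points on the unit circle (a 1-manifold embedded in R^2).
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
L = knn_graph_laplacian(np.column_stack([np.cos(t), np.sin(t)]))
# For f(t) = cos t on the circle, Δf = -f; since L approximates -Δ up to
# scaling, L f should align with +f:
print(np.corrcoef(L @ np.cos(t), np.cos(t))[0, 1])
```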
arXiv Detail & Related papers (2022-12-20T14:31:06Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study functions of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
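A minimal sketch of the gradient descent dynamics in a simplified version of this setting (the ReLU activation, noiseless labels, and Gaussian data are assumptions; the paper handles adversarial label noise):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lr, epochs = 1000, 10, 0.05, 300

w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.maximum(X @ w_star, 0.0)          # sigma = ReLU, noiseless labels

w = 0.1 * rng.standard_normal(d)         # small random init (zero init is dead)
for _ in range(epochs):
    pred = np.maximum(X @ w, 0.0)
    # (sub)gradient of F(w) = (1/2) E[(sigma(w.x) - y)^2]
    grad = ((pred - y) * (X @ w > 0)) @ X / n
    w -= lr * grad
print(np.linalg.norm(w - w_star))        # should shrink: near-optimal loss
```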
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z)
- Algorithms and Hardness for Linear Algebra on Geometric Graphs [14.822517769254352]
We show that the exponential dependence on the dimension $d$ in the celebrated fast multipole method of Greengard and Rokhlin cannot be improved.
This is the first formal limitation proven about fast multipole methods.
arXiv Detail & Related papers (2020-11-04T18:35:02Z)
- A deep network construction that adapts to intrinsic dimensionality beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
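To fix ideas, a tiny sketch of the composition structure with $\phi$ a linear projection, one of the two choices above ($g$, the dimensions, and the subspace are illustrative): $f$ depends on the $D$-dimensional input only through $k$ features, which is why network size can adapt to $k$ rather than $D$.

```python
import numpy as np

rng = np.random.default_rng(3)
D, k = 20, 3                                       # ambient / intrinsic dims

P = np.linalg.qr(rng.standard_normal((D, k)))[0]   # orthonormal basis of a subspace
phi = lambda x: x @ P                              # projection features phi(x)
g = lambda z: np.maximum(z, 0.0).sum(-1)           # simple function of k features

x = rng.standard_normal((5, D))
print(g(phi(x)))   # f(x) = g(phi(x)) only sees the k-dimensional features
```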
arXiv Detail & Related papers (2020-08-06T09:50:29Z)
- Convergence of Sparse Variational Inference in Gaussian Processes Regression [29.636483122130027]
We show that a method with an overall computational cost of $\mathcal{O}(N(\log N)^{2D}(\log\log N)^2)$ can be used to perform inference.
arXiv Detail & Related papers (2020-08-01T19:23:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.