Fast Convergence for Langevin Diffusion with Manifold Structure
- URL: http://arxiv.org/abs/2002.05576v2
- Date: Mon, 21 Sep 2020 17:48:52 GMT
- Title: Fast Convergence for Langevin Diffusion with Manifold Structure
- Authors: Ankur Moitra, Andrej Risteski
- Abstract summary: We deal with the problem of sampling from distributions of the form p(x) \propto e^{-\beta f(x)} for some function f whose values and gradients we can query.
We argue that our work is an important first step towards understanding how SGD behaves when there is a high degree of symmetry in the space of parameters that produce the same output.
- Score: 32.494158429289584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of sampling from distributions of the
form p(x) \propto e^{-\beta f(x)} for some function f whose values and
gradients we can query. This mode of access to f is natural in the scenarios in
which such problems arise, for instance sampling from posteriors in parametric
Bayesian models. Classical results show that a natural random walk, Langevin
diffusion, mixes rapidly when f is convex. Unfortunately, even in simple
examples, the applications listed above will entail working with functions f
that are nonconvex -- for which sampling from p may in general require an
exponential number of queries.
In this paper, we focus on an aspect of nonconvexity relevant for modern
machine learning applications: existence of invariances (symmetries) in the
function f, as a result of which the distribution p will have manifolds of
points with equal probability. First, we give a recipe for proving mixing time
bounds for Langevin diffusion as a function of the geometry of these manifolds.
Second, we specialize our arguments to classic matrix factorization-like
Bayesian inference problems where we get noisy measurements A(XX^T), X \in R^{d
\times k} of a low-rank matrix, i.e. f(X) = \|A(XX^T) - b\|^2_2, X \in R^{d
\times k}, and \beta the inverse of the variance of the noise. Such functions f
are invariant under orthogonal transformations, and include problems like
matrix factorization, sensing, completion. Beyond sampling, Langevin dynamics
is a popular toy model for studying stochastic gradient descent. Along these
lines, we believe that our work is an important first step towards
understanding how SGD behaves when there is a high degree of symmetry in the
space of parameters that produce the same output.
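The abstract's setting can be sketched in code. Below is a minimal, illustrative implementation of the unadjusted Langevin algorithm (a standard discretization of Langevin diffusion) applied to the matrix factorization-like objective f(X) = \|A(XX^T) - b\|^2_2 from the abstract, together with a check of the orthogonal invariance f(XO) = f(X). The dimensions d, k, the number of measurements m, the inverse temperature beta, and the step size eta are all hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem setup: noisy measurements b_i = <A_i, X* X*^T> + noise,
# with ground truth X* in R^{d x k}. All sizes here are assumptions.
d, k, m = 5, 2, 30
X_star = rng.standard_normal((d, k))
A = rng.standard_normal((m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2  # symmetrize each measurement matrix A_i
b = np.einsum('mij,ij->m', A, X_star @ X_star.T) + 0.01 * rng.standard_normal(m)

def f(X):
    """f(X) = ||A(X X^T) - b||_2^2, invariant under X -> X O for orthogonal O."""
    r = np.einsum('mij,ij->m', A, X @ X.T) - b
    return r @ r

def grad_f(X):
    """Gradient of f: 4 * sum_i r_i * A_i X (each A_i is symmetric)."""
    r = np.einsum('mij,ij->m', A, X @ X.T) - b
    return 4 * np.einsum('m,mij->ij', r, A) @ X

def ula(X0, beta=10.0, eta=1e-5, steps=2000):
    """Unadjusted Langevin algorithm:
    X <- X - eta * grad f(X) + sqrt(2 * eta / beta) * standard Gaussian noise."""
    X = X0.copy()
    for _ in range(steps):
        X = X - eta * grad_f(X) + np.sqrt(2 * eta / beta) * rng.standard_normal(X.shape)
    return X

# Orthogonal invariance: X O (X O)^T = X X^T, so f(X O) = f(X).
O, _ = np.linalg.qr(rng.standard_normal((k, k)))
X0 = rng.standard_normal((d, k))
assert np.isclose(f(X0), f(X0 @ O))

X_sample = ula(X0)
```

The invariance check makes concrete why p(x) \propto e^{-\beta f(x)} has entire manifolds of equiprobable points: every orbit {XO : O orthogonal} carries the same value of f, which is the geometric structure the paper's mixing-time bounds are stated in terms of.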
Related papers
- von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-19T01:57:21Z) - Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling.
We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z) - Neural Diffeomorphic Non-uniform B-spline Flows [19.123498909919647]
We propose diffeomorphic non-uniform B-spline flows that are at least twice continuously differentiable while bi-Lipschitz continuous.
Our C2-diffeomorphic non-uniform B-spline flows yielded solutions better than previous spline flows and faster than smooth normalizing flows.
arXiv Detail & Related papers (2023-04-07T05:34:18Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Large deviations rates for stochastic gradient descent with strongly
convex functions [11.247580943940916]
We provide a formal framework for the study of general high probability bounds with gradient descent.
We find an upper large deviations bound for SGD with strongly convex functions.
arXiv Detail & Related papers (2022-11-02T09:15:26Z) - Stochastic Langevin Differential Inclusions with Applications to Machine Learning [5.274477003588407]
We show some foundational results regarding the flow and properties of Langevin-type Differential Inclusions.
In particular, we show strong existence of solutions, as well as a canonical minimization of the free-energy functional.
arXiv Detail & Related papers (2022-06-23T08:29:17Z) - Adjoint-aided inference of Gaussian process driven differential
equations [0.8257490175399691]
We show how the adjoint of a linear system can be used to efficiently infer forcing functions modelled as GPs.
We demonstrate the approach on systems of both ordinary and partial differential equations.
arXiv Detail & Related papers (2022-02-09T17:35:14Z) - Feature Cross Search via Submodular Optimization [58.15569071608769]
We study feature cross search as a fundamental primitive in feature engineering.
We show that there exists a simple greedy $(1-1/e)$-approximation algorithm for this problem.
arXiv Detail & Related papers (2021-07-05T16:58:31Z) - Symplectic Gaussian Process Regression of Hamiltonian Flow Maps [0.8029049649310213]
We present an approach to construct appropriate and efficient emulators for Hamiltonian flow maps.
Intended future applications are long-term tracing of fast charged particles in accelerators and magnetic plasma confinement.
arXiv Detail & Related papers (2020-09-11T17:56:35Z) - A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian
Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, the data dimension $p$, and the number of random features $N$ are all large and comparable.
This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
arXiv Detail & Related papers (2020-06-09T02:05:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.