Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting
- URL: http://arxiv.org/abs/2310.06081v2
- Date: Sat, 30 Mar 2024 16:21:39 GMT
- Title: Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting
- Authors: Aleksei Ustimenko, Aleksandr Beznosikov
- Abstract summary: We consider a rather general and broad class of Markov chains, Ito chains, that look like the Euler-Maruyama discretization of some Stochastic Differential Equation.
We prove a bound in $W_2$-distance between the laws of our Ito chain and the corresponding differential equation.
- Score: 64.0722630873758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we consider a rather general and broad class of Markov chains, Ito chains, that look like the Euler-Maruyama discretization of some Stochastic Differential Equation. The chain we study is a unified framework for theoretical analysis. It comes with almost arbitrary isotropic and state-dependent noise instead of the normal and state-independent noise assumed in most related papers. Moreover, the drift and diffusion coefficients in our chain can be inexact, in order to cover a wide range of applications such as Stochastic Gradient Langevin Dynamics, sampling, Stochastic Gradient Descent, and Stochastic Gradient Boosting. We prove a bound in $W_2$-distance between the laws of our Ito chain and the corresponding differential equation. These results improve upon or cover most of the known estimates, and for some particular cases our analysis is the first.
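As a concrete illustration of the kind of chain described in the abstract (an editorial sketch, not code from the paper), the snippet below implements a generic Euler-Maruyama-style update with an inexact drift and a state-dependent noise scale, and instantiates it as Stochastic Gradient Langevin Dynamics for a standard Gaussian target. All function names and the toy target are hypothetical.

```python
import numpy as np

def ito_chain_step(x, drift, noise_scale, gamma, rng):
    """One Euler-Maruyama-style update: x' = x + gamma * b(x) + sqrt(gamma) * sigma(x) * xi.

    Both `drift` and `noise_scale` may be inexact, stochastic estimates, in the spirit
    of the framework described above.
    """
    xi = rng.standard_normal(x.shape)  # isotropic noise; the paper allows non-Gaussian noise
    return x + gamma * drift(x) + np.sqrt(gamma) * noise_scale(x) * xi

# Example instance: Stochastic Gradient Langevin Dynamics for the Gaussian target
# proportional to exp(-0.5 * ||x||^2), i.e. drift b(x) = -grad U(x) = -x and diffusion sqrt(2).
rng = np.random.default_rng(0)
x = rng.standard_normal(2)
gamma = 1e-2
for _ in range(5000):
    x = ito_chain_step(
        x,
        drift=lambda y: -(y + 0.1 * rng.standard_normal(y.shape)),  # inexact drift estimate
        noise_scale=lambda y: np.sqrt(2.0),
        gamma=gamma,
        rng=rng,
    )
print(x)  # after many steps, x is approximately a sample from N(0, I)
```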
Related papers
- High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization [83.06112052443233]
This paper studies kernel ridge regression in high dimensions under covariate shift.
By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows for decreasing the variance (a weighted-regression sketch follows this entry).
For the bias, we analyze regularization at an arbitrary or a well-chosen scale, showing that the bias can behave very differently under different regularization scales.
arXiv Detail & Related papers (2024-06-05T12:03:27Z)
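The re-weighting strategy in the entry above can be illustrated by importance-weighted kernel ridge regression, where each training loss term is scaled by an estimate of the likelihood ratio between target and source densities. The sketch below is a generic weighted-KRR implementation under that assumption, not the paper's exact estimator; the RBF kernel, toy data, and stand-in weights are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, bandwidth=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def weighted_krr_fit(X, y, weights, lam, bandwidth=1.0):
    """Importance-weighted kernel ridge regression.

    Minimizes sum_i w_i (f(x_i) - y_i)^2 + lam * ||f||_H^2;
    stationarity gives (W K + lam I) alpha = W y.
    """
    K = rbf_kernel(X, X, bandwidth)
    W = np.diag(weights)
    return np.linalg.solve(W @ K + lam * np.eye(len(y)), W @ y)

def weighted_krr_predict(X_train, alpha, X_test, bandwidth=1.0):
    return rbf_kernel(X_test, X_train, bandwidth) @ alpha

# Toy usage: source data, with weights standing in for p_target(x) / p_source(x).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
weights = np.exp(0.5 * X[:, 0])          # purely illustrative likelihood-ratio surrogate
weights *= len(weights) / weights.sum()  # normalize to mean 1
alpha = weighted_krr_fit(X, y, weights, lam=1e-1)
preds = weighted_krr_predict(X, alpha, X[:5])
```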
- Tensor cumulants for statistical inference on invariant distributions [49.80012009682584]
We show that PCA becomes computationally hard at a critical value of the signal's magnitude.
We define a new set of objects, which provide an explicit, near-orthogonal basis for invariants of a given degree.
It also lets us analyze a new problem of distinguishing between different ensembles.
arXiv Detail & Related papers (2024-04-29T14:33:24Z)
- Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning [9.31522898261934]
We investigate the impact of compression on gradient algorithms for machine learning.
We highlight differences in terms of convergence rates between several unbiased compression operators (one such operator is sketched after this entry).
We extend our results to the case of federated learning.
arXiv Detail & Related papers (2023-08-02T18:02:00Z)
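A standard example of an unbiased compression operator of the kind referred to above is rand-k sparsification, which keeps k random coordinates and rescales them by d/k so that the compressed vector is unbiased in expectation. The sketch below shows this operator inside a toy compressed-gradient loop; it is a generic illustration, not the operators or algorithms analyzed in the paper.

```python
import numpy as np

def rand_k_compress(v, k, rng):
    """Unbiased rand-k sparsification: keep k random coordinates, rescale by d/k.

    Each coordinate survives with probability k/d and is scaled by d/k, so E[C(v)] = v.
    """
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out

# Compressed gradient descent on a toy quadratic f(x) = 0.5 * ||x||^2 (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=10)
lr = 0.1
for _ in range(100):
    grad = x                               # exact gradient of the toy objective
    x = x - lr * rand_k_compress(grad, k=3, rng=rng)
```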
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences (a vanilla AIS sketch follows this entry).
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
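For context on the entry above, the sketch below implements vanilla Annealed Importance Sampling with a geometric path and one random-walk Metropolis move per temperature. It shows how AIS produces weighted samples and a normalizing-constant estimate, but it is not the Constant Rate AIS algorithm itself, and the toy prior and target are hypothetical.

```python
import numpy as np

def ais_weights(log_prior, log_target, sample_prior, n_samples, betas, rng, step=0.5):
    """Vanilla AIS with intermediate densities pi_t proportional to prior^(1-beta_t) * target^beta_t."""
    xs = sample_prior(n_samples)
    log_w = np.zeros(n_samples)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Incremental weight: new temperature minus previous one, evaluated at the current state.
        log_w += (b - b_prev) * (log_target(xs) - log_prior(xs))
        # One Metropolis move targeting the current intermediate density.
        def log_pi(z, beta=b):
            return (1 - beta) * log_prior(z) + beta * log_target(z)
        prop = xs + step * rng.standard_normal(xs.shape)
        accept = np.log(rng.uniform(size=n_samples)) < log_pi(prop) - log_pi(xs)
        xs = np.where(accept, prop, xs)
    return xs, log_w

# Toy run: prior N(0, 1), unnormalized target exp(-(x - 3)^2 / 2).
rng = np.random.default_rng(0)
log_prior = lambda x: -0.5 * x ** 2
log_target = lambda x: -0.5 * (x - 3.0) ** 2
betas = np.linspace(0.0, 1.0, 50)
xs, log_w = ais_weights(log_prior, log_target,
                        lambda n: rng.standard_normal(n), 200, betas, rng)
# Stable log of the estimated normalizing-constant ratio Z_target / Z_prior:
log_Z_ratio = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
```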
- Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics (a variance-reduced Langevin sketch follows this entry).
arXiv Detail & Related papers (2022-03-30T11:39:00Z)
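As a rough illustration of the variance-reduced variants mentioned above, the sketch below combines a Langevin update with an SVRG-style control-variate gradient estimator over a finite-sum potential. The function names, schedule, and toy potential are assumptions, not the paper's exact algorithms.

```python
import numpy as np

def svrg_langevin(grad_i, n_data, x0, gamma, n_epochs, inner_steps, rng):
    """Langevin dynamics with an SVRG-style variance-reduced gradient estimator.

    grad_i(x, i) returns the gradient of the i-th summand of the potential U(x),
    where U is the average of the summands.
    """
    x = x0.copy()
    for _ in range(n_epochs):
        x_ref = x.copy()
        full_grad = np.mean([grad_i(x_ref, i) for i in range(n_data)], axis=0)
        for _ in range(inner_steps):
            i = rng.integers(n_data)
            g = grad_i(x, i) - grad_i(x_ref, i) + full_grad  # unbiased, reduced variance
            x = x - gamma * g + np.sqrt(2 * gamma) * rng.standard_normal(x.shape)
    return x

# Toy potential: U(x) = (1/n) * sum_i 0.5 * ||x - a_i||^2, so the target is a Gaussian.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(50, 2))
x = svrg_langevin(lambda x, i: x - anchors[i], n_data=50, x0=np.zeros(2),
                  gamma=1e-2, n_epochs=20, inner_steps=50, rng=rng)
```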
- On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration [115.1954841020189]
We study the asymptotic and non-asymptotic properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and the number of iterations going to infinity (an averaging sketch follows this entry).
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
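To make the averaged procedure above concrete, the sketch below runs constant-step linear stochastic approximation and maintains the Polyak-Ruppert (running) average of the iterates. The specific system and noise model are illustrative assumptions.

```python
import numpy as np

def lsa_polyak_ruppert(A, b, gamma, n_steps, noise, rng):
    """Constant-step linear stochastic approximation with Polyak-Ruppert averaging.

    Iterates theta_{k+1} = theta_k - gamma * (A theta_k - b + noise_k) and returns
    both the last iterate and the running average of all iterates.
    """
    d = b.size
    theta = np.zeros(d)
    avg = np.zeros(d)
    for k in range(1, n_steps + 1):
        xi = noise * rng.standard_normal(d)
        theta = theta - gamma * (A @ theta - b + xi)
        avg += (theta - avg) / k          # online Polyak-Ruppert average
    return theta, avg

# Toy system: solve A theta = b from noisy observations of the residual.
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
last, averaged = lsa_polyak_ruppert(A, b, gamma=0.05, n_steps=20000, noise=0.5, rng=rng)
# `averaged` is typically much closer to np.linalg.solve(A, b) than `last`.
```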
- Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence [4.688616907736838]
We show that convergence of a large family of constant stepsize RSAs can be understood using this framework.
arXiv Detail & Related papers (2020-03-25T13:45:16Z)
- A Unified Convergence Analysis for Shuffling-Type Gradient Methods [32.8097849940763]
We propose a unified convergence analysis for generic shuffling-type gradient methods for solving finite-sum problems.
Our results suggest appropriate choices of learning rates for certain shuffling variants when training neural networks (a random-reshuffling sketch follows this entry).
arXiv Detail & Related papers (2020-02-19T15:45:41Z)
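The shuffling-type methods above can be illustrated by Random Reshuffling SGD, where each epoch visits every component gradient exactly once in a freshly sampled random order. The sketch below is a minimal generic version with a constant learning rate, not the paper's exact schemes or step-size schedules.

```python
import numpy as np

def reshuffling_sgd(grad_i, n_data, x0, lr, n_epochs, rng):
    """Shuffling-type SGD: each epoch processes the component gradients in a new random order."""
    x = x0.copy()
    for _ in range(n_epochs):
        for i in rng.permutation(n_data):
            x = x - lr * grad_i(x, i)
        # Learning-rate schedules (e.g. diminishing per epoch) are among the choices
        # analyzed by this line of work; a constant rate is kept here for simplicity.
    return x

# Toy finite-sum problem: f(x) = (1/n) * sum_i 0.5 * ||x - a_i||^2, minimized at mean(a_i).
rng = np.random.default_rng(0)
anchors = rng.normal(size=(100, 3))
x_hat = reshuffling_sgd(lambda x, i: x - anchors[i], n_data=100,
                        x0=np.zeros(3), lr=0.05, n_epochs=30, rng=rng)
```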
- Empirical and Instance-Dependent Estimation of Markov Chain and Mixing Time [5.167069404528051]
We address the problem of estimating the mixing time of a Markov chain from a single trajectory of observations.
Unlike most previous works, which employed Hilbert space methods to estimate spectral gaps, we opt for an approach based on contraction with respect to total variation (a contraction-coefficient sketch follows this entry).
This quantity, unlike the spectral gap, controls the mixing time up to strong universal constants and remains applicable to non-reversible chains.
arXiv Detail & Related papers (2019-12-14T13:38:02Z)
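The total-variation contraction quantity referred to in the last entry can be illustrated by the Dobrushin contraction coefficient of an empirically estimated transition matrix: the largest TV distance between any two rows of P. The sketch below computes a plug-in version from a single trajectory; it illustrates the idea only and is not the paper's estimator or its confidence intervals.

```python
import numpy as np

def empirical_transition_matrix(trajectory, n_states):
    """Plug-in estimate of the transition matrix from a single trajectory."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(trajectory[:-1], trajectory[1:]):
        counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Unvisited states fall back to a uniform row.
    return np.divide(counts, row_sums, out=np.full_like(counts, 1.0 / n_states),
                     where=row_sums > 0)

def tv_contraction_coefficient(P):
    """Dobrushin contraction coefficient: max TV distance between rows of P.

    It bounds the per-step TV contraction rate, so roughly
    t_mix(eps) <= log(1/eps) / log(1/kappa) whenever kappa < 1.
    """
    n = P.shape[0]
    return max(0.5 * np.abs(P[i] - P[j]).sum() for i in range(n) for j in range(n))

# Toy usage on a two-state chain observed along one trajectory.
rng = np.random.default_rng(0)
P_true = np.array([[0.9, 0.1], [0.2, 0.8]])
traj = [0]
for _ in range(5000):
    traj.append(rng.choice(2, p=P_true[traj[-1]]))
P_hat = empirical_transition_matrix(np.array(traj), n_states=2)
kappa_hat = tv_contraction_coefficient(P_hat)
```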