Gradient Descent Algorithm in Hilbert Spaces under Stationary Markov Chains with $φ$- and $β$-Mixing
- URL: http://arxiv.org/abs/2502.03551v2
- Date: Tue, 01 Jul 2025 13:41:50 GMT
- Title: Gradient Descent Algorithm in Hilbert Spaces under Stationary Markov Chains with $φ$- and $β$-Mixing
- Authors: Priyanka Roy, Susanne Saminger-Platz
- Abstract summary: In this paper, we study a strictly stationary Markov chain gradient descent algorithm operating in general Hilbert spaces. Our analysis focuses on the mixing coefficients of the underlying process, specifically the $\phi$- and $\beta$-mixing coefficients.
- Score: 1.1510009152620668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study a strictly stationary Markov chain gradient descent algorithm operating in general Hilbert spaces. Our analysis focuses on the mixing coefficients of the underlying process, specifically the $\phi$- and $\beta$-mixing coefficients. Under these assumptions, we derive probabilistic upper bounds on the convergence behavior of the algorithm based on the exponential as well as the polynomial decay of the mixing coefficients.
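As a toy illustration of the setting (not the paper's construction), the sketch below runs gradient descent on a quadratic loss where the data stream comes from a strictly stationary AR(1) Markov chain in $\mathbb{R}^d$, a finite-dimensional stand-in for a general Hilbert space; the chain, step sizes, and constants are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the Hilbert-space setting: H = R^d with the Euclidean
# inner product. The data stream (x_t, y_t) comes from a strictly stationary
# AR(1) Markov chain, so consecutive samples are dependent, not i.i.d.
d = 5
w_true = rng.normal(size=d)

rho = 0.8                      # controls mixing: larger rho = slower mixing
x = rng.normal(size=d)         # start the chain at its stationary law N(0, I)
w = np.zeros(d)

for t in range(1, 20001):
    # One Markov transition: keeps N(0, I) as the stationary distribution.
    x = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=d)
    y = x @ w_true + 0.1 * rng.normal()
    grad = (w @ x - y) * x     # gradient of the quadratic loss (<w, x> - y)^2 / 2
    w -= grad / (t + 10)       # decaying step size

print("estimation error:", np.linalg.norm(w - w_true))
```

Slower mixing (larger `rho`) makes consecutive gradients more correlated, which is exactly what the $\phi$- and $\beta$-mixing assumptions control in the convergence bounds.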
Related papers
- Online Regularized Learning Algorithms in RKHS with $β$- and $φ$-Mixing Sequences [1.1510009152620668]
We study an online regularized learning algorithm in a reproducing kernel Hilbert space (RKHS). We analyze a strictly stationary Markov chain, where the dependence structure is characterized by the $\phi$- and $\beta$-mixing coefficients.
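A minimal sketch of such an online regularized kernel update, with a Gaussian kernel and an AR(1) data chain standing in for the paper's general setting (kernel width, regularization, and step sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def k(u, v, s=0.5):
    """Gaussian kernel on R."""
    return np.exp(-(u - v) ** 2 / (2 * s**2))

# Online regularized least squares in an RKHS:
#   f_{t+1} = f_t - eta_t * ((f_t(x_t) - y_t) * K(x_t, .) + lam * f_t),
# with f_t stored through its kernel-expansion coefficients.
lam = 0.01
pts, coef = [], []
rho, x = 0.7, 0.0              # data stream from a stationary AR(1) chain

for t in range(1, 501):
    x = rho * x + np.sqrt(1 - rho**2) * rng.normal()
    y = np.sin(2 * x) + 0.1 * rng.normal()
    f_x = sum(c * k(p, x) for p, c in zip(pts, coef))
    eta = 0.5 / t**0.6
    coef = [c * (1 - eta * lam) for c in coef]  # the -eta * lam * f_t shrinkage
    pts.append(x)                               # new kernel atom at x_t
    coef.append(-eta * (f_x - y))

f_test = sum(c * k(p, 0.5) for p, c in zip(pts, coef))
print("f(0.5) =", f_test, " target =", np.sin(1.0))
```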
arXiv Detail & Related papers (2025-07-08T12:25:04Z)
- Statistical inference for Linear Stochastic Approximation with Markovian Noise [16.136756322711545]
We derive non-asymptotic Berry-Esseen bounds for averaged iterates of the Linear Stochastic Approximation (LSA) algorithm driven by Markovian noise. Our work provides the first non-asymptotic guarantees on the rate of convergence of bootstrap-based confidence intervals for stochastic approximation with Markov noise.
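For intuition, here is a toy LSA recursion with Polyak-Ruppert averaging driven by a two-state Markov chain; the matrices, chain, and step sizes are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)

# LSA: theta_{t+1} = theta_t + eta_t * (b(Z_t) - A(Z_t) @ theta_t), where the
# noise Z_t is a two-state Markov chain, plus Polyak-Ruppert averaging.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])                        # transition matrix of Z_t
A = [1.0 * np.eye(2), 2.0 * np.eye(2)]            # A(z) per state
b = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]

# The stationary law of P is (2/3, 1/3), so the target solves
# E[A] theta = E[b], i.e. (4/3) theta = (2/3, 2/3): theta* = (0.5, 0.5).
theta, avg, z = np.zeros(2), np.zeros(2), 0
for t in range(1, 50001):
    z = rng.choice(2, p=P[z])                     # one Markov transition
    theta = theta + (0.5 / t**0.7) * (b[z] - A[z] @ theta)
    avg += (theta - avg) / t                      # running Polyak-Ruppert average

print("averaged iterate:", avg, " target: [0.5 0.5]")
```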
arXiv Detail & Related papers (2025-05-25T11:43:28Z)
- Benign Overfitting and the Geometry of the Ridge Regression Solution in Binary Classification [75.01389991485098]
We show that ridge regression has qualitatively different behavior depending on the scale of the cluster mean vector.
In regimes where the scale is very large, the conditions that allow for benign overfitting turn out to be the same as those for the regression task.
arXiv Detail & Related papers (2025-03-11T01:45:42Z)
- Uncertainty quantification for Markov chains with application to temporal difference learning [63.49764856675643]
We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains.
We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
arXiv Detail & Related papers (2025-02-19T15:33:55Z)
- Covariance estimation using Markov chain Monte Carlo [2.209921757303168]
We show that when $\pi$ satisfies a Poincaré inequality and the chain possesses a spectral gap, we can achieve similar sample complexity using MCMC.
We provide guarantees regarding isotropic rounding procedures for sampling uniformly on convex bodies.
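A minimal sketch of the idea, assuming a Gaussian target and a random-walk Metropolis chain (both hypothetical choices): the covariance is estimated from correlated MCMC samples rather than i.i.d. draws.

```python
import numpy as np

rng = np.random.default_rng(3)

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])                 # covariance to be recovered
Sigma_inv = np.linalg.inv(Sigma)

def log_pi(x):
    return -0.5 * x @ Sigma_inv @ x            # Gaussian target, up to a constant

x, lp = np.zeros(2), 0.0
chain = []
for t in range(100000):
    prop = x + 0.8 * rng.normal(size=2)        # random-walk proposal
    lp_prop = log_pi(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
        x, lp = prop, lp_prop
    chain.append(x)

chain = np.array(chain[5000:])                 # drop burn-in
print("MCMC covariance estimate:\n", np.cov(chain.T))
print("truth:\n", Sigma)
```

The spectral gap of the chain determines how many correlated samples are worth one independent draw, which is where the MCMC sample complexity enters.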
arXiv Detail & Related papers (2024-10-22T16:27:29Z)
- Fast Convergence of $Φ$-Divergence Along the Unadjusted Langevin Algorithm and Proximal Sampler [14.34147140416535]
We study the mixing time of two popular discrete-time Markov chains in continuous space. We show that any $\Phi$-divergence arising from a twice-differentiable strictly convex function $\Phi$ converges to $0$ exponentially fast along these Markov chains.
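One of the two chains studied is the unadjusted Langevin algorithm (ULA); a minimal sketch with a standard Gaussian target (the potential and step size below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Unadjusted Langevin algorithm (ULA) targeting pi ∝ exp(-V):
#   x_{k+1} = x_k - h * grad V(x_k) + sqrt(2h) * xi_k,   xi_k ~ N(0, I)
def grad_V(x):
    return x                        # V(x) = |x|^2 / 2, so pi = N(0, I)

h = 0.05
x = np.full(2, 5.0)                 # start far from the target
samples = []
for k in range(20000):
    x = x - h * grad_V(x) + np.sqrt(2 * h) * rng.normal(size=2)
    if k >= 2000:                   # discard burn-in
        samples.append(x)

samples = np.array(samples)
print("mean ≈", samples.mean(axis=0), " var ≈", samples.var(axis=0))
```

Because ULA skips the Metropolis correction, it mixes fast but carries an $O(h)$ bias in its stationary law, visible here as a variance slightly above 1.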
arXiv Detail & Related papers (2024-10-14T16:41:45Z)
- Upper Bounds for Learning in Reproducing Kernel Hilbert Spaces for Non IID Samples [1.1510009152620668]
We study a Markov chain-based gradient algorithm in general Hilbert spaces, aiming to approximate the optimal solution of a quadratic loss function. We extend these results to an online regularized learning algorithm in reproducing kernel Hilbert spaces.
arXiv Detail & Related papers (2024-10-10T20:34:22Z)
- Logistic-beta processes for dependent random probabilities with beta marginals [58.91121576998588]
We propose a novel process called the logistic-beta process, whose logistic transformation yields a process with common beta marginals.
It can model dependence on both discrete and continuous domains, such as space or time, and has a flexible dependence structure through correlation kernels.
We illustrate the benefits through nonparametric binary regression and conditional density estimation examples, both in simulation studies and in a pregnancy outcome application.
arXiv Detail & Related papers (2024-02-10T21:41:32Z)
- Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting [64.0722630873758]
We consider a rather general and broad class of Markov chains, Ito chains, that look like the Euler-Maruyama discretization of some stochastic differential equation.
We prove a bound in $W_2$-distance between the laws of our Ito chain and the corresponding stochastic differential equation.
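A minimal sketch of an Euler-Maruyama chain tracking an Ornstein-Uhlenbeck SDE, a toy instance rather than the paper's general Ito chain:

```python
import numpy as np

rng = np.random.default_rng(5)

# Euler-Maruyama chain for the Ornstein-Uhlenbeck SDE
#   dX_t = -X_t dt + sqrt(2) dW_t,
# run as many independent chains in parallel so the law at time T is visible.
h, T = 0.01, 10.0
x = np.full(10000, 3.0)                        # all chains start at x0 = 3
for _ in range(int(T / h)):
    x = x - h * x + np.sqrt(2 * h) * rng.normal(size=x.size)

# The exact OU law at time T is N(3 e^{-T}, 1 - e^{-2T}) ≈ N(0, 1); the gap
# between it and the chain's empirical law is what W_2 bounds quantify.
print("mean ≈ %.3f  var ≈ %.3f" % (x.mean(), x.var()))
```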
arXiv Detail & Related papers (2023-10-09T18:38:56Z)
- First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities [91.46841922915418]
We present a unified approach for the theoretical analysis of first-order gradient methods for stochastic optimization and variational inequalities.
Our approach covers scenarios for both non-convex and strongly convex minimization problems.
We provide lower bounds matching the oracle complexity in the case of strongly convex optimization problems.
arXiv Detail & Related papers (2023-05-25T11:11:31Z)
- Non-asymptotic convergence bounds for Sinkhorn iterates and their gradients: a coupling approach [10.568851068989972]
We focus on a relaxation of the original OT problem, the entropic OT problem, which allows one to implement efficient and practical algorithmic solutions.
This formulation, also known as the Schrödinger Bridge problem, notably connects with Stochastic Optimal Control (SOC) and can be solved with the popular Sinkhorn algorithm.
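A minimal sketch of Sinkhorn's alternating scaling updates for entropic OT between two discrete measures (sizes, cost, and regularization below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)

# Entropic OT between two empirical measures on R via Sinkhorn iterations.
n = 50
a = np.full(n, 1.0 / n)                        # source weights
b = np.full(n, 1.0 / n)                        # target weights
X = rng.normal(size=n)
Y = rng.normal(size=n) + 2.0
C = (X[:, None] - Y[None, :]) ** 2             # squared-distance cost
eps = 0.5                                      # entropic regularization
K = np.exp(-C / eps)                           # Gibbs kernel

u = np.ones(n)
for _ in range(500):                           # alternating scaling updates
    v = b / (K.T @ u)
    u = a / (K @ v)

P = u[:, None] * K * v[None, :]                # entropic transport plan
print("entropic OT cost:", float(np.sum(P * C)))
```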
arXiv Detail & Related papers (2023-04-13T13:58:25Z)
- Rosenthal-type inequalities for linear statistics of Markov chains [20.606986885851573]
We establish novel deviation bounds for additive functionals of geometrically ergodic Markov chains.
We pay special attention to the dependence of our bounds on the mixing time of the corresponding chain.
arXiv Detail & Related papers (2023-03-10T10:24:46Z)
- Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an \emph{accelerated gradient-extragradient} (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov acceleration.
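The extragradient step at the heart of such methods can be sketched on a toy bilinearly coupled problem; this is plain extragradient, not the AG-EG scheme itself, and $F$, $G$, and $B$ are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(7)

# Plain extragradient on min_x max_y F(x) + x^T B y - G(y), with the toy
# choices F(x) = |x|^2/2 and G(y) = |y|^2/2; the unique saddle point is (0, 0).
B = rng.normal(size=(3, 3))

def g_x(x, y):
    return x + B @ y                           # gradient in x

def g_y(x, y):
    return B.T @ x - y                         # gradient in y (ascent direction)

x, y, eta = np.ones(3), np.ones(3), 0.05
for _ in range(5000):
    xh, yh = x - eta * g_x(x, y), y + eta * g_y(x, y)    # extrapolation step
    x, y = x - eta * g_x(xh, yh), y + eta * g_y(xh, yh)  # update at midpoint

print("||x|| =", np.linalg.norm(x), " ||y|| =", np.linalg.norm(y))
```

The extrapolation step is what lets extragradient handle the rotational dynamics of the bilinear coupling, where plain gradient descent-ascent can cycle.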
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the \emph{Stochastic Scaled-Gradient Descent} (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
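A tabular TD(0) sketch for a small Markov reward process; the paper's kernel-based estimators generalize this special case, and the transition matrix and rewards below are made up.

```python
import numpy as np

rng = np.random.default_rng(8)

# Tabular TD(0) for a 3-state Markov reward process (MRP).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])                # transition matrix
r = np.array([1.0, 0.0, -1.0])                 # expected reward per state
gamma = 0.9

V, s = np.zeros(3), 0
for t in range(1, 100001):
    s_next = rng.choice(3, p=P[s])
    td_error = r[s] + gamma * V[s_next] - V[s]
    V[s] += td_error / t**0.6                  # decaying step size
    s = s_next

V_true = np.linalg.solve(np.eye(3) - gamma * P, r)   # V = (I - gamma P)^{-1} r
print("TD(0):", V)
print("exact:", V_true)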
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Deterministic Gibbs Sampling via Ordinary Differential Equations [77.42706423573573]
This paper presents a general construction of deterministic measure-preserving dynamics using autonomous ODEs and tools from differential geometry.
We show how Hybrid Monte Carlo and other deterministic samplers follow as special cases of our theory.
arXiv Detail & Related papers (2021-06-18T15:36:09Z)
- Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about $K$ distinct points, representing communities.
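A minimal sketch of the random walk Laplacian embedding on a planted two-block graph (graph size and edge probabilities are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)

# Spectral embedding with the random-walk Laplacian L_rw = I - D^{-1} A
# on a planted two-block graph.
n = 100
z = np.repeat([0, 1], n // 2)                          # planted communities
prob = np.where(z[:, None] == z[None, :], 0.30, 0.05)  # in/out edge probabilities
A = (rng.uniform(size=(n, n)) < prob).astype(float)
A = np.triu(A, 1)
A = A + A.T                                            # symmetric, no self-loops

deg = A.sum(axis=1)
L_rw = np.eye(n) - A / deg[:, None]                    # row-normalize by degree
vals, vecs = np.linalg.eig(L_rw)
order = np.argsort(vals.real)
emb = vecs[:, order[:2]].real                          # two smallest eigenvalues

# Nodes from the same community land near the same point of the embedding.
print("block 0 mean:", emb[z == 0].mean(axis=0))
print("block 1 mean:", emb[z == 1].mean(axis=0))
```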
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
- Sampling in Combinatorial Spaces with SurVAE Flow Augmented MCMC [83.48593305367523]
Hybrid Monte Carlo is a powerful Markov Chain Monte Carlo method for sampling from complex continuous distributions.
We introduce a new approach based on augmenting Monte Carlo methods with SurVAE Flows to sample from discrete distributions.
We demonstrate the efficacy of our algorithm on a range of examples from statistics, computational physics and machine learning, and observe improvements compared to alternative algorithms.
arXiv Detail & Related papers (2021-02-04T02:21:08Z)
- On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration [115.1954841020189]
We study the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
- Kernel Autocovariance Operators of Stationary Processes: Estimation and Convergence [0.5505634045241288]
We consider autocovariance operators of a stationary process on a Polish space embedded into a reproducing kernel Hilbert space.
We investigate how empirical estimates of these operators converge along realizations of the process under various conditions.
We provide applications of our theory in terms of consistency results for kernel PCA with dependent data and the conditional mean embedding of transition probabilities.
arXiv Detail & Related papers (2020-04-02T09:17:32Z)
- Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence [4.688616907736838]
We show that the convergence of a large family of constant-stepsize recursive stochastic algorithms (RSAs) can be understood using this framework.
arXiv Detail & Related papers (2020-03-25T13:45:16Z)
- Distributed Learning with Dependent Samples [17.075804626858748]
We derive the optimal learning rate for distributed kernel ridge regression for strongly mixing sequences.
Our results extend the applicable range of distributed learning from i.i.d. samples to non-i.i.d. sequences.
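A divide-and-conquer sketch of distributed kernel ridge regression (the block split, kernel, and regularization below are illustrative, and the sampling scheme ignores the paper's mixing assumptions):

```python
import numpy as np

rng = np.random.default_rng(10)

def gram(U, V, s=0.3):
    """Gaussian-kernel Gram matrix between two 1-D sample sets."""
    return np.exp(-(U[:, None] - V[None, :]) ** 2 / (2 * s**2))

# Divide-and-conquer kernel ridge regression: each of m machines fits KRR on
# its local block; the global estimator averages the m local predictors.
n, m, lam = 600, 4, 1e-3
X = rng.uniform(-1, 1, size=n)
y = np.sin(3 * X) + 0.1 * rng.normal(size=n)

x_test = np.linspace(-1, 1, 5)
pred = np.zeros_like(x_test)
for Xb, yb in zip(np.array_split(X, m), np.array_split(y, m)):
    nb = len(Xb)
    alpha = np.linalg.solve(gram(Xb, Xb) + nb * lam * np.eye(nb), yb)
    pred += gram(x_test, Xb) @ alpha / m       # average of local predictors

print("prediction:", np.round(pred, 3))
print("truth:     ", np.round(np.sin(3 * x_test), 3))
```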
arXiv Detail & Related papers (2020-02-10T14:03:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.