A variational approach to dimension-free self-normalized concentration
- URL: http://arxiv.org/abs/2508.06483v1
- Date: Fri, 08 Aug 2025 17:44:09 GMT
- Title: A variational approach to dimension-free self-normalized concentration
- Authors: Ben Chugg, Aaditya Ramdas
- Abstract summary: We focus on bounds for sub-$\psi$ processes, a tail condition that encompasses a wide variety of well-known distributions. Our results fill a gap in the literature between determinant-based bounds and those based on condition numbers.
- Score: 31.20747148247217
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the self-normalized concentration of vector-valued stochastic processes. We focus on bounds for sub-$\psi$ processes, a tail condition that encompasses a wide variety of well-known distributions (including sub-exponential, sub-Gaussian, sub-gamma, and sub-Poisson distributions). Our results recover and generalize the influential bound of Abbasi-Yadkori et al. (2011) and fill a gap in the literature between determinant-based bounds and those based on condition numbers. As applications we prove a Bernstein inequality for random vectors satisfying a moment condition (which is more general than boundedness), and also provide the first dimension-free, self-normalized empirical Bernstein inequality. Our techniques are based on the variational (PAC-Bayes) approach to concentration.
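For context, the determinant-based bound of Abbasi-Yadkori et al. (2011) that this abstract recovers and generalizes can be sketched as follows (notation is illustrative, not taken from the paper: $x_s$ are covariates, $\epsilon_s$ is conditionally $\sigma$-sub-Gaussian noise, and $\lambda > 0$ is a regularizer):

```latex
% Setting: V_t = \lambda I + \sum_{s=1}^{t} x_s x_s^\top, \qquad S_t = \sum_{s=1}^{t} \epsilon_s x_s.
% With probability at least 1 - \delta, simultaneously for all t \ge 1:
\|S_t\|_{V_t^{-1}}^2 \;\le\; 2\sigma^2 \log\!\left( \frac{\det(V_t)^{1/2}\,\det(\lambda I)^{-1/2}}{\delta} \right)
```

The determinant term is where dimension dependence can enter; the sub-$\psi$ framework described in the abstract targets bounds of this shape beyond the sub-Gaussian case.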
Related papers
- Convergence Rates for Distribution Matching with Sliced Optimal Transport [10.117027375572627]
We study the slice-matching scheme, an efficient iterative method for distribution matching based on sliced optimal transport. A key challenge is to control the constants in these inequalities along the trajectory. We show that this becomes tractable for Gaussian distributions.
arXiv Detail & Related papers (2026-02-11T09:47:52Z)
- Universality of General Spiked Tensor Models [9.454986540713655]
We study the rank-one spiked tensor model in the high-dimensional regime. We show that its high-dimensional spectral behavior and statistical limits are robust to non-Gaussian noise.
arXiv Detail & Related papers (2026-02-04T11:59:30Z)
- Finite-Sample Wasserstein Error Bounds and Concentration Inequalities for Nonlinear Stochastic Approximation [6.800624963330628]
We derive non-asymptotic error bounds for nonlinear stochastic approximation algorithms in the Wasserstein-$p$ distance. We show that the normalized last iterates converge to a Gaussian distribution in the $p$-Wasserstein distance at a rate of order $\gamma_n^{1/6}$, where $\gamma_n$ is the step size. These distributional guarantees imply high-probability concentration inequalities that improve upon those derived from moment bounds and Markov's inequality.
arXiv Detail & Related papers (2026-02-02T18:41:06Z)
- Vector-valued self-normalized concentration inequalities beyond sub-Gaussianity [35.13282725119597]
We provide concentration bounds for self-normalized processes with light tails beyond sub-Gaussianity (such as Bennett or Bernstein bounds). We illustrate the relevance of our results in the context of online linear regression, with applications in (kernelized) linear bandits.
arXiv Detail & Related papers (2025-11-05T16:27:02Z)
- Limit Theorems for Stochastic Gradient Descent with Infinite Variance [51.4853131023238]
We show that the limit of the stochastic gradient descent iterates can be characterized as the stationary distribution of a suitably defined Ornstein-Uhlenbeck process driven by an appropriate Lévy process. We also explore the applications of these results in linear regression and logistic regression models.
arXiv Detail & Related papers (2024-10-21T09:39:10Z)
- Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting [64.0722630873758]
We consider a rather general and broad class of Markov chains, Ito chains, that look like the Euler-Maruyama discretization of some stochastic differential equation.
We prove a bound in $W_2$-distance between the laws of our Ito chain and the corresponding differential equation.
arXiv Detail & Related papers (2023-10-09T18:38:56Z)
- Unified framework for continuity of sandwiched Rényi divergences [0.0]
We prove continuity bounds for entropic quantities related to sandwiched Rényi divergences. In a separate contribution, we use the ALAFF method to study the stability of approximate quantum Markov chains.
arXiv Detail & Related papers (2023-08-23T21:09:54Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
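The constant-rate scheme builds on standard AIS. As a point of reference only, here is a minimal sketch of plain AIS along a geometric annealing path (not the paper's constant-rate schedule; all function names and parameters are illustrative):

```python
import numpy as np

def annealed_importance_sampling(log_f0, log_f1, sample_f0, mh_step,
                                 n_particles=2000, n_temps=50, rng=None):
    """Minimal AIS along the geometric path f_b(x) = f0(x)^(1-b) * f1(x)^b.

    Returns importance weights whose mean is an unbiased estimate of Z1/Z0.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    x = sample_f0(n_particles, rng)
    log_w = np.zeros(n_particles)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Weight update: ratio f_b(x) / f_{b_prev}(x) at the current state.
        log_w += (b - b_prev) * (log_f1(x) - log_f0(x))
        # One Metropolis-Hastings move leaving f_b invariant.
        x = mh_step(x, lambda y: (1 - b) * log_f0(y) + b * log_f1(y), rng)
    return np.exp(log_w)

def gaussian_rw_mh(x, log_target, rng, step=0.5):
    """Random-walk Metropolis step, applied independently to each particle."""
    prop = x + step * rng.standard_normal(x.shape)
    accept = np.log(rng.random(x.shape)) < log_target(prop) - log_target(x)
    return np.where(accept, prop, x)

# f0 = unnormalized N(0,1), f1 = unnormalized N(2,1); both have Z = sqrt(2*pi),
# so the true ratio Z1/Z0 is exactly 1.
log_f0 = lambda x: -0.5 * x**2
log_f1 = lambda x: -0.5 * (x - 2.0)**2
w = annealed_importance_sampling(
    log_f0, log_f1,
    sample_f0=lambda n, rng: rng.standard_normal(n),
    mh_step=gaussian_rw_mh)
print(w.mean())  # unbiased estimate of Z1/Z0 = 1
```

The expectation of the returned weights equals the normalizing-constant ratio regardless of how well the MH kernel mixes; mixing only affects the variance, which is the quantity a constant-rate schedule tries to control.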
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- Convergence and concentration properties of constant step-size SGD through Markov chains [0.0]
We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD).
We show that, for unbiased gradient estimates with mildly controlled variance, the iteration converges to an invariant distribution in total variation distance.
All our results are non-asymptotic and their consequences are discussed through a few applications.
arXiv Detail & Related papers (2023-06-20T12:36:28Z)
- Stochastic Saddle Point Problems with Decision-Dependent Distributions [0.6091702876917279]
This paper focuses on stochastic saddle point problems with decision-dependent distributions in both the static and time-varying settings.
We introduce the notion of equilibrium points, which are saddle points of the stationary minimax problem.
We show that primal-dual algorithms converge to saddle points in a similar fashion.
arXiv Detail & Related papers (2022-01-07T03:36:41Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
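The embedding described here can be sketched in a few lines (illustrative code, not the paper's implementation): the eigenvectors of the random-walk transition matrix $D^{-1}A$ are recovered from its symmetric conjugate $D^{-1/2} A D^{-1/2}$ for numerical stability.

```python
import numpy as np

def random_walk_embedding(A, K):
    """Spectral embedding from the random-walk operator D^{-1} A.

    If u is an eigenvector of the symmetric matrix D^{-1/2} A D^{-1/2},
    then D^{-1/2} u is an eigenvector of D^{-1} A with the same eigenvalue.
    """
    d = A.sum(axis=1)                        # node degrees (assumed positive)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    M = D_inv_sqrt @ A @ D_inv_sqrt          # symmetric normalized adjacency
    vals, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    top = vecs[:, -K:]                       # K largest eigenvalues of M
    return D_inv_sqrt @ top                  # rows are node embeddings

# Toy graph: two dense communities joined by a single bridge edge.
n = 10
A = np.zeros((n, n))
A[:5, :5] = 1.0                              # community 1 (self-loops, harmless here)
A[5:, 5:] = 1.0                              # community 2
A[4, 5] = A[5, 4] = 1.0                      # bridge
X = random_walk_embedding(A, K=2)
# Nodes within a community land near each other; the two communities separate.
```

The degree correction is visible in the final line of the function: each row is rescaled by $d_i^{-1/2}$, so high- and low-degree nodes in the same community map to nearby points.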
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
- Generic Error Bounds for the Generalized Lasso with Sub-Exponential Data [4.56877715768796]
This work performs a non-asymptotic analysis of the generalized Lasso under the assumption of sub-exponential data.
We show that the estimation error can be controlled by means of two complexity parameters that arise naturally from a generic-chaining-based proof strategy.
arXiv Detail & Related papers (2020-04-11T10:39:48Z)
- On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration [115.1954841020189]
We study the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.