Tight Non-asymptotic Inference via Sub-Gaussian Intrinsic Moment Norm
- URL: http://arxiv.org/abs/2303.07287v2
- Date: Sat, 20 Jan 2024 03:20:20 GMT
- Title: Tight Non-asymptotic Inference via Sub-Gaussian Intrinsic Moment Norm
- Authors: Huiming Zhang, Haoyu Wei, Guang Cheng
- Abstract summary: In non-asymptotic learning, variance-type parameters of sub-Gaussian distributions are of paramount importance.
We suggest using the sub-Gaussian intrinsic moment norm achieved by maximizing a sequence of normalized moments.
- Score: 12.400847321321912
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In non-asymptotic learning, variance-type parameters of sub-Gaussian
distributions are of paramount importance. However, directly estimating these
parameters using the empirical moment generating function (MGF) is infeasible.
To address this, we suggest using the sub-Gaussian intrinsic moment norm
[Buldygin and Kozachenko (2000), Theorem 1.3] achieved by maximizing a sequence
of normalized moments. Significantly, the suggested norm can not only
reconstruct the exponential moment bounds of MGFs but also provide tighter
sub-Gaussian concentration inequalities. In practice, we provide an intuitive
method for assessing whether data with a finite sample size is sub-Gaussian,
utilizing the sub-Gaussian plot. The intrinsic moment norm can be robustly
estimated via a simple plug-in approach. Our theoretical findings are also
applicable to reinforcement learning, including the multi-armed bandit
scenario.
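For illustration, here is a minimal sketch of the plug-in estimation step described above. It assumes the intrinsic moment norm takes the form $\|X\|_M := \sup_{k \ge 1} [\mathbb{E} X^{2k} / (2k-1)!!]^{1/(2k)}$, i.e. the supremum of the normalized-moment sequence referenced via Buldygin and Kozachenko's Theorem 1.3; the truncation level `k_max` and the NumPy-based estimator are illustrative choices, not the authors' exact procedure.

```python
import numpy as np

def intrinsic_moment_norm(x, k_max=8):
    """Plug-in estimate of the sub-Gaussian intrinsic moment norm.

    Computes max over k = 1..k_max of
        (empirical 2k-th moment / (2k-1)!!)^(1/(2k)),
    a truncated version of sup_k [E X^{2k} / (2k-1)!!]^{1/(2k)}.
    The cutoff k_max is an illustrative assumption.
    """
    x = np.asarray(x, dtype=float)
    vals = []
    double_factorial = 1.0  # running value of (2k-1)!!
    for k in range(1, k_max + 1):
        double_factorial *= 2 * k - 1
        m2k = np.mean(x ** (2 * k))  # empirical 2k-th moment
        vals.append((m2k / double_factorial) ** (1.0 / (2 * k)))
    return max(vals)

# Sanity check: for X ~ N(0, sigma^2), E X^{2k} = sigma^{2k} (2k-1)!!,
# so every normalized moment equals sigma and the estimate should be ~sigma.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=2.0, size=100_000)
print(intrinsic_moment_norm(sample))  # approximately 2.0
```

High-order empirical moments are increasingly noisy at finite sample sizes, so a small truncation level is generally preferable in practice; the abstract notes the norm can be estimated robustly, which this plain sketch does not attempt.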
Related papers
- Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution [6.144680854063938]
We consider a variant of stochastic gradient descent (SGD) with a random learning rate to reveal its convergence properties.
We demonstrate that the distribution of a parameter updated by Poisson SGD converges to a stationary distribution under weak assumptions.
arXiv Detail & Related papers (2024-06-23T06:52:33Z) - Taming Nonconvex Stochastic Mirror Descent with General Bregman
Divergence [25.717501580080846]
This paper revisits the convergence of gradient Forward Mirror (SMD) in the contemporary non optimization setting.
For the training, we develop provably convergent algorithms for the problem of linear networks.
arXiv Detail & Related papers (2024-02-27T17:56:49Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Robust Estimation for Nonparametric Families via Generative Adversarial
Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these to robust mean estimation, second-moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to
Improve Generalization [89.7882166459412]
gradient noise (SGN) acts as implicit regularization for deep learning.
Some works attempted to artificially simulate SGN by injecting random noise to improve deep learning.
For simulating SGN at low computational costs and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach.
arXiv Detail & Related papers (2021-03-31T16:08:06Z) - Variational Laplace for Bayesian neural networks [25.055754094939527]
Variational Laplace exploits a local approximation of the likelihood to estimate the ELBO without the need for sampling the neural-network weights.
We show that early-stopping can be avoided by increasing the learning rate for the variance parameters.
arXiv Detail & Related papers (2021-02-27T14:06:29Z) - Marginalised Gaussian Processes with Nested Sampling [10.495114898741203]
Gaussian process (GP) models define a rich distribution over functions with inductive biases controlled by a kernel function.
This work presents an alternative learning procedure where the hyperparameters of the kernel function are marginalised using Nested Sampling (NS).
arXiv Detail & Related papers (2020-10-30T16:04:35Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z) - Sub-Gaussian Matrices on Sets: Optimal Tail Dependence and Applications [6.034622792544481]
We study when a sub-Gaussian matrix can become a near isometry on a set.
We show that the previously best known dependence on the sub-Gaussian norm was suboptimal.
We also develop a new Bernstein-type inequality for sub-exponential random variables, and a new Hanson-Wright inequality for quadratic forms of sub-Gaussian random variables.
arXiv Detail & Related papers (2020-01-28T23:06:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.