Sharp Concentration Results for Heavy-Tailed Distributions
- URL: http://arxiv.org/abs/2003.13819v3
- Date: Mon, 25 Jul 2022 19:44:27 GMT
- Title: Sharp Concentration Results for Heavy-Tailed Distributions
- Authors: Milad Bakhshizadeh, Arian Maleki, Victor H. de la Pena
- Abstract summary: We obtain concentration and large deviation results for sums of independent and identically distributed random variables with heavy-tailed distributions.
Our main theorem not only recovers some of the existing results, such as the concentration of sums of sub-Weibull random variables, but also produces new results for sums of random variables with heavier tails.
- Score: 17.510560590853576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We obtain concentration and large deviation results for the sums of
independent and identically distributed random variables with heavy-tailed
distributions. Our concentration results concern random variables whose
distributions satisfy $\mathbb{P}(X>t) \leq {\rm e}^{-I(t)}$, where $I: \mathbb{R}
\rightarrow \mathbb{R}$ is an increasing function and $I(t)/t \rightarrow
\alpha \in [0, \infty)$ as $t \rightarrow \infty$. Our main theorem not only
recovers some of the existing results, such as the concentration of the sum
of sub-Weibull random variables, but also produces new results for the sums
of random variables with heavier tails. We show that the concentration
inequalities we obtain are sharp enough to offer large deviation results for
the sums of independent random variables as well. Our analyses, which are based
on standard truncation arguments, simplify, unify, and generalize the existing
results on the concentration and large deviation of heavy-tailed random
variables.
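As a quick numerical illustration of the regime $\alpha = 0$ (a sketch of ours, not the paper's code), assume Weibull-type tails $\mathbb{P}(X>t) = {\rm e}^{-t^\beta}$ with $\beta < 1$, so $I(t) = t^\beta$ and $I(t)/t \rightarrow 0$. For such subexponential tails, a large deviation of the sum is typically driven by a single large summand, so $\mathbb{P}(S_n - n\mu > t) \approx n\,\mathbb{P}(X > t)$ for large $t$:

```python
import numpy as np

# Sketch (not from the paper): Weibull tails P(X > t) = exp(-t^beta) with
# beta < 1 give I(t) = t^beta and alpha = 0. For such heavy tails the sum
# deviates mainly via one big jump: P(S_n - n*mu > t) ~ n * P(X > t).
rng = np.random.default_rng(0)
beta, n, trials = 0.5, 100, 200_000

# Inverse-transform sampling: (-log U)^(1/beta) has tail exp(-t^beta).
X = (-np.log(rng.uniform(size=(trials, n)))) ** (1 / beta)
mu = 2.0  # E[X] = Gamma(1 + 1/beta) = Gamma(3) = 2 for beta = 1/2
S = X.sum(axis=1) - n * mu

for t in [50, 100, 150, 200]:
    lhs = (S > t).mean()          # empirical tail of the centered sum
    rhs = n * np.exp(-t ** beta)  # one-big-jump prediction
    print(f"t={t:4d}  empirical={lhs:.2e}  n*P(X>t)={rhs:.2e}")
```

The agreement is only asymptotic; at moderate $t$ the two columns can differ by a constant factor.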
Related papers
- Mean-Square Analysis of Discretized Itô Diffusions for Heavy-tailed Sampling [17.415391025051434]
We analyze the complexity of sampling from a class of heavy-tailed distributions by discretizing a natural class of Itô diffusions associated with weighted Poincaré inequalities.
Based on a mean-square analysis, we establish the iteration complexity for obtaining a sample whose distribution is $\epsilon$-close to the target distribution in the Wasserstein-2 metric.
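As a concrete, hedged instance of the general recipe, the sketch below runs a plain Euler-Maruyama discretization of the overdamped Langevin diffusion $dX_t = \nabla \log \pi(X_t)\,dt + \sqrt{2}\,dW_t$ targeting a heavy-tailed Student-t density; the paper analyzes a different diffusion class built from weighted Poincaré inequalities, so this only shows the overall pattern:

```python
import numpy as np

# Generic Euler-Maruyama sketch (not the paper's diffusion class):
# target pi(x) proportional to (1 + x^2/nu)^(-(nu+1)/2), a Student-t density.
def grad_log_pi(x, nu):
    # d/dx log pi(x) = -(nu + 1) * x / (nu + x^2)
    return -(nu + 1) * x / (nu + x ** 2)

rng = np.random.default_rng(0)
nu, h, n_steps, n_chains = 5.0, 0.05, 5_000, 10_000

x = rng.standard_normal(n_chains)  # arbitrary initialization
for _ in range(n_steps):
    x += h * grad_log_pi(x, nu) + np.sqrt(2 * h) * rng.standard_normal(n_chains)

# Student-t(nu) has variance nu/(nu - 2); the O(h) discretization bias
# means we expect only rough agreement.
print("sample variance:", x.var(), " target:", nu / (nu - 2))
```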
arXiv Detail & Related papers (2023-03-01T15:16:03Z)
- Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
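As a minimal illustration of this additive structure (a hypothetical setup of ours, not the paper's experiments), $f$ and $g$ can be fit jointly by least squares over separate polynomial bases in $\mathbf{x}$ and $\mathbf{y}$:

```python
import numpy as np

# Hypothetical additive-regression sketch: E[z | x, y] = f*(x) + g*(y).
rng = np.random.default_rng(0)
n = 2_000
x, y = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
z = np.sin(3 * x) + y ** 2 + 0.1 * rng.standard_normal(n)  # f*(x) + g*(y) + noise

deg = 6  # polynomial degree for each coordinate
Phi = np.column_stack([x ** k for k in range(deg + 1)] +    # basis in x (incl. intercept)
                      [y ** k for k in range(1, deg + 1)])  # basis in y
coef, *_ = np.linalg.lstsq(Phi, z, rcond=None)
print("train RMSE:", np.sqrt(np.mean((Phi @ coef - z) ** 2)))
```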
arXiv Detail & Related papers (2023-02-27T16:34:21Z)
- A Nearly Tight Bound for Fitting an Ellipsoid to Gaussian Random Points [50.90125395570797]
This nearly establishes a conjecture of Saunderson et al., within logarithmic factors.
The latter conjecture has attracted significant attention over the past decade, due to its connections to machine learning and sum-of-squares lower bounds for certain statistical problems.
arXiv Detail & Related papers (2022-12-21T17:48:01Z)
- $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
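To pin down the model (the sketch below fits the plain MLE directly; the paper's contribution, which this omits, is making it scalable via sketching and coresets), the link is the CDF $F_p$ of the $p$-generalized normal with density proportional to $\exp(-|s|^p/p)$, which reduces to the ordinary probit at $p = 2$:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammainc  # regularized lower incomplete gamma P(a, x)

def F_p(s, p):
    # CDF of the p-generalized normal, density proportional to exp(-|s|^p / p):
    # F_p(s) = (1 + sign(s) * P(1/p, |s|^p / p)) / 2; p = 2 gives the probit.
    return 0.5 * (1.0 + np.sign(s) * gammainc(1.0 / p, np.abs(s) ** p / p))

def neg_log_lik(beta, X, y, p, eps=1e-12):
    q = np.clip(F_p(X @ beta, p), eps, 1 - eps)
    return -np.sum(y * np.log(q) + (1 - y) * np.log(1 - q))

rng = np.random.default_rng(0)
n, d, p = 5_000, 3, 4.0
X = rng.standard_normal((n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = (rng.uniform(size=n) < F_p(X @ beta_true, p)).astype(float)

res = minimize(neg_log_lik, x0=np.zeros(d), args=(X, y, p), method="BFGS")
print("estimate:", np.round(res.x, 2), " truth:", beta_true)
```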
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
- Robust Estimation for Nonparametric Families via Generative Adversarial Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends this framework to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
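For intuition about that loss, here is a one-dimensional toy (ours, not the paper's high-dimensional, discriminator-based construction): smooth the indicator inside the Kolmogorov-Smirnov supremum with a sigmoid of bandwidth $h$, recovering the usual KS statistic as $h \rightarrow 0$:

```python
import numpy as np

# Toy smoothed Kolmogorov-Smirnov distance in 1D (illustration only):
# KS(P, Q) = sup_t |E_P[1{X <= t}] - E_Q[1{Y <= t}]|; replace the
# indicator with a sigmoid of bandwidth h.
def smoothed_ks(x, y, h=0.1, grid_size=512):
    ts = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), grid_size)
    # Clip the argument to avoid overflow warnings in exp.
    sig = lambda u: 1.0 / (1.0 + np.exp(-np.clip(u, -50, 50)))
    fx = sig((ts[:, None] - x[None, :]) / h).mean(axis=1)  # smoothed CDF of x
    fy = sig((ts[:, None] - y[None, :]) / h).mean(axis=1)  # smoothed CDF of y
    return np.abs(fx - fy).max()

rng = np.random.default_rng(0)
x = rng.standard_normal(2_000)
y = rng.standard_normal(2_000) + 0.3  # shifted distribution
print("smoothed KS:", smoothed_ks(x, y))
```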
arXiv Detail & Related papers (2022-02-02T20:11:33Z)
- Sharper Sub-Weibull Concentrations: Non-asymptotic Bai-Yin's Theorem [0.0]
Non-asymptotic concentration inequalities play an essential role in the finite-sample theory of machine learning and statistics.
We obtain a sharper and constants-specified concentration inequality for the summation of independent sub-Weibull random variables.
As an application to negative binomial regression, we give the $\ell$-error under sparse structures, which is a new result for negative binomial regression.
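For reference, the standard sub-Weibull definitions behind results of this type (background facts, not the paper's sharpened constants):

```latex
% X is sub-Weibull of order \theta > 0 if its Orlicz-type norm is finite,
\[
  \|X\|_{\psi_\theta} \;=\; \inf\bigl\{ t > 0 : \mathbb{E}\exp\bigl(|X|^\theta / t^\theta\bigr) \le 2 \bigr\} \;<\; \infty,
\]
% which is equivalent, up to constants K > 0, to the tail bound
\[
  \mathbb{P}\bigl(|X| > s\bigr) \;\le\; 2\exp\bigl(-(s/K)^\theta\bigr)
  \quad \text{for all } s \ge 0.
\]
```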
arXiv Detail & Related papers (2021-02-04T07:16:27Z)
- The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian with unknown covariance, $X \sim \mathcal{N}(0, \Sigma)$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
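A hedged simulation of this contamination model (the distribution of $B$ below is one arbitrary choice, not the paper's adversary) shows how even a small $\epsilon$ distorts the empirical covariance:

```python
import numpy as np

# Huber-type contamination: each sample comes from N(0, Sigma) with
# probability 1 - eps and from an arbitrary B with probability eps.
rng = np.random.default_rng(0)
d, n, eps = 5, 20_000, 0.05
Sigma = np.eye(d)

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
B = rng.multivariate_normal(np.zeros(d), 25 * np.eye(d), size=n)  # one choice of B
mask = rng.uniform(size=n) < eps
Z = np.where(mask[:, None], B, X)

emp_cov = Z.T @ Z / n
print("top eigenvalue of empirical covariance:",
      np.linalg.eigvalsh(emp_cov).max(), " (true value: 1.0)")
```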
arXiv Detail & Related papers (2020-12-31T18:24:41Z)
- Optimal Sub-Gaussian Mean Estimation in $\mathbb{R}$ [5.457150493905064]
We present a novel estimator with sub-Gaussian convergence.
Our estimator does not require prior knowledge of the variance.
Our estimator construction and analysis gives a framework generalizable to other problems.
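For context, a classical baseline (not the paper's novel estimator): median-of-means already achieves sub-Gaussian-type deviations for heavy-tailed data without prior knowledge of the variance:

```python
import numpy as np

# Median-of-means: split the data into blocks, average each block, and
# take the median of the block means; robust to heavy tails.
def median_of_means(x, n_blocks=30, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    blocks = np.array_split(x[rng.permutation(len(x))], n_blocks)
    return np.median([b.mean() for b in blocks])

rng = np.random.default_rng(0)
x = rng.standard_t(df=2.5, size=10_000)  # heavy-tailed, true mean 0
print("empirical mean: ", x.mean())
print("median of means:", median_of_means(x, rng=rng))
```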
arXiv Detail & Related papers (2020-11-17T02:47:24Z)
- $(f,\Gamma)$-Divergences: Interpolating between $f$-Divergences and Integral Probability Metrics [6.221019624345409]
We develop a framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs).
We show that they can be expressed as a two-stage mass-redistribution/mass-transport process.
Using statistical learning as an example, we demonstrate their advantage in training generative adversarial networks (GANs) for heavy-tailed, not-absolutely continuous sample distributions.
arXiv Detail & Related papers (2020-11-11T18:17:09Z)
- Sharp finite-sample concentration of independent variables [5.788221302433176]
We show an extension of Sanov's theorem on large deviations.
We control the tail probabilities of i.i.d. random variables with matching concentration and anti-concentration bounds.
arXiv Detail & Related papers (2020-08-30T23:05:55Z)
- Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)