Non-asymptotic analysis and inference for an outlyingness induced
winsorized mean
- URL: http://arxiv.org/abs/2105.02337v1
- Date: Wed, 5 May 2021 21:35:24 GMT
- Authors: Yijun Zuo
- Abstract summary: This article investigates the robustness of leading sub-Gaussian estimators of mean.
It reveals that none of them can resist more than $25\%$ contamination in the data.
It also introduces an outlyingness-induced winsorized mean that has the best possible robustness.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust estimation of a mean vector, a topic regarded as obsolete in the
traditional robust statistics community, has surged in the machine learning
literature over the last decade. The latest focus is on the sub-Gaussian
performance and computability of estimators in a non-asymptotic setting.
Many traditional robust estimators are computationally intractable, which
partly explains the renewed interest in robust mean estimation.
Classical robust centrality estimators include the trimmed mean and the
sample median. The latter has the best robustness but suffers from a
low-efficiency drawback. The trimmed mean and the median of means, as robust
alternatives to the sample mean that achieve sub-Gaussian performance, have
been proposed and studied in the literature.
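The two classical alternatives mentioned above can be sketched in a few lines of Python. This is a minimal illustration only; the block count `k` and trimming fraction `alpha` are assumed parameters, not choices made in the paper.

```python
import numpy as np

def median_of_means(x, k):
    """Median of means: split the sample into k blocks,
    average each block, and return the median of the block means."""
    x = np.asarray(x, dtype=float)
    blocks = np.array_split(x, k)
    return float(np.median([b.mean() for b in blocks]))

def trimmed_mean(x, alpha):
    """Symmetric trimmed mean: drop the alpha fraction of the
    smallest and largest observations, then average the rest."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    cut = int(alpha * n)
    return float(x[cut:n - cut].mean())
```

For example, `median_of_means([1, 2, 3, 4, 5, 6], k=3)` averages the blocks `[1,2]`, `[3,4]`, `[5,6]` and returns the median of `1.5, 3.5, 5.5`, i.e. `3.5`.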
This article investigates the robustness of leading sub-Gaussian estimators
of the mean and reveals that none of them can resist more than $25\%$
contamination in the data. It consequently introduces an outlyingness-induced
winsorized mean that attains the best possible robustness (it can resist up to
$50\%$ contamination without breakdown) while achieving high efficiency.
Furthermore, the estimator has sub-Gaussian performance for uncontaminated
samples and a bounded estimation error for contaminated samples at a given
confidence level in a finite-sample setting. It can be computed in linear time.
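As a rough illustration of the winsorizing idea, here is a one-dimensional sketch that uses the robust z-score $|x - \mathrm{med}|/\mathrm{MAD}$ as the outlyingness measure. This is an assumption-laden simplification, not the paper's exact projection-depth-based construction; the cutoff `c` is an assumed parameter.

```python
import numpy as np

def winsorized_mean_by_outlyingness(x, c=2.5):
    """Illustrative 1-D winsorized mean driven by an outlyingness
    measure (robust z-score |x - median| / MAD). Points whose
    outlyingness exceeds the cutoff c are pulled back (winsorized)
    to the boundary of the trusted region rather than discarded,
    so no observation is thrown away entirely.

    Simplified sketch only; not the exact estimator in the paper.
    """
    x = np.asarray(x, dtype=float)
    med = float(np.median(x))
    mad = float(np.median(np.abs(x - med)))
    if mad == 0:
        return med  # degenerate sample: fall back to the median
    lo, hi = med - c * mad, med + c * mad
    return float(np.clip(x, lo, hi).mean())  # winsorize, then average
```

Because points are clipped rather than dropped, the estimator keeps the efficiency of averaging on clean data while a gross outlier can shift the result only as far as the boundary `med ± c * MAD`.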
Related papers
- Heavy-tailed Contamination is Easier than Adversarial Contamination [8.607294463464523]
A body of work in the statistics and computer science communities dating back to Huber (1960) has led to statistically and computationally efficient outlier-robust estimators.
Two particular outlier models have received significant attention: the adversarial and heavy-tailed models.
arXiv Detail & Related papers (2024-11-22T19:00:33Z) - A Tale of Sampling and Estimation in Discounted Reinforcement Learning [50.43256303670011]
We present a minimax lower bound on the discounted mean estimation problem.
We show that estimating the mean by directly sampling from the discounted kernel of the Markov process brings compelling statistical properties.
arXiv Detail & Related papers (2023-04-11T09:13:17Z) - Outlier-Robust Sparse Mean Estimation for Heavy-Tailed Distributions [42.6763105645717]
Given a small number of corrupted samples, the goal is to efficiently compute a hypothesis that accurately approximates $\mu$ with high probability.
Our algorithm achieves the optimal error using a number of samples scaling logarithmically with the ambient dimension.
Our analysis may be of independent interest, involving the delicate design of a (non-spectral) decomposition for positive semi-definite matrices satisfying certain sparsity properties.
arXiv Detail & Related papers (2022-11-29T16:13:50Z) - Robust Estimation for Nonparametric Families via Generative Adversarial
Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z) - Gaining Outlier Resistance with Progressive Quantiles: Fast Algorithms
and Theoretical Studies [1.6457778420360534]
A framework of outlier-resistant estimation is introduced to robustify an arbitrary loss function.
A new technique is proposed to alleviate the requirement on the starting point, so that on regular datasets the number of data reestimations can be substantially reduced.
The obtained estimators, though not necessarily globally or even locally optimal, enjoy minimax optimality in both low and high dimensions.
arXiv Detail & Related papers (2021-12-15T20:35:21Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$ samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
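The clipped-gradient idea in the blurb above can be sketched for streaming mean estimation. This is a minimal sketch under stated assumptions, not the paper's algorithm: the step size `lr` and clipping level `tau` are hypothetical parameters, and the loss is taken to be the squared error.

```python
def clipped_sgd_mean(stream, tau=5.0, lr=0.1):
    """Streaming mean estimation with clipped gradients: for the
    squared loss 0.5 * (theta - x)**2, the per-sample gradient at
    theta is (theta - x); clipping its magnitude at tau bounds the
    influence any single heavy-tailed sample can exert."""
    theta = 0.0
    for x in stream:
        g = theta - x                  # gradient of 0.5 * (theta - x)**2
        g = max(-tau, min(tau, g))     # clip to [-tau, tau]
        theta -= lr * g
    return theta
```

A single extreme observation moves the iterate by at most `lr * tau`, which is the mechanism that tames heavy-tailed noise in this style of analysis.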
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Outlier Robust Mean Estimation with Subgaussian Rates via Stability [46.03021473600576]
We study the problem of outlier-robust high-dimensional mean estimation.
We obtain the first computationally efficient algorithm with subgaussian rates for outlier-robust mean estimation.
arXiv Detail & Related papers (2020-07-30T17:33:03Z) - TraDE: Transformers for Density Estimation [101.20137732920718]
TraDE is a self-attention-based architecture for auto-regressive density estimation.
We present a suite of tasks such as regression using generated samples, out-of-distribution detection, and robustness to noise in the training data.
arXiv Detail & Related papers (2020-04-06T07:32:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.