Generalization Bounds in the Presence of Outliers: a Median-of-Means Study
- URL: http://arxiv.org/abs/2006.05240v2
- Date: Sun, 7 Feb 2021 10:58:29 GMT
- Title: Generalization Bounds in the Presence of Outliers: a Median-of-Means Study
- Authors: Pierre Laforgue, Guillaume Staerman, Stephan Clémençon
- Abstract summary: The Median-of-Means (MoM) is an estimator of the mean $\theta$ of a square integrable r.v. $Z$.
Thanks to the high confidence it achieves on heavy-tailed data, MoM has found various applications in machine learning.
A new line of work is now trying to characterize and leverage MoM's ability to deal with corrupted data.
- Score: 8.905677748354364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In contrast to the empirical mean, the Median-of-Means (MoM) is an estimator
of the mean $\theta$ of a square integrable r.v. $Z$, around which accurate
nonasymptotic confidence bounds can be built, even when $Z$ does not exhibit a
sub-Gaussian tail behavior. Thanks to the high confidence it achieves on
heavy-tailed data, MoM has found various applications in machine learning,
where it is used to design training procedures that are not sensitive to
atypical observations. More recently, a new line of work has begun to
characterize and leverage MoM's ability to deal with corrupted data. In this
context, the present work proposes a general study of MoM's concentration
properties under the contamination regime, that provides a clear understanding
of the impact of the outlier proportion and the number of blocks chosen. The
analysis is extended to (multisample) $U$-statistics, i.e. averages over tuples
of observations, that raise additional challenges due to the dependence
induced. Finally, we show that the latter bounds can be used in a
straightforward fashion to derive generalization guarantees for pairwise
learning in a contaminated setting, and propose an algorithm to compute
provably reliable decision functions.
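To make the estimator concrete, here is a minimal sketch (not taken from the paper's code) of the plain MoM estimator and a blockwise variant for degree-two $U$-statistics, i.e. averages over pairs of observations; the block counts and the pairwise kernel `h` passed in below are illustrative assumptions:

```python
import itertools

import numpy as np


def median_of_means(z, n_blocks):
    """Split the sample into n_blocks near-equal blocks, average each
    block, and return the median of the block means."""
    blocks = np.array_split(np.asarray(z, dtype=float), n_blocks)
    block_means = [b.mean() for b in blocks]
    return float(np.median(block_means))


def mom_u_statistic(z, h, n_blocks):
    """Median of blockwise degree-two U-statistics: average the kernel
    h over all pairs within each block, then take the median across
    blocks. Each block must contain at least two observations."""
    blocks = np.array_split(np.asarray(z, dtype=float), n_blocks)
    block_stats = []
    for b in blocks:
        pairs = itertools.combinations(b, 2)
        block_stats.append(np.mean([h(x, y) for x, y in pairs]))
    return float(np.median(block_stats))


# A single large outlier corrupts the empirical mean but not MoM:
sample = [1.0] * 9 + [1000.0]
print(np.mean(sample))             # 100.9
print(median_of_means(sample, 5))  # 1.0
```

Intuitively, the median step makes the estimate robust as long as fewer than half of the blocks contain corrupted points, which is why the interplay between the outlier proportion and the number of blocks studied in the paper matters.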
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - On Medians of (Randomized) Pairwise Means [8.497456090408084]
Tournament procedures, recently introduced in Lugosi & Mendelson, offer an appealing alternative to the principle of Empirical Risk Minimization in machine learning.
This paper extends this approach to address other learning problems, in particular for which the performance criterion takes the form of an expectation over pairs of observations.
arXiv Detail & Related papers (2022-11-01T17:18:15Z) - Uniform Concentration Bounds toward a Unified Framework for Robust Clustering [21.789311405437573]
Recent advances in center-based clustering continue to improve upon the drawbacks of Lloyd's celebrated $k$-means algorithm.
Various methods seek to address poor local minima, sensitivity to outliers, and data that are not well-suited to Euclidean measures of fit.
This paper proposes a cohesive robust framework for center-based clustering under a general class of dissimilarity measures.
arXiv Detail & Related papers (2021-10-27T03:43:44Z) - Keep it Tighter -- A Story on Analytical Mean Embeddings [0.6445605125467574]
Kernel techniques are among the most popular and flexible approaches in data science.
Mean embedding gives rise to a divergence measure referred to as maximum mean discrepancy (MMD)
In this paper we focus on the problem of MMD estimation when the mean embedding of one of the underlying distributions is available analytically.
arXiv Detail & Related papers (2021-10-15T21:29:27Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Improved Estimation of Concentration Under $\ell_p$-Norm Distance Metrics Using Half Spaces [14.947511752748005]
Concentration of measure has been argued to be the fundamental cause of adversarial vulnerability.
We propose a method to estimate the concentration of any empirical dataset under $\ell_p$-norm distance metrics.
Our proposed algorithm is more efficient than Mahloujifar et al.'s, and our experiments on synthetic datasets and image benchmarks demonstrate that it is able to find much tighter intrinsic robustness bounds.
arXiv Detail & Related papers (2021-03-24T01:16:28Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Robust Principal Component Analysis: A Median of Means Approach [17.446104539598895]
Principal Component Analysis is a tool for data visualization, denoising, and dimensionality reduction.
Recent supervised learning methods have shown great success in dealing with outlying observations.
This paper proposes a PCA procedure based on the MoM principle.
arXiv Detail & Related papers (2021-02-05T19:59:05Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure.
Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.