Exponential Tail Local Rademacher Complexity Risk Bounds Without the
Bernstein Condition
- URL: http://arxiv.org/abs/2202.11461v1
- Date: Wed, 23 Feb 2022 12:27:53 GMT
- Title: Exponential Tail Local Rademacher Complexity Risk Bounds Without the
Bernstein Condition
- Authors: Varun Kanade, Patrick Rebeschini, Tomas Vaskevicius
- Abstract summary: The local Rademacher complexity toolbox is one of the most successful general-purpose toolboxes for sharp excess risk bounds, but applying it typically requires the Bernstein condition, which restricts applicability to convex and proper settings.
We prove an exponential-tail excess risk bound in terms of the offset Rademacher complexity under an estimator-dependent geometric "offset condition".
Our results apply to improper prediction regimes not covered by the classical toolbox.
- Score: 30.401770841788718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The local Rademacher complexity framework is one of the most successful
general-purpose toolboxes for establishing sharp excess risk bounds for
statistical estimators based on the framework of empirical risk minimization.
Applying this toolbox typically requires using the Bernstein condition, which
often restricts applicability to convex and proper settings. Recent years have
witnessed several examples of problems where optimal statistical performance is
only achievable via non-convex and improper estimators originating from
aggregation theory, including the fundamental problem of model selection. These
examples are currently outside of the reach of the classical localization
theory.
In this work, we build upon the recent approach to localization via offset
Rademacher complexities, for which a general high-probability theory has yet to
be established. Our main result is an exponential-tail excess risk bound
expressed in terms of the offset Rademacher complexity that yields results at
least as sharp as those obtainable via the classical theory. However, our bound
applies under an estimator-dependent geometric condition (the "offset
condition") instead of the estimator-independent (but, in general,
distribution-dependent) Bernstein condition on which the classical theory
relies. Our results apply to improper prediction regimes not directly covered
by the classical theory.
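For orientation, the two notions contrasted in the abstract are commonly stated as follows in the localization literature. This is only a sketch of the standard formulations (the Bernstein condition with exponent one, and the offset Rademacher complexity in the style of Liang, Rakhlin and Sridharan); the paper's exact constants and definitions may differ.

```latex
% Bernstein condition (estimator-independent, but in general
% distribution-dependent): for some constant B > 0 and all f in F,
\[
  \mathbb{E}\bigl[(\ell_f - \ell_{f^\star})^2\bigr]
  \;\le\; B \, \mathbb{E}\bigl[\ell_f - \ell_{f^\star}\bigr],
  \qquad f^\star \in \arg\min_{f \in \mathcal{F}} \mathbb{E}[\ell_f].
\]

% Offset Rademacher complexity: the quadratic offset term inside the
% supremum supplies the localization that the Bernstein condition
% otherwise provides.
\[
  \mathfrak{R}^{\mathrm{off}}_{n}(\mathcal{F}, c)
  = \mathbb{E}_{\varepsilon} \sup_{f \in \mathcal{F}}
    \frac{1}{n} \sum_{i=1}^{n}
    \bigl( \varepsilon_i f(x_i) - c\, f(x_i)^{2} \bigr),
  \qquad \varepsilon_i \ \text{i.i.d. Rademacher signs}.
\]
```

The paper's main result replaces the condition in the first display with an estimator-dependent geometric "offset condition" while still yielding exponential-tail excess risk bounds expressed through the quantity in the second display.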
Related papers
- Local Risk Bounds for Statistical Aggregation [5.940699390639279]
We prove localized versions of the classical bound for the exponential weights estimator due to Leung and Barron and deviation-optimal bounds for the Q-aggregation estimator.
These bounds improve over the results of Dai, Rigollet and Zhang for fixed design regression and the results of Lecué and Rigollet for random design regression (a minimal sketch of exponential weights aggregation is included after this list).
arXiv Detail & Related papers (2023-06-29T17:51:42Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Statistical optimality conditions for compressive ensembles [7.766921168069532]
We present a framework for the theoretical analysis of ensembles of low-complexity empirical risk minimisers trained on independent random compressions of high-dimensional data.
We introduce a general distribution-dependent upper-bound on the excess risk, framed in terms of a natural notion of compressibility.
We then instantiate this general bound to classification and regression tasks, considering Johnson-Lindenstrauss mappings as the compression scheme.
arXiv Detail & Related papers (2021-06-02T11:52:31Z) - Deconfounding Scores: Feature Representations for Causal Effect
Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z) - Failures of model-dependent generalization bounds for least-norm
interpolation [39.97534972432276]
We consider bounds on the generalization performance of the least-norm linear regressor.
For a variety of natural joint distributions on training examples, any valid generalization bound must sometimes be very loose.
arXiv Detail & Related papers (2020-10-16T16:30:05Z) - The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization (IRM) is an objective based on the idea of learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective, as well as its recently proposed alternatives, under a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution; this is precisely the issue it was intended to solve.
arXiv Detail & Related papers (2020-10-12T14:54:32Z) - When is invariance useful in an Out-of-Distribution Generalization
problem ? [19.696505968699206]
The goal of the Out-of-Distribution (OOD) generalization problem is to train a predictor that generalizes across all environments.
Popular approaches in this field use the hypothesis that such a predictor should be an invariant predictor that captures the mechanisms that remain constant across environments.
This paper presents a new set of theoretical conditions necessary for an invariant predictor to achieve OOD optimality.
arXiv Detail & Related papers (2020-08-04T23:57:11Z)
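As referenced in the aggregation entry above, the following is a minimal, hypothetical sketch of exponential weights aggregation over a finite dictionary of predictors. The function name, the temperature parameter `eta`, the uniform prior, and the squared-loss plug-in are illustrative assumptions; the Leung-Barron and Q-aggregation analyses discussed in that paper use specific tunings and corrections not reproduced here.

```python
import numpy as np

def exponential_weights_aggregate(predictions, y, eta=1.0, prior=None):
    """Aggregate a finite dictionary of predictors with exponential weights.

    predictions : (M, n) array, predictions of M dictionary elements on n points.
    y           : (n,) array of observed responses.
    eta         : temperature parameter (illustrative default; the theory
                  prescribes specific choices, e.g. tied to the noise level).
    prior       : optional prior weights over the dictionary (uniform if None).
    """
    M, n = predictions.shape
    if prior is None:
        prior = np.full(M, 1.0 / M)

    # Empirical squared-loss risk of each dictionary element.
    risks = np.mean((predictions - y[None, :]) ** 2, axis=1)

    # Exponential weights: prior tilted by exp(-eta * n * risk),
    # computed in log-space for numerical stability.
    log_w = np.log(prior) - eta * n * risks
    log_w -= log_w.max()
    weights = np.exp(log_w)
    weights /= weights.sum()

    # The aggregate is a convex combination of dictionary elements and is
    # therefore an improper estimator: it need not belong to the dictionary.
    return weights, weights @ predictions


# Toy usage on random data.
rng = np.random.default_rng(0)
M, n = 5, 50
y = rng.normal(size=n)
preds = rng.normal(size=(M, n))
w, f_hat = exponential_weights_aggregate(preds, y, eta=0.5)
```

The aggregated predictor is a weighted average of dictionary elements rather than a single element, which illustrates the kind of improper estimator from aggregation theory that motivates the offset-condition analysis in the main paper.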