$\mathscr{H}$-Consistency Estimation Error of Surrogate Loss Minimizers
- URL: http://arxiv.org/abs/2205.08017v1
- Date: Mon, 16 May 2022 23:13:36 GMT
- Title: $\mathscr{H}$-Consistency Estimation Error of Surrogate Loss Minimizers
- Authors: Pranjal Awasthi, Anqi Mao, Mehryar Mohri, Yutao Zhong
- Abstract summary: We present a detailed study of estimation errors in terms of surrogate loss estimation errors.
We refer to such guarantees as $\mathscr{H}$-consistency estimation error bounds.
- Score: 38.56401704010528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a detailed study of estimation errors in terms of surrogate loss
estimation errors. We refer to such guarantees as $\mathscr{H}$-consistency
estimation error bounds, since they account for the hypothesis set
$\mathscr{H}$ adopted. These guarantees are significantly stronger than
$\mathscr{H}$-calibration or $\mathscr{H}$-consistency. They are also more
informative than similar excess error bounds derived in the literature, when
$\mathscr{H}$ is the family of all measurable functions. We prove general
theorems providing such guarantees, for both the distribution-dependent and
distribution-independent settings. We show that our bounds are tight, modulo a
convexity assumption. We also show that previous excess error bounds can be
recovered as special cases of our general results.
We then present a series of explicit bounds in the case of the zero-one loss,
with multiple choices of the surrogate loss and for both the family of linear
functions and neural networks with one hidden layer. We further prove more
favorable distribution-dependent guarantees in that case. We also present a
series of explicit bounds in the case of the adversarial loss, with surrogate
losses based on the supremum of the $\rho$-margin, hinge or sigmoid loss and
for the same two general hypothesis sets. Here too, we prove several
enhancements of these guarantees under natural distributional assumptions.
Finally, we report the results of simulations illustrating our bounds and their
tightness.
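To make the terminology above concrete, here is a schematic reading of the abstract (not a statement of the paper's theorems): for a loss $\ell$ and a hypothesis set $\mathscr{H}$, the estimation error of $h \in \mathscr{H}$ is its gap to the best-in-class risk,
$$\varepsilon_{\ell}(h) \;=\; \mathcal{R}_{\ell}(h) - \inf_{f \in \mathscr{H}} \mathcal{R}_{\ell}(f), \qquad \mathcal{R}_{\ell}(h) = \mathbb{E}_{(x,y)}\big[\ell(h,x,y)\big],$$
and an $\mathscr{H}$-consistency estimation error bound controls the target-loss estimation error by the surrogate-loss estimation error, uniformly over $h \in \mathscr{H}$, via some non-decreasing function $\Gamma$:
$$\varepsilon_{\ell_{\mathrm{target}}}(h) \;\le\; \Gamma\big(\varepsilon_{\ell_{\mathrm{surrogate}}}(h)\big).$$
The sketch below is a rough numerical illustration of these quantities for the zero-one loss with the hinge surrogate and linear hypotheses. The data distribution, the subgradient optimizer, and the grid-search approximation of the best-in-class predictors are all assumptions made for this example; it does not reproduce the paper's simulations or bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Synthetic binary data: labels in {-1, +1} with 2-D Gaussian
    # class-conditional features (purely an assumption for this sketch).
    y = rng.choice([-1, 1], size=n)
    X = rng.normal(loc=np.outer(y, [1.0, 0.5]), scale=1.0)
    return X, y

def zero_one_loss(w, X, y):
    # Empirical zero-one loss of the linear hypothesis x -> <w, x>.
    return np.mean(y * (X @ w) <= 0)

def hinge_loss(w, X, y):
    # Empirical hinge (surrogate) loss of the same hypothesis.
    return np.mean(np.maximum(0.0, 1.0 - y * (X @ w)))

X_tr, y_tr = sample(2000)
X_te, y_te = sample(20000)  # large sample used as a proxy for expected losses

# Minimize the hinge surrogate over linear hypotheses by subgradient descent.
w = np.zeros(2)
for t in range(1, 2001):
    margin = y_tr * (X_tr @ w)
    grad = -(X_tr * y_tr[:, None])[margin < 1].sum(axis=0) / len(y_tr)
    w -= grad / np.sqrt(t)

# Crude grid search approximating the best-in-class linear predictors,
# only to make "estimation error" concrete; the paper's bounds are analytic.
angles = np.linspace(0.0, 2 * np.pi, 180, endpoint=False)
scales = np.linspace(0.1, 5.0, 25)
cands = [np.array([s * np.cos(a), s * np.sin(a)]) for a in angles for s in scales]
cands.append(w)  # include the surrogate minimizer so both gaps are non-negative

best_01 = min(zero_one_loss(c, X_te, y_te) for c in cands)
best_hinge = min(hinge_loss(c, X_te, y_te) for c in cands)

print("zero-one estimation error:", zero_one_loss(w, X_te, y_te) - best_01)
print("hinge estimation error:   ", hinge_loss(w, X_te, y_te) - best_hinge)
```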
Related papers
- Enhanced $H$-Consistency Bounds [30.389055604165222]
We present a framework for establishing enhanced $H$-consistency bounds based on more general inequalities relating conditional regrets.
Our theorems not only subsume existing results as special cases but also enable the derivation of more favorable bounds in various scenarios.
These include standard multi-class classification, binary and multi-class classification under Tsybakov noise conditions, and bipartite ranking.
arXiv Detail & Related papers (2024-07-18T17:22:40Z)
- Fixed-Budget Differentially Private Best Arm Identification [62.36929749450298]
We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints.
We derive a minimax lower bound on the error probability, and demonstrate that the lower and the upper bounds decay exponentially in $T$.
arXiv Detail & Related papers (2024-01-17T09:23:25Z)
- Cross-Entropy Loss Functions: Theoretical Analysis and Applications [27.3569897539488]
We present a theoretical analysis of a broad family of loss functions that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error, and other cross-entropy-like loss functions.
We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds.
This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss.
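As a side illustration of the loss family named in this entry (not the cited paper's algorithm), the following sketch implements the standard cross-entropy and generalized cross-entropy losses for a multi-class score vector; the softmax link and the parameter value are assumptions made for the example.

```python
import numpy as np

def softmax(scores):
    # Numerically stable softmax over a 1-D array of class scores.
    z = scores - np.max(scores)
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(scores, label):
    # Standard (logistic) cross-entropy: -log p_y.
    return -np.log(softmax(scores)[label])

def generalized_cross_entropy(scores, label, q=0.7):
    # Generalized cross-entropy (1 - p_y^q) / q: recovers cross-entropy as
    # q -> 0 and the (scaled) mean absolute error 1 - p_y at q = 1.
    p_y = softmax(scores)[label]
    return (1.0 - p_y ** q) / q

scores = np.array([2.0, 0.5, -1.0])
print(cross_entropy(scores, 0), generalized_cross_entropy(scores, 0))
```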
arXiv Detail & Related papers (2023-04-14T17:58:23Z)
- New Lower Bounds for Private Estimation and a Generalized Fingerprinting Lemma [10.176795938619417]
We prove new lower bounds for statistical estimation tasks under the constraint of $(\varepsilon, \delta)$-differential privacy.
We show that estimating the covariance in Frobenius norm requires $\Omega(d^2)$ samples, and in spectral norm requires $\Omega(d^{3/2})$ samples, both matching upper bounds up to logarithmic factors.
arXiv Detail & Related papers (2022-05-17T17:55:10Z)
- Robust Linear Regression for General Feature Distribution [21.0709900887309]
We investigate robust linear regression where data may be contaminated by an oblivious adversary.
We do not necessarily assume that the features are centered.
If the features are centered, we can obtain a standard convergence rate.
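As a generic illustration of robustness to contaminated responses (a standard Huber-regression sketch, not the estimator or guarantees of the cited paper), the example below fits a linear model by gradient descent on the Huber loss after an assumed fraction of labels has been corrupted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear data with a fraction of labels corrupted by an
# (oblivious) adversary -- this contamination model is an assumption.
n, d, eps = 500, 5, 0.1
X = rng.normal(size=(n, d))
beta_true = rng.normal(size=d)
y = X @ beta_true + 0.1 * rng.normal(size=n)
bad = rng.random(n) < eps
y[bad] += rng.normal(scale=50.0, size=bad.sum())

def huber_grad(w, X, y, delta=1.0):
    # Gradient of the Huber loss: quadratic for small residuals,
    # linear (bounded influence) for large ones.
    r = X @ w - y
    psi = np.clip(r, -delta, delta)
    return X.T @ psi / len(y)

w_huber = np.zeros(d)
for _ in range(2000):
    w_huber -= 0.5 * huber_grad(w_huber, X, y)

w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print("Huber parameter error:", np.linalg.norm(w_huber - beta_true))
print("OLS parameter error:  ", np.linalg.norm(w_ols - beta_true))
```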
arXiv Detail & Related papers (2022-02-04T11:22:13Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Sequential prediction under log-loss and misspecification [47.66467420098395]
We consider the question of sequential prediction under the log-loss in terms of cumulative regret.
We show that cumulative regrets in the well-specified and misspecified cases coincide asymptotically.
We provide an $o(1)$ characterization of the distribution-free or PAC regret.
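To make "cumulative regret under log-loss" concrete, here is a toy well-specified Bernoulli example using the add-one (Laplace) sequential predictor; the source, the predictor, and the horizon are assumptions of this sketch, not the cited paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bernoulli(0.3) source; the learner predicts the next bit sequentially.
T = 10_000
x = (rng.random(T) < 0.3).astype(int)

# Add-one (Laplace) sequential probability assignment.
ones = 0
log_loss_learner = 0.0
for t in range(T):
    p = (ones + 1) / (t + 2)  # predicted P(x_t = 1) before seeing x_t
    log_loss_learner += -np.log(p if x[t] == 1 else 1 - p)
    ones += x[t]

# Best fixed Bernoulli predictor in hindsight (empirical frequency).
q = x.mean()
log_loss_best = -(x.sum() * np.log(q) + (T - x.sum()) * np.log(1 - q))

# Cumulative regret under log-loss; grows roughly like (1/2) log T here.
print("cumulative regret:", log_loss_learner - log_loss_best)
print("0.5 * log T      :", 0.5 * np.log(T))
```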
arXiv Detail & Related papers (2021-01-29T20:28:23Z)
- The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance $\Sigma$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
arXiv Detail & Related papers (2020-12-31T18:24:41Z)
- Relative Deviation Margin Bounds [55.22251993239944]
We give two types of learning bounds, both distribution-dependent and valid for general families, in terms of the Rademacher complexity.
We derive distribution-dependent generalization bounds for unbounded loss functions under the assumption of a finite moment.
arXiv Detail & Related papers (2020-06-26T12:37:17Z)
- Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions [19.439945058410203]
We study the problem of out-of-sample risk estimation in the high dimensional regime.
Extensive empirical evidence confirms the accuracy of leave-one-out cross validation.
One technical advantage of the theory is that it can be used to clarify and connect some results from the recent literature on scalable approximate leave-one-out (LO).
arXiv Detail & Related papers (2020-03-03T20:07:07Z)
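For the leave-one-out entry above, here is a minimal sketch of the exact leave-one-out shortcut for ridge regression (a linear smoother), the kind of computation that the approximate-LO literature scales up; it is not the cited paper's method, and the data and regularization level are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (an assumption for this sketch).
n, d, lam = 200, 50, 1.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(scale=0.5, size=n)

# Ridge fit and its hat (smoother) matrix H, with y_hat = H @ y.
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
y_hat = H @ y

# Exact leave-one-out residuals via the shortcut (y_i - y_hat_i) / (1 - H_ii),
# which holds exactly for ridge regression.
loo_shortcut = (y - y_hat) / (1.0 - np.diag(H))

# Brute-force leave-one-out refits, for comparison.
loo_brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    w_i = np.linalg.solve(X[keep].T @ X[keep] + lam * np.eye(d), X[keep].T @ y[keep])
    loo_brute[i] = y[i] - X[i] @ w_i

print("LOO risk (shortcut)   :", np.mean(loo_shortcut ** 2))
print("LOO risk (brute force):", np.mean(loo_brute ** 2))
print("max residual mismatch :", np.max(np.abs(loo_shortcut - loo_brute)))
```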
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.