Distribution-Free Rates in Neyman-Pearson Classification
- URL: http://arxiv.org/abs/2402.09560v1
- Date: Wed, 14 Feb 2024 20:21:43 GMT
- Title: Distribution-Free Rates in Neyman-Pearson Classification
- Authors: Mohammadreza M. Kalan, Samory Kpotufe
- Abstract summary: We consider the problem of Neyman-Pearson classification, which models unbalanced classification settings.
We provide a full characterization of possible distribution-free rates, i.e., minimax rates over the space of all pairs $(\mu_0, \mu_1)$.
The rates involve a dichotomy between hard and easy classes $\mathcal{H}$, as characterized by a simple geometric condition.
- Score: 4.169915659794566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of Neyman-Pearson classification which models
unbalanced classification settings where error w.r.t. a distribution $\mu_1$ is
to be minimized subject to low error w.r.t. a different distribution $\mu_0$.
Given a fixed VC class $\mathcal{H}$ of classifiers to be minimized over, we
provide a full characterization of possible distribution-free rates, i.e.,
minimax rates over the space of all pairs $(\mu_0, \mu_1)$. The rates involve a
dichotomy between hard and easy classes $\mathcal{H}$ as characterized by a
simple geometric condition, a three-points-separation condition, loosely
related to VC dimension.
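To make the objective concrete, below is a minimal sketch of the Neyman-Pearson criterion over a finite class of classifiers: among classifiers whose empirical error w.r.t. $\mu_0$ is at most a tolerance $\alpha$, pick the one with smallest empirical error w.r.t. $\mu_1$. The function name np_select, the threshold class, and the Gaussian stand-ins for $\mu_0$ and $\mu_1$ are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def np_select(classifiers, X0, X1, alpha):
    """Pick, from a finite class, the classifier with the smallest empirical
    mu_1-error among those whose empirical mu_0-error is at most alpha.

    classifiers : list of callables mapping an array of points to {0, 1} labels
    X0          : samples drawn from mu_0 (should be labeled 0)
    X1          : samples drawn from mu_1 (should be labeled 1)
    alpha       : tolerated error level w.r.t. mu_0
    """
    best_h, best_err1 = None, np.inf
    for h in classifiers:
        err0 = np.mean(h(X0) == 1)   # empirical error w.r.t. mu_0
        err1 = np.mean(h(X1) == 0)   # empirical error w.r.t. mu_1
        if err0 <= alpha and err1 < best_err1:
            best_h, best_err1 = h, err1
    return best_h, best_err1

# Example: thresholds on the real line as a simple VC class (illustrative only)
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=500)   # stand-in for mu_0
X1 = rng.normal(2.0, 1.0, size=500)   # stand-in for mu_1
thresholds = np.linspace(-3, 5, 81)
classifiers = [lambda x, t=t: (x > t).astype(int) for t in thresholds]
h_star, err1 = np_select(classifiers, X0, X1, alpha=0.05)
print("achieved empirical mu_1-error:", err1)
```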
Related papers
- Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z)
- Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss [33.18537822803389]
We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, $\mathscr{F}$ is a weakly sub-Gaussian class.
Our result holds whether the problem is realizable or not, and we refer to this as a near mixing-free rate, since direct dependence on mixing is relegated to an additive higher-order term.
arXiv Detail & Related papers (2024-02-08T18:57:42Z)
- On Rate-Optimal Partitioning Classification from Observable and from Privatised Data [0.0]
We revisit the classical method of partitioning classification and study its convergence rate under relaxed conditions.
The privacy constraints mean that the data $(X_1,Y_1),\dots,(X_n,Y_n)$ cannot be directly observed.
We add Laplace-distributed noise to the indicators of all possible locations of the feature vector $X_i$ and to its label $Y_i$.
arXiv Detail & Related papers (2023-12-22T18:07:18Z)
- Classification Tree Pruning Under Covariate Shift [7.982668978293684]
We consider the problem of pruning a classification tree, that is, selecting a suitable subtree that balances bias and variance.
We present the first efficient procedure for optimal pruning in such situations, when cross-validation and other penalized variants are grossly inadequate.
arXiv Detail & Related papers (2023-05-07T17:08:21Z)
- Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z)
- Localization, Convexity, and Star Aggregation [0.0]
Offset Rademacher complexities have been shown to imply sharp, linearly dependent upper bounds for the square loss.
We show that in the statistical setting, the offset bound can be generalized to any loss satisfying a certain uniform convexity condition.
arXiv Detail & Related papers (2021-05-19T00:47:59Z)
- Non-Convex SGD Learns Halfspaces with Adversarial Label Noise [50.659479930171585]
We study learning homogeneous halfspaces with surrogate losses in the distribution-specific model.
We show that optimizing any convex surrogate inherently leads to large misclassification error for the learned halfspace.
arXiv Detail & Related papers (2020-06-11T18:55:59Z)
- Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5 \varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$, up to some logarithmic factor (a toy update-rule sketch appears after this list).
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
- Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
- Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)
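For the asynchronous Q-learning entry above, the following is a toy sketch of the kind of update rule that paper analyzes: a single Markovian trajectory is followed, and only the visited entry $(s_t, a_t)$ of the Q-table is updated at each step. The toy MDP, uniform behavior policy, and step-size schedule here are illustrative assumptions, not the paper's construction or analysis.

```python
import numpy as np

def async_q_learning(P, R, gamma, n_steps, rng):
    """Toy asynchronous Q-learning along one Markovian trajectory.

    P     : transition tensor, P[s, a, s'] = probability of s' given (s, a)
    R     : reward matrix, R[s, a]
    gamma : discount factor in (0, 1)
    """
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    s = rng.integers(n_states)
    for t in range(1, n_steps + 1):
        a = rng.integers(n_actions)               # uniform behavior policy (illustrative)
        s_next = rng.choice(n_states, p=P[s, a])  # Markovian transition
        eta = 1.0 / (1.0 + t * (1.0 - gamma))     # illustrative step-size schedule
        # Only the visited entry (s, a) is updated -- the asynchronous aspect.
        Q[s, a] += eta * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q

# Tiny random MDP for demonstration
rng = np.random.default_rng(1)
n_states, n_actions = 4, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))
Q_hat = async_q_learning(P, R, gamma=0.9, n_steps=50_000, rng=rng)
print(Q_hat)
```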