Distribution-Free Rates in Neyman-Pearson Classification
- URL: http://arxiv.org/abs/2402.09560v1
- Date: Wed, 14 Feb 2024 20:21:43 GMT
- Title: Distribution-Free Rates in Neyman-Pearson Classification
- Authors: Mohammadreza M. Kalan, Samory Kpotufe
- Abstract summary: We consider the problem of Neyman-Pearson classification, which models unbalanced classification settings.
We provide a full characterization of possible distribution-free rates, i.e., minimax rates over the space of all pairs $(\mu_0, \mu_1)$.
The rates involve a dichotomy between hard and easy classes $\mathcal{H}$, as characterized by a simple geometric condition.
- Score: 4.169915659794566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of Neyman-Pearson classification which models
unbalanced classification settings where error w.r.t. a distribution $\mu_1$ is
to be minimized subject to low error w.r.t. a different distribution $\mu_0$.
Given a fixed VC class $\mathcal{H}$ of classifiers to be minimized over, we
provide a full characterization of possible distribution-free rates, i.e.,
minimax rates over the space of all pairs $(\mu_0, \mu_1)$. The rates involve a
dichotomy between hard and easy classes $\mathcal{H}$ as characterized by a
simple geometric condition, a three-points-separation condition, loosely
related to VC dimension.
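To make the objective concrete, below is a minimal sketch of the Neyman-Pearson criterion over a finite class of classifiers: among classifiers whose empirical error w.r.t. $\mu_0$ is at most a tolerance $\alpha$, pick the one with smallest empirical error w.r.t. $\mu_1$. The function name np_select, the threshold class, and the Gaussian stand-ins for $\mu_0$ and $\mu_1$ are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def np_select(classifiers, X0, X1, alpha):
    """Pick, from a finite class, the classifier with the smallest empirical
    mu_1-error among those whose empirical mu_0-error is at most alpha.

    classifiers : list of callables mapping an array of points to {0, 1} labels
    X0          : samples drawn from mu_0 (should be labeled 0)
    X1          : samples drawn from mu_1 (should be labeled 1)
    alpha       : tolerated error level w.r.t. mu_0
    """
    best_h, best_err1 = None, np.inf
    for h in classifiers:
        err0 = np.mean(h(X0) == 1)   # empirical error w.r.t. mu_0
        err1 = np.mean(h(X1) == 0)   # empirical error w.r.t. mu_1
        if err0 <= alpha and err1 < best_err1:
            best_h, best_err1 = h, err1
    return best_h, best_err1

# Example: thresholds on the real line as a simple VC class (illustrative only)
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=500)   # stand-in for mu_0
X1 = rng.normal(2.0, 1.0, size=500)   # stand-in for mu_1
thresholds = np.linspace(-3, 5, 81)
classifiers = [lambda x, t=t: (x > t).astype(int) for t in thresholds]
h_star, err1 = np_select(classifiers, X0, X1, alpha=0.05)
print("achieved empirical mu_1-error:", err1)
```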
Related papers
- Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z)
- Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss [33.18537822803389]
We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, $\mathscr{F}$ is a weakly sub-Gaussian class.
Our result holds whether the problem is realizable or not, and we refer to this as a near mixing-free rate, since direct dependence on mixing is relegated to an additive higher-order term.
arXiv Detail & Related papers (2024-02-08T18:57:42Z)
- On Rate-Optimal Partitioning Classification from Observable and from Privatised Data [0.0]
We revisit the classical method of partitioning classification and study its convergence rate under relaxed conditions.
The privacy constraints mean that the data $(X_1,Y_1),\dots,(X_n,Y_n)$ cannot be directly observed.
We add Laplace-distributed noise to the indicators of all possible locations of the feature vector $X_i$ and to its label $Y_i$.
arXiv Detail & Related papers (2023-12-22T18:07:18Z)
- Classification Tree Pruning Under Covariate Shift [7.982668978293684]
We consider the problem of pruning a classification tree, that is, selecting a suitable subtree that balances bias and variance.
We present the first efficient procedure for optimal pruning in such situations, when cross-validation and other penalized variants are grossly inadequate.
arXiv Detail & Related papers (2023-05-07T17:08:21Z)
- Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z)
- Localization, Convexity, and Star Aggregation [0.0]
Offset Rademacher complexities have been shown to imply sharp, linearly dependent upper bounds for the square loss.
We show that in the statistical setting, the offset bound can be generalized to any loss satisfying a certain uniform convexity condition.
arXiv Detail & Related papers (2021-05-19T00:47:59Z)
- Non-Convex SGD Learns Halfspaces with Adversarial Label Noise [50.659479930171585]
We study learning homogeneous halfspaces with surrogate losses in the distribution-specific model.
We show that optimizing any convex surrogate inherently leads to large misclassification error for the learned halfspace.
arXiv Detail & Related papers (2020-06-11T18:55:59Z)
- Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5 \varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$, up to some logarithmic factor (a toy update-rule sketch appears after this list).
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
- Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
- Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)
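For the asynchronous Q-learning entry above, the following is a toy sketch of the kind of update rule that paper analyzes: a single Markovian trajectory is followed, and only the visited entry $(s_t, a_t)$ of the Q-table is updated at each step. The toy MDP, uniform behavior policy, and step-size schedule here are illustrative assumptions, not the paper's construction or analysis.

```python
import numpy as np

def async_q_learning(P, R, gamma, n_steps, rng):
    """Toy asynchronous Q-learning along one Markovian trajectory.

    P     : transition tensor, P[s, a, s'] = probability of s' given (s, a)
    R     : reward matrix, R[s, a]
    gamma : discount factor in (0, 1)
    """
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    s = rng.integers(n_states)
    for t in range(1, n_steps + 1):
        a = rng.integers(n_actions)               # uniform behavior policy (illustrative)
        s_next = rng.choice(n_states, p=P[s, a])  # Markovian transition
        eta = 1.0 / (1.0 + t * (1.0 - gamma))     # illustrative step-size schedule
        # Only the visited entry (s, a) is updated -- the asynchronous aspect.
        Q[s, a] += eta * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q

# Tiny random MDP for demonstration
rng = np.random.default_rng(1)
n_states, n_actions = 4, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))
Q_hat = async_q_learning(P, R, gamma=0.9, n_steps=50_000, rng=rng)
print(Q_hat)
```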