Distribution-Specific Agnostic Conditional Classification With Halfspaces
- URL: http://arxiv.org/abs/2502.00172v1
- Date: Fri, 31 Jan 2025 21:29:20 GMT
- Title: Distribution-Specific Agnostic Conditional Classification With Halfspaces
- Authors: Jizhou Huang, Brendan Juba,
- Abstract summary: We study selective'' or conditional'' classification problems under an agnostic setting.
We present the first PAC-learning algorithm for homogeneous halfspace selectors with error guarantee.
We prove that approximating conditional classification is at least as hard as approximating classification in both additive and multiplicative form.
- Score: 27.197384293699134
- License:
- Abstract: We study ``selective'' or ``conditional'' classification problems under an agnostic setting. Classification tasks commonly focus on modeling the relationship between features and categories that captures the vast majority of data. In contrast to common machine learning frameworks, conditional classification intends to model such relationships only on a subset of the data defined by some selection rule. Most work on conditional classification either solves the problem in a realizable setting or does not guarantee the error is bounded compared to an optimal solution. In this work, we consider selective/conditional classification by sparse linear classifiers for subsets defined by halfspaces, and give both positive as well as negative results for Gaussian feature distributions. On the positive side, we present the first PAC-learning algorithm for homogeneous halfspace selectors with error guarantee $\bigO*{\sqrt{\mathrm{opt}}}$, where $\mathrm{opt}$ is the smallest conditional classification error over the given class of classifiers and homogeneous halfspaces. On the negative side, we find that, under cryptographic assumptions, approximating the conditional classification loss within a small additive error is computationally hard even under Gaussian distribution. We prove that approximating conditional classification is at least as hard as approximating agnostic classification in both additive and multiplicative form.
Related papers
- Classification Error Bound for Low Bayes Error Conditions in Machine Learning [50.25063912757367]
We study the relationship between the error mismatch and the Kullback-Leibler divergence in machine learning.
Motivated by recent observations of low model-based classification errors in many machine learning tasks, we propose a linear approximation of the classification error bound for low Bayes error conditions.
arXiv Detail & Related papers (2025-01-27T11:57:21Z) - Precise Asymptotic Generalization for Multiclass Classification with
Overparameterized Linear Models [4.093769373833101]
We resolve the conjecture posed in Subramanian et al.'22, where the number of data points, features, and classes all grow together.
Our new lower bounds are akin to an information-theoretic strong converse: they establish that the misclassification rate goes to 0 or 1ally.
The key to our tight analysis is a new variant of the Hanson-Wright inequality which is broadly useful for multiclass problems with sparse labels.
arXiv Detail & Related papers (2023-06-23T00:59:15Z) - Class-Conditional Conformal Prediction with Many Classes [60.8189977620604]
We propose a method called clustered conformal prediction that clusters together classes having "similar" conformal scores.
We find that clustered conformal typically outperforms existing methods in terms of class-conditional coverage and set size metrics.
arXiv Detail & Related papers (2023-06-15T17:59:02Z) - An Upper Bound for the Distribution Overlap Index and Its Applications [22.92968284023414]
This paper proposes an easy-to-compute upper bound for the overlap index between two probability distributions.
The proposed bound shows its value in one-class classification and domain shift analysis.
Our work shows significant promise toward broadening the applications of overlap-based metrics.
arXiv Detail & Related papers (2022-12-16T20:02:03Z) - Parametric Classification for Generalized Category Discovery: A Baseline
Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z) - The Interplay between Distribution Parameters and the
Accuracy-Robustness Tradeoff in Classification [0.0]
Adrial training tends to result in models that are less accurate on natural (unperturbed) examples compared to standard models.
This can be attributed to either an algorithmic shortcoming or a fundamental property of the training data distribution.
In this work, we focus on the latter case under a binary Gaussian mixture classification problem.
arXiv Detail & Related papers (2021-07-01T06:57:50Z) - Predict then Interpolate: A Simple Algorithm to Learn Stable Classifiers [59.06169363181417]
Predict then Interpolate (PI) is an algorithm for learning correlations that are stable across environments.
We prove that by interpolating the distributions of the correct predictions and the wrong predictions, we can uncover an oracle distribution where the unstable correlation vanishes.
arXiv Detail & Related papers (2021-05-26T15:37:48Z) - Cautious Active Clustering [79.23797234241471]
We consider the problem of classification of points sampled from an unknown probability measure on a Euclidean space.
Our approach is to consider the unknown probability measure as a convex combination of the conditional probabilities for each class.
arXiv Detail & Related papers (2020-08-03T23:47:31Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Classifier-independent Lower-Bounds for Adversarial Robustness [13.247278149124757]
We theoretically analyse the limits of robustness to test-time adversarial and noisy examples in classification.
We use optimal transport theory to derive variational formulae for the Bayes-optimal error a classifier can make on a given classification problem.
We derive explicit lower-bounds on the Bayes-optimal error in the case of the popular distance-based attacks.
arXiv Detail & Related papers (2020-06-17T16:46:39Z) - Bayesian Semi-supervised Multi-category Classification under Nonparanormality [2.307581190124002]
Semi-supervised learning is a model training method that uses both labeled and unlabeled data.
This paper proposes a fully Bayes semi-supervised learning algorithm that can be applied to any multi-category classification problem.
arXiv Detail & Related papers (2020-01-11T21:31:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.