Understanding Aggregations of Proper Learners in Multiclass Classification
- URL: http://arxiv.org/abs/2410.22749v1
- Date: Wed, 30 Oct 2024 07:12:02 GMT
- Title: Understanding Aggregations of Proper Learners in Multiclass Classification
- Authors: Julian Asilis, Mikael Møller Høgsgaard, Grigoris Velegkas
- Abstract summary: Multiclass learnability is known to exhibit a properness barrier.
Recent advances in binary classification have demonstrated that this need for improperness can be met using aggregations of proper learners.
We show that a single ERM requires $\Omega\left(\frac{d_G \ln(1/\epsilon) + \ln(1/\delta)}{\epsilon}\right)$ samples.
- Abstract: Multiclass learnability is known to exhibit a properness barrier: there are learnable classes which cannot be learned by any proper learner. Binary classification faces no such barrier for learnability, but a similar one for optimal learning, which can in general only be achieved by improper learners. Fortunately, recent advances in binary classification have demonstrated that this requirement can be satisfied using aggregations of proper learners, some of which are strikingly simple. This raises a natural question: to what extent can simple aggregations of proper learners overcome the properness barrier in multiclass classification? We give a positive answer to this question for classes which have finite Graph dimension, $d_G$. Namely, we demonstrate that the optimal binary learners of Hanneke, Larsen, and Aden-Ali et al. (appropriately generalized to the multiclass setting) achieve sample complexity $O\left(\frac{d_G + \ln(1 / \delta)}{\epsilon}\right)$. This forms a strict improvement upon the sample complexity of ERM. We complement this with a lower bound demonstrating that for certain classes of Graph dimension $d_G$, majorities of ERM learners require $\Omega \left( \frac{d_G + \ln(1 / \delta)}{\epsilon}\right)$ samples. Furthermore, we show that a single ERM requires $\Omega \left(\frac{d_G \ln(1 / \epsilon) + \ln(1 / \delta)}{\epsilon}\right)$ samples on such classes, exceeding the lower bound of Daniely et al. (2015) by a factor of $\ln(1 / \epsilon)$. For multiclass learning in full generality -- i.e., for classes of finite DS dimension but possibly infinite Graph dimension -- we give a strong refutation to these learning strategies, by exhibiting a learnable class which cannot be learned to constant error by any aggregation of a finite number of proper learners.
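To make the aggregation strategy concrete, here is a minimal Python sketch of a majority vote over ERM learners trained on random subsamples, in the spirit of the learners analyzed in the paper. The `erm` oracle, the choice of three voters, and the random-half subsampling scheme are illustrative assumptions rather than the paper's exact construction, which uses carefully chosen splits of the training data.

```python
import numpy as np
from collections import Counter

def majority_of_erms(erm, X, y, n_learners=3, seed=0):
    """Aggregate ERM learners by majority (plurality) vote.

    erm : oracle taking (X_sub, y_sub) and returning a classifier
          h with h(x) -> label.  The random-half subsampling below is
          an illustrative placeholder, not the paper's exact splits.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    hypotheses = [
        erm(X[idx], y[idx])
        for idx in (rng.choice(n, size=n // 2, replace=False)
                    for _ in range(n_learners))
    ]

    def aggregate(x):
        votes = Counter(h(x) for h in hypotheses)
        return votes.most_common(1)[0][0]  # plurality label

    return aggregate
```

Because the vote is a plurality over labels, the same template applies verbatim with more than two classes, which is the multiclass regime the paper studies.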
Related papers
- Self-Directed Learning of Convex Labelings on Graphs
We study the problem of learning the clusters of a given graph in the self-directed learning setup.
No results previously existed specifically for self-directed node classification on graphs.
arXiv Detail & Related papers (2024-09-02T19:13:26Z)
- Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method
Ensemble learning is a method that leverages weak learners to produce a strong learner.
We design a smooth and convex objective function that leverages the concept of margin, making the strong learner more discriminative.
We then compare our algorithm with random forests of ten times the size and other classical methods across numerous datasets.
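The summary does not state the objective itself, so the sketch below shows one standard function with the advertised properties: a smooth, convex, margin-based loss over the weights assigned to the weak learners. The logistic surrogate and the ±1 correctness encoding are assumptions for illustration, not the paper's tensor-optimized formulation.

```python
import numpy as np

def margin_objective(alpha, H):
    """Smooth, convex surrogate for the ensemble margin (illustrative).

    alpha : (T,) weights over T weak learners.
    H     : (T, n) array with H[t, i] = +1 if weak learner t classifies
            example i correctly and -1 otherwise.

    The margin of example i is m_i = sum_t alpha[t] * H[t, i]; small or
    negative margins are penalized by the logistic loss, which is smooth
    and convex in alpha.
    """
    margins = alpha @ H                                 # (n,) per-example margins
    return float(np.mean(np.logaddexp(0.0, -margins)))  # mean log(1 + e^{-m})
```

Since the loss is convex in `alpha`, the weights can be fit with plain gradient descent or any off-the-shelf convex solver.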
arXiv Detail & Related papers (2024-08-06T03:42:38Z)
- Efficient Algorithms for Learning Monophonic Halfspaces in Graphs
We prove several novel results for learning monophonic halfspaces in the supervised, online, and active settings.
Our main result is that a monophonic halfspace can be learned with near-optimal sample complexity in time polynomial in $n = |V(G)|$.
We also show that the concept class can be enumerated with delay $\operatorname{poly}(n)$, and that empirical risk minimization can be performed in time $2^{\omega(G)} \operatorname{poly}(n)$, where $\omega(G)$ is the clique number of $G$.
arXiv Detail & Related papers (2024-05-01T20:34:12Z)
- The Sample Complexity of Multi-Distribution Learning for VC Classes
Multi-distribution learning is a generalization of PAC learning to settings with multiple data distributions.
There remains a significant gap between the known upper and lower bounds for PAC-learnable classes.
We discuss recent progress on this problem and some hurdles that are fundamental to the use of game dynamics in statistical learning.
arXiv Detail & Related papers (2023-07-22T18:02:53Z)
- Agnostic Multi-Group Active Learning
We consider a variant of multi-group learning from the perspective of active learning, where the learner is endowed with the power to decide which examples are labeled from each distribution in the collection.
Our main challenge is that standard active learning techniques such as disagreement-based active learning do not directly apply to the multi-group learning objective.
We modify existing algorithms to provide a consistent active learning algorithm for an agnostic formulation of multi-group learning.
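For context, disagreement-based active learning (the technique the summary says does not directly transfer) queries a label only where hypotheses consistent with the data seen so far disagree. Below is a minimal CAL-style sketch for the realizable, single-distribution case; the explicit finite hypothesis list `H` is an assumption made so the version space can be stored directly.

```python
def cal_active_learner(H, stream, query_label):
    """CAL-style disagreement-based active learning (realizable case).

    H           : finite list of hypotheses h(x) -> label.
    stream      : iterable of unlabeled points.
    query_label : oracle returning the true label of a point.
    """
    version_space = list(H)
    for x in stream:
        predictions = {h(x) for h in version_space}
        if len(predictions) > 1:  # x lies in the disagreement region: query
            label = query_label(x)
            version_space = [h for h in version_space if h(x) == label]
        # otherwise every surviving hypothesis agrees, so the label
        # is inferred without spending a query
    return version_space
```

Adapting this idea to the agnostic multi-group objective is exactly where the paper modifies existing algorithms.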
arXiv Detail & Related papers (2023-06-02T21:24:13Z)
- Quantum and classical low-degree learning via a dimension-free Remez inequality
We show a new way to relate functions on the hypergrid to their harmonic extensions over the polytorus.
We bound the supremum of a function $f$ over the polytorus in terms of its supremum over products of the cyclic group $\{\exp(2\pi i k / K)\}_{k=1}^{K}$.
We extend to new spaces a recent line of work [EI22, CHP, VZ22] that gave similarly efficient methods for learning low-degree polynomials on hypercubes and observables on qubits.
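As background for the hypercube results being extended, the classical low-degree algorithm estimates every Fourier coefficient of degree at most $d$ from samples and discards the rest. The sketch below is that textbook estimator on $\{-1, +1\}^n$, included for orientation only; it is not the polytorus method of the cited paper.

```python
import itertools
import numpy as np

def estimate_low_degree_coeffs(X, y, degree):
    """Empirical Fourier coefficients f_hat(S) = E[f(x) * prod_{i in S} x_i]
    for all |S| <= degree, from samples X in {-1, +1}^(m x n) with labels y.
    """
    n = X.shape[1]
    coeffs = {}
    for d in range(degree + 1):
        for S in itertools.combinations(range(n), d):
            # chi_S(x) = prod_{i in S} x_i  (constant 1 when S is empty)
            chi = np.prod(X[:, list(S)], axis=1) if S else np.ones(len(X))
            coeffs[S] = float(np.mean(y * chi))
    return coeffs
```

The number of coefficients, and hence the sample and time cost, scales as $n^{O(d)}$, which is what makes the low-degree regime tractable.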
arXiv Detail & Related papers (2023-01-04T04:15:40Z)
- Cryptographic Hardness of Learning Halfspaces with Massart Noise
We study the complexity of PAC learning halfspaces in the presence of Massart noise.
We show that no polynomial-time Massart halfspace learner can achieve error better than $\Omega(\eta)$, where $\eta$ is the noise rate, even if the optimal 0-1 error is small.
arXiv Detail & Related papers (2022-07-28T17:50:53Z)
- Hardness of Learning Halfspaces with Massart Noise
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z)
- Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Connections to Evolvability
In particular, we study the problem of learning halfspaces under Massart noise with rate $\eta$.
We show that any SQ algorithm requires super-polynomially many queries to achieve error $\mathsf{OPT} + \epsilon$.
We also study our algorithm for learning halfspaces under Massart noise empirically and find that it exhibits some appealing fairness properties.
arXiv Detail & Related papers (2020-06-08T17:59:11Z)
- Private Learning of Halfspaces: Simplifying the Construction and Reducing the Sample Complexity
We present a differentially private learner for halfspaces over a finite grid $G$ in $\mathbb{R}^d$ with sample complexity $\approx d^{2.5} \cdot 2^{\log^* |G|}$.
The building block for our learner is a new differentially private algorithm for approximately solving the linear feasibility problem.
arXiv Detail & Related papers (2020-04-16T16:12:10Z)
- Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)