Sample Complexity of Agnostic Multiclass Classification: Natarajan Dimension Strikes Back
- URL: http://arxiv.org/abs/2511.12659v1
- Date: Sun, 16 Nov 2025 15:47:47 GMT
- Title: Sample Complexity of Agnostic Multiclass Classification: Natarajan Dimension Strikes Back
- Authors: Alon Cohen, Liad Erez, Steve Hanneke, Tomer Koren, Yishay Mansour, Shay Moran, Qian Zhang
- Abstract summary: We show that multiclass PAC sample complexity is governed by two distinct dimensions. Unlike binary or online classification, multiclass learning inherently involves two structural parameters. A key ingredient is a novel online procedure based on a self-adaptive multiplicative-weights algorithm performing a label-space reduction.
- Score: 84.81507553557556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The fundamental theorem of statistical learning states that binary PAC learning is governed by a single parameter -- the Vapnik-Chervonenkis (VC) dimension -- which determines both learnability and sample complexity. Extending this to multiclass classification has long been challenging, ever since Natarajan's work in the late 80s proposed the Natarajan dimension (Nat) as a natural analogue of VC. Daniely and Shalev-Shwartz (2014) introduced the DS dimension, later shown by Brukhim et al. (2022) to characterize multiclass learnability. Brukhim et al. also showed that Nat and DS can diverge arbitrarily, suggesting that multiclass learning is governed by DS rather than Nat. We show that agnostic multiclass PAC sample complexity is in fact governed by two distinct dimensions. Specifically, we prove nearly tight agnostic sample complexity bounds that, up to log factors, take the form $\frac{\mathrm{DS}^{1.5}}{\varepsilon} + \frac{\mathrm{Nat}}{\varepsilon^2}$, where $\varepsilon$ is the excess risk. This bound is tight up to a $\sqrt{\mathrm{DS}}$ factor in the first term, nearly matching the known $\mathrm{Nat}/\varepsilon^2$ and $\mathrm{DS}/\varepsilon$ lower bounds. The first term reflects the DS-controlled regime, while the second shows that the Natarajan dimension still dictates the asymptotic behavior for small $\varepsilon$. Thus, unlike binary or online classification -- where a single dimension (VC or Littlestone) controls both phenomena -- multiclass learning inherently involves two structural parameters. Our technical approach departs from traditional agnostic learning methods based on uniform convergence or reductions to realizable cases. A key ingredient is a novel online procedure based on a self-adaptive multiplicative-weights algorithm performing a label-space reduction, which may be of independent interest.
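To make the label-space reduction idea concrete, here is a minimal multiplicative-weights sketch in Python. It is illustrative only and is not the paper's self-adaptive procedure: the per-round loss `loss_fn`, the learning rate `eta`, and the pruning fraction `keep_fraction` are hypothetical stand-ins.

```python
# Illustrative sketch only (not the paper's algorithm): a generic
# multiplicative-weights update over a finite candidate label set,
# followed by a pruning step that shrinks the label space.
import math

def mw_label_reduction(labels, rounds, loss_fn, eta=0.1, keep_fraction=0.1):
    """Run multiplicative weights over candidate labels, then prune."""
    weights = {y: 1.0 for y in labels}
    for t in range(rounds):
        # loss_fn(t, y) in [0, 1], e.g. how badly label y fits example t.
        for y in weights:
            weights[y] *= math.exp(-eta * loss_fn(t, y))
    # Reduction step: keep only the highest-weight labels.
    k = max(1, int(len(labels) * keep_fraction))
    return sorted(weights, key=weights.get, reverse=True)[:k]

# Toy usage: labels 0..9 never incur loss, so exactly they survive.
survivors = mw_label_reduction(
    labels=list(range(100)),
    rounds=50,
    loss_fn=lambda t, y: 0.0 if y < 10 else 1.0,
)
assert sorted(survivors) == list(range(10))
```

The point of the sketch is the two-phase structure: exponentially downweight labels that perform poorly, then discard the lightest ones so that later stages work over a smaller label space.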
Related papers
- Learning Intersections of Two Margin Halfspaces under Factorizable Distributions [56.51474048985742]
Learning intersections of halfspaces is a central problem in Computational Learning Theory. Even for just two halfspaces, it remains a major open question whether learning is possible in polynomial time. We introduce a novel algorithm that provably circumvents the CSQ hardness barrier.
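For context, the hypothesis class in question is easy to state. Below is a hedged sketch of an intersection-of-two-halfspaces predictor; the weights, biases, and toy data are hypothetical, the margin condition is simplified away, and the paper's learning algorithm itself is not reproduced.

```python
# Hedged sketch of the hypothesis class only. The margin condition (points
# kept gamma-far from each boundary) is omitted for simplicity, and all
# weights/biases below are hypothetical.
import numpy as np

def intersect_two_halfspaces(x, w1, b1, w2, b2):
    """Label x positive iff it lies in both halfspaces {w . x >= b}."""
    return float(np.dot(w1, x) >= b1 and np.dot(w2, x) >= b2)

# Toy usage in R^2: positives are exactly the points in the first quadrant.
w1, b1 = np.array([1.0, 0.0]), 0.0
w2, b2 = np.array([0.0, 1.0]), 0.0
assert intersect_two_halfspaces(np.array([0.5, 0.5]), w1, b1, w2, b2) == 1.0
assert intersect_two_halfspaces(np.array([-0.5, 0.5]), w1, b1, w2, b2) == 0.0
```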
arXiv Detail & Related papers (2025-11-13T00:28:24Z) - On Reductions and Representations of Learning Problems in Euclidean Spaces [15.07787640047213]
Many practical prediction algorithms represent inputs in Euclidean space and replace the discrete 0/1 classification loss with a real-valued surrogate loss.
We develop a generalization of the Borsuk-Ulam Theorem that combines the classical topological approach with convexity considerations.
We also present natural classification tasks that can be represented in much smaller dimensions by leveraging randomness.
arXiv Detail & Related papers (2024-11-16T12:09:37Z) - Is Transductive Learning Equivalent to PAC Learning? [0.9012198585960443]
We show that the PAC and transductive models are essentially equivalent for agnostic binary classification.
We leave as an intriguing open question whether our second result can be extended beyond binary classification to show the transductive and PAC models equivalent more broadly.
arXiv Detail & Related papers (2024-05-08T16:26:49Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space for modeling functions computed by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
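As a rough illustration of the objects involved (not the paper's construction), the sketch below builds a two-layer ReLU network and computes a path-norm-style quantity $\sum_j |a_j| \, \|w_j\|$, one common way to express a "bounded norm" constraint for such networks; the width, dimension, and weights are arbitrary.

```python
# Hedged sketch: a two-layer ReLU network together with a path-norm-style
# complexity measure. All sizes and weights below are arbitrary stand-ins.
import numpy as np

rng = np.random.default_rng(0)
m, d = 50, 3                  # hidden width, input dimension (hypothetical)
W = rng.normal(size=(m, d))   # first-layer weights w_j
a = rng.normal(size=m) / m    # second-layer weights a_j

def f(x):
    """Two-layer ReLU network: f(x) = sum_j a_j * relu(w_j . x)."""
    return a @ np.maximum(W @ x, 0.0)

# One common 'bounded norm' control: sum_j |a_j| * ||w_j||.
path_norm = np.sum(np.abs(a) * np.linalg.norm(W, axis=1))
print(f(np.ones(d)), path_norm)
```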
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - When is Agnostic Reinforcement Learning Statistically Tractable? [76.1408672715773]
A new complexity measure, called the spanning capacity, depends solely on the set $\Pi$ and is independent of the MDP dynamics.
We show there exists a policy class $\Pi$ with a bounded spanning capacity that requires a superpolynomial number of samples to learn.
This reveals a surprising separation for learnability between generative access and online access models.
arXiv Detail & Related papers (2023-10-09T19:40:54Z) - On Learning Latent Models with Multi-Instance Weak Supervision [57.18649648182171]
We consider a weakly supervised learning scenario where the supervision signal is generated by applying a transition function $\sigma$ to labels associated with multiple input instances.
Our problem arises in different fields, including latent structural learning and neuro-symbolic integration.
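A toy illustration of this setting, with all specifics hypothetical: the learner observes only $\sigma$ applied to the hidden per-instance labels. Here $\sigma$ is a logical OR, as in classic multi-instance learning, whereas the paper treats general transition functions.

```python
# Hedged illustration: the learner never sees the instance labels y_1..y_k,
# only sigma(y_1, ..., y_k) for some transition function sigma. This sigma
# (a logical OR) is just one simple example.
def sigma(labels):
    """Example transition function: bag is positive iff any instance is."""
    return int(any(labels))

hidden_labels = [0, 0, 1]           # per-instance labels, unobserved
weak_signal = sigma(hidden_labels)  # the only supervision the learner gets
assert weak_signal == 1
```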
arXiv Detail & Related papers (2023-06-23T22:05:08Z) - Lower Bounds for Learning in Revealing POMDPs [88.23337313766355]
This paper studies the fundamental limits of reinforcement learning (RL) in the challenging partially observable setting.
For multi-step revealing POMDPs, we show that the latent state-space dependence is at least $\Omega(S^{1.5})$ in the sample complexity.
arXiv Detail & Related papers (2023-02-02T18:59:30Z) - A Characterization of List Learnability [15.368858716555888]
We consider list PAC learning where the goal is to output a list of $k$ predictions.
Generalizing the recent characterization of multiclass learnability, we show that a hypothesis class is $k$-list learnable if and only if the $k$-DS dimension is finite.
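A hedged toy example of what a $k$-list predictor outputs (the scores and labels below are made up): the learner returns its $k$ top-scoring labels and is judged correct when the true label appears anywhere in that list.

```python
# Hedged toy example of k-list prediction; all names and scores are made up.
def top_k_list(scores, k):
    """Return the k highest-scoring labels as the prediction list."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

scores = {"cat": 0.5, "dog": 0.3, "fox": 0.2}
prediction = top_k_list(scores, k=2)  # ["cat", "dog"]
assert "dog" in prediction            # correct if the truth is in the list
```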
arXiv Detail & Related papers (2022-11-07T21:28:05Z) - Fine-Grained Distribution-Dependent Learning Curves [27.09513298165498]
Learning curves plot the expected error of a learning algorithm as a function of the number of labeled input samples.
In this paper we introduce a new fine-grained characterization of learning curves that improves and refines the recent results of Bousquet et al.
Our characterization sheds new light on the structure of learning curves by providing fine-grained bounds.
arXiv Detail & Related papers (2022-08-31T03:29:21Z) - A Characterization of Multiclass Learnability [18.38631912121182]
We characterize multiclass PAC learnability through the DS dimension, a dimension defined by Daniely and Shalev-Shwartz (2014).
In the list learning setting, instead of predicting a single outcome for a given unseen input, the goal is to provide a short menu of predictions.
Our second main result concerns the Natarajan dimension, which has been a central candidate for characterizing multiclass learnability.
arXiv Detail & Related papers (2022-03-03T07:41:54Z) - Sample-efficient proper PAC learning with approximate differential privacy [51.09425023771246]
We prove that the sample complexity of properly learning a class of Littlestone dimension $d$ with approximate differential privacy is $\tilde{O}(d^6)$, ignoring privacy and accuracy parameters.
This result answers a question of Bun et al. (FOCS 2020) by improving upon their upper bound of $2^{O(d)}$ on the sample complexity.
arXiv Detail & Related papers (2020-12-07T18:17:46Z)