Super-fast rates of convergence for Neural Networks Classifiers under the Hard Margin Condition
- URL: http://arxiv.org/abs/2505.08262v1
- Date: Tue, 13 May 2025 06:26:04 GMT
- Title: Super-fast rates of convergence for Neural Networks Classifiers under the Hard Margin Condition
- Authors: Nathanael Tepakbong, Ding-Xuan Zhou, Xiang Zhou
- Abstract summary: We show that DNNs which minimize the empirical risk with a square loss surrogate and $\ell_p$ penalty can achieve finite-sample excess risk bounds of order $\mathcal{O}\left(n^{-\alpha}\right)$ for arbitrarily large $\alpha>0$ under the hard-margin condition. The proof relies on a novel decomposition of the excess risk which might be of independent interest.
- Score: 9.993044620455338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the classical binary classification problem for hypothesis spaces of Deep Neural Networks (DNNs) with ReLU activation under Tsybakov's low-noise condition with exponent $q>0$, and its limit-case $q\to\infty$ which we refer to as the "hard-margin condition". We show that DNNs which minimize the empirical risk with square loss surrogate and $\ell_p$ penalty can achieve finite-sample excess risk bounds of order $\mathcal{O}\left(n^{-\alpha}\right)$ for arbitrarily large $\alpha>0$ under the hard-margin condition, provided that the regression function $\eta$ is sufficiently smooth. The proof relies on a novel decomposition of the excess risk which might be of independent interest.
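- Note (editorial gloss; these are the standard formulations of the conditions named in the abstract, assumed here rather than quoted from the paper): with $\eta(x)=\mathbb{P}(Y=1\mid X=x)$ the regression function and $R(f)=\mathbb{P}(\operatorname{sign}(f(X))\neq Y)$ the misclassification risk,
\[
\text{(low noise, exponent } q>0\text{):}\quad \mathbb{P}\!\left(0<\bigl|\eta(X)-\tfrac{1}{2}\bigr|\le t\right)\le C\,t^{q}\quad\text{for all } t>0,
\]
\[
\text{(hard margin, } q\to\infty\text{):}\quad \exists\, c_0>0 \text{ such that } \bigl|\eta(X)-\tfrac{1}{2}\bigr|\ge c_0 \text{ almost surely.}
\]
The excess risk bounded in the paper is $R(\hat f_n)-\inf_{f}R(f)$, where $\hat f_n$ is the penalized empirical risk minimizer over the DNN hypothesis space.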
Related papers
- Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification [27.33243506775655]
We study the generalization performance of unregularized methods for separable linear classification. We show that convergence rates are crucially influenced by the geometry of the loss template.
arXiv Detail & Related papers (2025-05-28T13:39:14Z) - Minimax learning rates for estimating binary classifiers under margin conditions [0.0]
We study classification problems using binary estimators where the decision boundary is described by horizon functions. We establish upper and lower bounds for the minimax learning rate over broad function classes with bounded Kolmogorov entropy in Lebesgue norms.
arXiv Detail & Related papers (2025-05-15T18:05:10Z) - Convergence Rate Analysis of LION [54.28350823319057]
LION converges at a rate of $\mathcal{O}(\sqrt{d}\,K^{-1/4})$ in terms of a gradient-based Karush-Kuhn-Tucker (KKT) stationarity measure, where $K$ is the number of iterations and $d$ the problem dimension.
We show that LION can achieve lower loss and higher performance compared to standard SGD.
arXiv Detail & Related papers (2024-11-12T11:30:53Z) - Adversarial Contextual Bandits Go Kernelized [21.007410990554522]
We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space.
We propose a new optimistically biased estimator for the loss functions and establish near-optimal regret guarantees.
arXiv Detail & Related papers (2023-10-02T19:59:39Z) - Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost. We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad
Stepsize [55.0090961425708]
We propose a new, simplified high probability analysis of AdaGrad for smooth, nonconvex problems.
We present our analysis in a modular way and obtain a complementary $\mathcal{O}(1/T)$ convergence rate in the deterministic setting.
To the best of our knowledge, this is the first high probability result for AdaGrad with a truly adaptive scheme, i.e., completely oblivious to the knowledge of smoothness.
arXiv Detail & Related papers (2022-04-06T13:50:33Z) - Stability and Risk Bounds of Iterative Hard Thresholding [41.082982732100696]
We introduce a novel sparse generalization theory for IHT under the notion of algorithmic stability.
We show that IHT with sparsity level $k$ enjoys an $\tilde{\mathcal{O}}(n^{-1/2}\sqrt{\log(n)\log(p)})$ rate of convergence in sparse excess risk.
Preliminary numerical evidence is provided to confirm our theoretical predictions.
arXiv Detail & Related papers (2022-03-17T16:12:56Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Localization, Convexity, and Star Aggregation [0.0]
Offset Rademacher complexities have been shown to imply sharp, linear-dependent upper bounds for the square loss.
We show that in the statistical setting, the offset bound can be generalized to any loss satisfying a certain uniform convexity condition.
arXiv Detail & Related papers (2021-05-19T00:47:59Z) - Stability and Deviation Optimal Risk Bounds with Convergence Rate
$O(1/n)$ [4.1499725848998965]
We show a high probability excess risk bound with the rate $O(\log n/n)$ for strongly convex and Lipschitz losses, valid for any empirical risk minimization method.
We discuss how $O(\log n/n)$ high probability excess risk bounds are possible for projected gradient descent in the case of strongly convex and Lipschitz losses without the usual smoothness assumption.
arXiv Detail & Related papers (2021-03-22T17:28:40Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first optimal minimax guarantees on the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.