Super-fast rates of convergence for Neural Networks Classifiers under the Hard Margin Condition
- URL: http://arxiv.org/abs/2505.08262v1
- Date: Tue, 13 May 2025 06:26:04 GMT
- Title: Super-fast rates of convergence for Neural Networks Classifiers under the Hard Margin Condition
- Authors: Nathanael Tepakbong, Ding-Xuan Zhou, Xiang Zhou
- Abstract summary: We show that DNNs which minimize the empirical risk with a square loss surrogate and $\ell_p$ penalty can achieve finite-sample excess risk bounds of order $\mathcal{O}\left(n^{-\alpha}\right)$ for arbitrarily large $\alpha>0$ under the hard-margin condition. The proof relies on a novel decomposition of the excess risk which might be of independent interest.
- Score: 9.993044620455338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the classical binary classification problem for hypothesis spaces of Deep Neural Networks (DNNs) with ReLU activation under Tsybakov's low-noise condition with exponent $q>0$, and its limit-case $q\to\infty$ which we refer to as the "hard-margin condition". We show that DNNs which minimize the empirical risk with square loss surrogate and $\ell_p$ penalty can achieve finite-sample excess risk bounds of order $\mathcal{O}\left(n^{-\alpha}\right)$ for arbitrarily large $\alpha>0$ under the hard-margin condition, provided that the regression function $\eta$ is sufficiently smooth. The proof relies on a novel decomposition of the excess risk which might be of independent interest.
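- Note (editorial gloss; these are the standard formulations of the conditions named in the abstract, assumed here rather than quoted from the paper): with $\eta(x)=\mathbb{P}(Y=1\mid X=x)$ the regression function and $R(f)=\mathbb{P}(\operatorname{sign}(f(X))\neq Y)$ the misclassification risk,
\[
\text{(low noise, exponent } q>0\text{):}\quad \mathbb{P}\!\left(0<\bigl|\eta(X)-\tfrac{1}{2}\bigr|\le t\right)\le C\,t^{q}\quad\text{for all } t>0,
\]
\[
\text{(hard margin, } q\to\infty\text{):}\quad \exists\, c_0>0 \text{ such that } \bigl|\eta(X)-\tfrac{1}{2}\bigr|\ge c_0 \text{ almost surely.}
\]
The excess risk bounded in the paper is $R(\hat f_n)-\inf_{f}R(f)$, where $\hat f_n$ is the penalized empirical risk minimizer over the DNN hypothesis space.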
Related papers
- Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification [27.33243506775655]
We study the generalization performance of unregularized methods for separable linear classification. We show that convergence rates are crucially influenced by the geometry of the loss template.
arXiv Detail & Related papers (2025-05-28T13:39:14Z) - Minimax learning rates for estimating binary classifiers under margin conditions [0.0]
We study classification problems using binary estimators where the decision boundary is described by horizon functions. We establish upper and lower bounds for the minimax learning rate over broad function classes with bounded Kolmogorov entropy in Lebesgue norms.
arXiv Detail & Related papers (2025-05-15T18:05:10Z) - Convergence Rate Analysis of LION [54.28350823319057]
LION converges at a rate of $\mathcal{O}(\sqrt{d}\,K^{-1/4})$ in terms of a gradient-based Karush-Kuhn-Tucker (KKT) stationarity measure, where $K$ is the number of iterations and $d$ the problem dimension.
We show that LION can achieve lower loss and higher performance compared to standard SGD.
arXiv Detail & Related papers (2024-11-12T11:30:53Z) - Adversarial Contextual Bandits Go Kernelized [21.007410990554522]
We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space.
We propose a new optimistically biased estimator for the loss functions and establish near-optimal regret guarantees.
arXiv Detail & Related papers (2023-10-02T19:59:39Z) - Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost. We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad
Stepsize [55.0090961425708]
We propose a new, simplified high probability analysis of AdaGrad for smooth, nonconvex problems.
We present our analysis in a modular way and obtain a complementary $\mathcal{O}(1/T)$ convergence rate in the deterministic setting.
To the best of our knowledge, this is the first high probability result for AdaGrad with a truly adaptive scheme, i.e., completely oblivious to the knowledge of smoothness.
arXiv Detail & Related papers (2022-04-06T13:50:33Z) - Stability and Risk Bounds of Iterative Hard Thresholding [41.082982732100696]
We introduce a novel sparse generalization theory for IHT under the notion of algorithmic stability.
We show that IHT with sparsity level $k$ enjoys an $\tilde{\mathcal{O}}(n^{-1/2}\sqrt{\log(n)\log(p)})$ rate of convergence in sparse excess risk.
Preliminary numerical evidence is provided to confirm our theoretical predictions.
arXiv Detail & Related papers (2022-03-17T16:12:56Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Localization, Convexity, and Star Aggregation [0.0]
Offset Rademacher complexities have been shown to imply sharp, linear-dependent upper bounds for the square loss.
We show that in the statistical setting, the offset bound can be generalized to any loss satisfying a certain uniform convexity condition.
arXiv Detail & Related papers (2021-05-19T00:47:59Z) - Stability and Deviation Optimal Risk Bounds with Convergence Rate
$O(1/n)$ [4.1499725848998965]
We show a high probability excess risk bound with the rate $O(\log n/n)$ for strongly convex and Lipschitz losses, valid for any empirical risk minimization method.
We discuss how $O(\log n/n)$ high probability excess risk bounds are possible for projected gradient descent in the case of strongly convex and Lipschitz losses without the usual smoothness assumption.
arXiv Detail & Related papers (2021-03-22T17:28:40Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first optimal minimax guarantees on the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.