Beyond Tsybakov: Model Margin Noise and $\mathcal{H}$-Consistency Bounds
- URL: http://arxiv.org/abs/2511.15816v1
- Date: Wed, 19 Nov 2025 19:13:39 GMT
- Title: Beyond Tsybakov: Model Margin Noise and $\mathcal{H}$-Consistency Bounds
- Authors: Mehryar Mohri, Yutao Zhong
- Abstract summary: We introduce a new low-noise condition for classification, the Model Margin Noise (MM noise) assumption. We derive enhanced $\mathcal{H}$-consistency bounds for both binary and multi-class classification.
- Score: 42.67092904252001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new low-noise condition for classification, the Model Margin Noise (MM noise) assumption, and derive enhanced $\mathcal{H}$-consistency bounds under this condition. MM noise is weaker than the Tsybakov noise condition: it is implied by the Tsybakov noise condition but can hold even when the Tsybakov condition fails, because it depends on the discrepancy between a given hypothesis and the Bayes classifier rather than on the intrinsic distributional minimal margin (see Figure 1 for an illustration of an explicit example). This hypothesis-dependent assumption yields enhanced $\mathcal{H}$-consistency bounds for both binary and multi-class classification. Our results extend the enhanced $\mathcal{H}$-consistency bounds of Mao, Mohri, and Zhong (2025a) with the same favorable exponents but under a weaker assumption than the Tsybakov noise condition; they interpolate smoothly between linear and square-root regimes for intermediate noise levels. We also instantiate these bounds for common surrogate loss families and provide illustrative tables.
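For orientation, one common textbook formulation of the Tsybakov noise condition in binary classification, together with a schematic shape of the kind of bound the abstract describes, can be sketched as follows. The symbols $\eta$, $C$, $\beta$, $s$, $\Gamma$, and the excess-error notation are assumptions introduced here for illustration; the paper may use a different parametrization. The MM noise condition itself is not reproduced, since the abstract does not state it.

```latex
% Assumed notation (not from the abstract): \eta(x) = \Pr[Y = 1 \mid X = x].
% One standard form of the Tsybakov noise condition, with constants C > 0 and \beta \ge 0:
\[
  \Pr_{X}\bigl[\, |2\eta(X) - 1| \le t \,\bigr] \;\le\; C\, t^{\beta}
  \qquad \text{for all } t > 0 .
\]
% Schematic shape only (not the paper's theorem): the excess zero-one error of a
% hypothesis h in \mathcal{H} is controlled by a power of its excess surrogate error.
% Any additional terms that a precise statement may include are omitted here.
\[
  \mathcal{E}_{0\text{-}1}(h) - \mathcal{E}^{*}_{0\text{-}1}(\mathcal{H})
  \;\le\; \Gamma \,\bigl( \mathcal{E}_{\ell}(h) - \mathcal{E}^{*}_{\ell}(\mathcal{H}) \bigr)^{s},
  \qquad s \in \bigl[\tfrac{1}{2},\, 1\bigr].
\]
```

In this schematic, $s = 1/2$ corresponds to the square-root regime and $s = 1$ to the linear regime mentioned in the abstract, with intermediate noise levels giving intermediate exponents.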
Related papers
- Skewness-Robust Causal Discovery in Location-Scale Noise Models [47.09233752567902]
We propose SkewD, a likelihood-based algorithm for causal discovery under location-scale noise models. SkewD extends the usual normal-distribution framework to the skew-normal setting, enabling reliable inference under symmetric and skewed noise. We evaluate SkewD on novel synthetically generated datasets with skewed noise as well as established benchmark datasets.
arXiv Detail & Related papers (2025-11-18T12:40:41Z) - Stochastic Weakly Convex Optimization Under Heavy-Tailed Noises [55.43924214633558]
In this paper, we focus on two types of noise: sub-Weibull noise and SsBC noise. Under these two noise assumptions, the in-expectation and high-probability convergence of SFOMs have been studied in the contexts of convex optimization and smooth optimization.
arXiv Detail & Related papers (2025-07-17T16:48:45Z) - Regularized least squares learning with heavy-tailed noise is minimax optimal [22.406170258823803]
This paper examines the performance of ridge regression in reproducing kernel Hilbert spaces in the presence of noise that exhibits a finite number of higher moments. We establish risk bounds consisting of subgaussian and excess terms based on the well-known integral operator framework.
arXiv Detail & Related papers (2025-05-20T11:17:54Z) - Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees [56.80920351680438]
We study high-probability convergence in online learning, in the presence of heavy-tailed noise. We provide guarantees for a broad class of nonlinearities, without any assumptions on noise moments.
arXiv Detail & Related papers (2024-10-17T18:25:28Z) - Revisiting Convergence of AdaGrad with Relaxed Assumptions [4.189643331553922]
We revisit the convergence of AdaGrad with momentum (covering AdaGrad as a special case) on problems.
This model encompasses a broad range of noises, including sub-Gaussian noise, in many practical applications.
arXiv Detail & Related papers (2024-02-21T13:24:14Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha - 1)/\alpha})$, when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - Robust Learning under Strong Noise via SQs [5.9256596453465225]
We show that every SQ learnable class admits an efficient learning algorithm with OPT + $\epsilon$ misclassification error for a broad class of noise models.
This setting substantially generalizes the widely studied problem of classification under RCN with known noise probabilities.
arXiv Detail & Related papers (2020-10-18T21:02:26Z) - Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
arXiv Detail & Related papers (2020-06-15T18:31:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.