The Price of Robustness: Stable Classifiers Need Overparameterization
- URL: http://arxiv.org/abs/2603.02806v1
- Date: Tue, 03 Mar 2026 09:47:06 GMT
- Title: The Price of Robustness: Stable Classifiers Need Overparameterization
- Authors: Jonas von Berg, Adalbert Fono, Massimiliano Datres, Sohir Maskey, Gitta Kutyniok
- Abstract summary: We establish a generalization bound for finite function classes that improves inversely with class stability. We derive as a corollary a law of robustness for classification that extends the results of Bubeck and Sellke. Experiments support our theory, while traditional norm-based measures remain largely uninformative.
- Score: 17.335490896384265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The relationship between overparameterization, stability, and generalization remains incompletely understood in the setting of discontinuous classifiers. We address this gap by establishing a generalization bound for finite function classes that improves inversely with class stability, defined as the expected distance to the decision boundary in the input domain (margin). Interpreting class stability as a quantifiable notion of robustness, we derive as a corollary a law of robustness for classification that extends the results of Bubeck and Sellke beyond smoothness assumptions to discontinuous functions. In particular, any interpolating model with $p \approx n$ parameters on $n$ data points must be unstable, implying that substantial overparameterization is necessary to achieve high stability. We obtain analogous results for parameterized infinite function classes by analyzing a stronger robustness measure derived from the margin in the codomain, which we refer to as the normalized co-stability. Experiments support our theory: stability increases with model size and correlates with test performance, while traditional norm-based measures remain largely uninformative.
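The abstract's central quantity is class stability: the expected distance from an input to the model's decision boundary in the input domain (the margin). As a rough illustration only, and not the paper's own procedure, the sketch below estimates this quantity for a generic black-box classifier by bisecting along random probe directions; the names and parameters (`predict`, `r_max`, `n_dirs`) are illustrative assumptions, and probing finitely many directions only yields an upper bound on each pointwise margin.

```python
import numpy as np

# Hedged sketch: a crude Monte Carlo proxy for "class stability" as defined in
# the abstract (expected distance from an input to the decision boundary).
# `predict` is any function mapping a single input vector to a class label.

def directional_margin(predict, x, direction, r_max=10.0, n_steps=30):
    """Distance along `direction` until the predicted label changes,
    located by bisection; returns r_max if no label flip is found."""
    y0 = predict(x)
    if predict(x + r_max * direction) == y0:
        return r_max  # no flip within the search radius
    lo, hi = 0.0, r_max
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        if predict(x + mid * direction) == y0:
            lo = mid
        else:
            hi = mid
    return hi

def class_stability(predict, X, n_dirs=64, r_max=10.0, seed=0):
    """Average over inputs of the smallest directional margin found."""
    rng = np.random.default_rng(seed)
    margins = []
    for x in X:
        dirs = rng.normal(size=(n_dirs, x.shape[0]))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        margins.append(min(directional_margin(predict, x, d, r_max) for d in dirs))
    return float(np.mean(margins))
```

Read against the abstract's law of robustness, an interpolating model with roughly as many parameters as data points should exhibit a small value of this stability proxy, while substantial overparameterization is required for it to be large.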
Related papers
- Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound expressed through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - Stability and Concentration in Nonlinear Inverse Problems with Block-Structured Parameters: Lipschitz Geometry, Identifiability, and an Application to Gaussian Splatting [0.552480439325792]
We develop an operator-theoretic framework for stability and statistical concentration in nonlinear inverse problems with block-structured parameters. Overall, the analysis characterizes operator-level limits for a broad class of high-dimensional nonlinear inverse problems arising in modern imaging and differentiable rendering.
arXiv Detail & Related papers (2026-02-10T05:11:06Z) - Avoiding the Price of Adaptivity: Inference in Linear Contextual Bandits via Stability [2.5782420501870296]
We argue that stability and statistical efficiency can coexist within a single contextual bandit method. We show that our algorithm achieves regret guarantees that are minimax optimal up to logarithmic factors.
arXiv Detail & Related papers (2025-12-23T13:53:53Z) - Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density [93.32594873253534]
Trustworthy machine learning requires meticulous regulation of model reliance on non-robust features.
We propose a framework to delineate and regulate such features by attributing model predictions to the input.
arXiv Detail & Related papers (2024-07-05T09:16:56Z) - A Precise Characterization of SGD Stability Using Loss Surface Geometry [8.942671556572073]
Stochastic Gradient Descent (SGD) stands as a cornerstone optimization algorithm with proven real-world empirical success but relatively limited theoretical understanding.
Recent research has illuminated a key factor contributing to its practical efficacy: the implicit regularization it instigates.
arXiv Detail & Related papers (2024-01-22T19:46:30Z) - Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
By definition, the proposed SCORE (self-consistent robust error) facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z) - Toward Better Generalization Bounds with Locally Elastic Stability [41.7030651617752]
We argue that locally elastic stability implies tighter generalization bounds than those derived based on uniform stability.
We revisit the examples of bounded support vector machines, regularized least square regressions, and gradient descent.
arXiv Detail & Related papers (2020-10-27T02:04:53Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first-ever-known stability and generalization bounds for SGD, even for non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z) - Stable and consistent density-based clustering via multiparameter persistence [49.1574468325115]
We consider the degree-Rips construction from topological data analysis. We analyze its stability to perturbations of the input data using the correspondence-interleaving distance. We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z) - The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions [93.62888099134028]
We find that the performance of state-of-the-art models on Natural Language Inference (NLI) and Reading Comprehension (RC) analysis/stress sets can be highly unstable.
This raises three questions: (1) How will the instability affect the reliability of the conclusions drawn based on these analysis sets?
We give both theoretical explanations and empirical evidence regarding the source of the instability.
arXiv Detail & Related papers (2020-04-28T15:41:12Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.