The Shape of Generalization through the Lens of Norm-based Capacity Control
- URL: http://arxiv.org/abs/2502.01585v2
- Date: Mon, 19 May 2025 14:36:07 GMT
- Title: The Shape of Generalization through the Lens of Norm-based Capacity Control
- Authors: Yichen Wang, Yudong Chen, Lorenzo Rosasco, Fanghui Liu
- Abstract summary: We consider norm-based capacity measures and develop our study for random features based estimators. We provide a precise characterization of how the estimator's norm concentrates and how it governs the associated test error. This confirms that the more classical U-shaped behavior is recovered when considering appropriate capacity measures based on model norms rather than size.
- Score: 20.88908358215574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding how the test risk scales with model complexity is a central question in machine learning. Classical theory is challenged by the learning curves observed for large over-parametrized deep networks. Capacity measures based on parameter count typically fail to account for these empirical observations. To tackle this challenge, we consider norm-based capacity measures and develop our study for random features based estimators, widely used as simplified theoretical models for more complex networks. In this context, we provide a precise characterization of how the estimator's norm concentrates and how it governs the associated test error. Our results show that the predicted learning curve admits a phase transition from under- to over-parameterization, but no double descent behavior. This confirms that the more classical U-shaped behavior is recovered when considering appropriate capacity measures based on model norms rather than model size. From a technical point of view, we leverage deterministic equivalence as the key tool and further develop new deterministic quantities which are of independent interest.
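As a rough illustration only (not the paper's exact model or asymptotic analysis), the sketch below fits ridgeless regression on ReLU random features and reports the fitted coefficient norm together with the test error as the number of features grows past the sample size. All dimensions, the activation choice, and the use of the minimum-norm (pseudoinverse) solution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from a noisy linear target; sizes are illustrative choices.
d, n_train, n_test, noise = 20, 200, 1000, 0.1
w_star = rng.normal(size=d) / np.sqrt(d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_star + noise * rng.normal(size=n_train)
y_test = X_test @ w_star + noise * rng.normal(size=n_test)

def random_features(X, W):
    """ReLU random features: phi(x) = max(W x, 0)."""
    return np.maximum(X @ W.T, 0.0)

for p in [10, 50, 200, 1000, 5000]:           # number of random features
    W = rng.normal(size=(p, d)) / np.sqrt(d)  # fixed random first layer
    Phi_train = random_features(X_train, W)
    Phi_test = random_features(X_test, W)
    # Minimum-norm least-squares fit (ridgeless); pinv returns the min-norm
    # solution when p > n_train (over-parameterized regime).
    theta = np.linalg.pinv(Phi_train) @ y_train
    test_err = np.mean((Phi_test @ theta - y_test) ** 2)
    print(f"p={p:5d}  ||theta||_2={np.linalg.norm(theta):8.3f}  test MSE={test_err:.4f}")
```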
Related papers
- Deep Equilibrium models for Poisson Imaging Inverse problems via Mirror Descent [7.248102801711294]
Deep Equilibrium Models (DEQs) are implicit neural networks defined by fixed points. We introduce a novel DEQ formulation based on Mirror Descent, defined in terms of a tailored non-Euclidean geometry, and we propose computational strategies that enable both efficient training and fully parameter-free inference.
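As background only, here is a generic DEQ forward pass computed by naive fixed-point iteration; the tanh activation, the contraction scaling of the weights, and the dimensions are assumptions of this sketch, not the mirror-descent formulation proposed in the paper above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Minimal DEQ layer: the output z* solves the fixed-point equation
# z = tanh(W z + U x + b).
dim_x, dim_z = 8, 16
W = 0.25 * rng.normal(size=(dim_z, dim_z)) / np.sqrt(dim_z)  # scaled so the map contracts
U = rng.normal(size=(dim_z, dim_x)) / np.sqrt(dim_x)
b = np.zeros(dim_z)

def deq_forward(x, n_iter=100, tol=1e-8):
    z = np.zeros(dim_z)
    for _ in range(n_iter):
        z_next = np.tanh(W @ z + U @ x + b)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z

x = rng.normal(size=dim_x)
z_star = deq_forward(x)
# Check that the fixed-point condition approximately holds.
print(np.linalg.norm(z_star - np.tanh(W @ z_star + U @ x + b)))
```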
arXiv Detail & Related papers (2025-07-15T16:33:01Z) - Variational Deep Learning via Implicit Regularization [20.449095674026363]
We show how to regularize a variational deep network implicitly via the optimization procedure. We fully characterize the inductive bias of gradient descent in the case of an overparametrized linear model.
arXiv Detail & Related papers (2025-05-26T17:15:57Z) - (Neural-Symbolic) Machine Learning for Inconsistency Measurement [0.0]
We present machine-learning-based approaches for determining the degree of inconsistency -- a numerical value -- for propositional logic knowledge bases. Specifically, we present regression- and neural-based models that learn to predict the values that the inconsistency measures $inc_{mi}$ and $inc_{at}$ would assign to propositional logic knowledge bases.
arXiv Detail & Related papers (2025-02-05T08:00:30Z) - Norm-Bounded Low-Rank Adaptation [10.22454500514559]
We introduce two parameterizations that allow explicit bounds on each singular value of the weight adaptation matrix.
Experiments on vision fine-tuning benchmarks show that the proposed approach can achieve good adaptation performance.
We also explore applications in privacy-preserving model merging and low-rank matrix completion.
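One simple way to realize such a singular-value bound (not necessarily the parameterization proposed in the paper) is to project the low-rank update with an SVD-based clipping step, as in this sketch; the shapes, rank, and bound are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Bound each singular value of a rank-r adaptation Delta = B @ A by sigma_max
# via an SVD projection (illustrative alternative to a dedicated parameterization).
d_out, d_in, r, sigma_max = 64, 32, 4, 0.5
B = rng.normal(size=(d_out, r))
A = rng.normal(size=(r, d_in))
delta = B @ A                                   # rank-r weight adaptation

U, s, Vt = np.linalg.svd(delta, full_matrices=False)
delta_bounded = U @ np.diag(np.minimum(s, sigma_max)) @ Vt

print("original top singular value:", s[0])
print("bounded top singular value: ", np.linalg.svd(delta_bounded, compute_uv=False)[0])
```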
arXiv Detail & Related papers (2025-01-31T11:24:57Z) - On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds [11.30047438005394]
This work investigates the question of how to choose the regularization norm $\lVert \cdot \rVert$ in the context of high-dimensional adversarial training for binary classification.
We quantitatively characterize the relationship between perturbation size and the optimal choice of $\lVert \cdot \rVert$, confirming the intuition that, in the data-scarce regime, the type of regularization becomes increasingly important for adversarial training as perturbations grow in size.
arXiv Detail & Related papers (2024-10-21T14:53:12Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Variational Bayesian surrogate modelling with application to robust design optimisation [0.9626666671366836]
Surrogate models provide a quick-to-evaluate approximation to complex computational models.
We consider Bayesian inference for constructing statistical surrogates with input uncertainties and dimensionality reduction.
We demonstrate intrinsic and robust structural optimisation problems where cost functions depend on a weighted sum of the mean and standard deviation of model outputs.
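A minimal sketch of the kind of robust objective described above, i.e. a weighted sum of the mean and standard deviation of model outputs under input uncertainty, estimated by Monte Carlo; the quadratic "surrogate" and all constants here are placeholders, not the variational Bayesian surrogate constructed in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def surrogate(design, xi):
    # Stand-in for a quick-to-evaluate surrogate model with uncertain input xi.
    return (design - 1.0) ** 2 + 0.5 * design * xi

def robust_cost(design, kappa=2.0, n_samples=10_000):
    xi = rng.normal(size=n_samples)          # uncertain input samples
    out = surrogate(design, xi)
    return out.mean() + kappa * out.std()    # mean + weighted standard deviation

for d in np.linspace(-1.0, 3.0, 9):
    print(f"design={d:5.2f}  robust cost={robust_cost(d):7.3f}")
```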
arXiv Detail & Related papers (2024-04-23T09:22:35Z) - Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between the predicted confidence and actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z) - Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus [0.46040036610482665]
We introduce a gradient-based approach to the problem of linear regression with $\ell_2$-regularization.
We show that our approach outperforms LASSO, Ridge, and Elastic Net regression.
The analytical computation of the gradient proves to be more efficient in terms of computational time compared to automatic differentiation.
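For intuition, here is a hedged sketch of a gradient-based bilevel update for a single scalar ridge penalty: the inner ridge solution and its derivative with respect to the penalty follow from matrix calculus, and the penalty is updated to reduce validation error. The single-penalty setting, data sizes, and step size are assumptions of this sketch; the paper treats multiple penalties.

```python
import numpy as np

rng = np.random.default_rng(4)

d, n_tr, n_val = 10, 80, 40
w_star = rng.normal(size=d)
X_tr, X_val = rng.normal(size=(n_tr, d)), rng.normal(size=(n_val, d))
y_tr = X_tr @ w_star + 0.5 * rng.normal(size=n_tr)
y_val = X_val @ w_star + 0.5 * rng.normal(size=n_val)

lam, lr = 1.0, 0.05
for step in range(200):
    A = X_tr.T @ X_tr + lam * np.eye(d)
    theta = np.linalg.solve(A, X_tr.T @ y_tr)      # inner ridge solution
    resid_val = X_val @ theta - y_val
    dtheta_dlam = -np.linalg.solve(A, theta)       # closed-form d(theta)/d(lambda)
    grad = 2.0 * resid_val @ X_val @ dtheta_dlam / n_val
    lam = max(lam - lr * grad, 1e-8)               # keep the penalty positive
print("learned lambda:", lam)
```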
arXiv Detail & Related papers (2023-11-23T20:03:51Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Least Squares Regression Can Exhibit Under-Parameterized Double Descent [6.645111950779666]
We study the relationship between the number of training data points, the number of parameters, and the generalization capabilities of models.
We postulate that the location of the peak depends on the properties of both the spectrum and the eigenvectors of the sample covariance.
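The following toy sweep illustrates the setting (least squares with a growing number of included features, all under-parameterized); whether and where a peak in test error appears depends on the covariance spectrum and eigenvectors, which is the paper's point. The decaying spectrum and all sizes below are arbitrary choices of this sketch.

```python
import numpy as np

rng = np.random.default_rng(5)

d, n_tr, n_te, noise = 50, 100, 2000, 1.0
cov_eigs = 1.0 / np.arange(1, d + 1)             # a decaying covariance spectrum (a choice)
X_tr = rng.normal(size=(n_tr, d)) * np.sqrt(cov_eigs)
X_te = rng.normal(size=(n_te, d)) * np.sqrt(cov_eigs)
w_star = rng.normal(size=d)
y_tr = X_tr @ w_star + noise * rng.normal(size=n_tr)
y_te = X_te @ w_star + noise * rng.normal(size=n_te)

for p in [5, 10, 20, 30, 40, 50]:                # number of features actually used
    theta, *_ = np.linalg.lstsq(X_tr[:, :p], y_tr, rcond=None)
    err = np.mean((X_te[:, :p] @ theta - y_te) ** 2)
    print(f"p={p:3d}  test MSE={err:8.3f}")
```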
arXiv Detail & Related papers (2023-05-24T03:52:48Z) - Enriching Disentanglement: From Logical Definitions to Quantitative Metrics [59.12308034729482]
Disentangling the explanatory factors in complex data is a promising approach for data-efficient representation learning.
We establish relationships between logical definitions and quantitative metrics to derive theoretically grounded disentanglement metrics.
We empirically demonstrate the effectiveness of the proposed metrics by isolating different aspects of disentangled representations.
arXiv Detail & Related papers (2023-05-19T08:22:23Z) - Evaluating Disentanglement in Generative Models Without Knowledge of Latent Factors [71.79984112148865]
We introduce a method for ranking generative models based on the training dynamics exhibited during learning.
Inspired by recent theoretical characterizations of disentanglement, our method does not require supervision of the underlying latent factors.
arXiv Detail & Related papers (2022-10-04T17:27:29Z) - ER: Equivariance Regularizer for Knowledge Graph Completion [107.51609402963072]
We propose a new regularizer, namely the Equivariance Regularizer (ER).
ER can enhance the generalization ability of the model by employing the semantic equivariance between the head and tail entities.
The experimental results indicate a clear and substantial improvement over the state-of-the-art relation prediction methods.
arXiv Detail & Related papers (2022-06-24T08:18:05Z) - Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but perform poorly on sub-partitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
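As a rough illustration of the setting, here is constant-stepsize SGD with tail averaging for unregularized linear regression, compared against the ordinary least-squares solution; the step size, single-pass sampling, and tail-averaging window are assumptions of this sketch rather than the paper's precise algorithmic choices.

```python
import numpy as np

rng = np.random.default_rng(6)

d, n, noise, step = 50, 5000, 0.5, 0.01
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + noise * rng.normal(size=n)

w = np.zeros(d)
iterates = []
for i in range(n):                        # single pass, one sample per step
    g = (X[i] @ w - y[i]) * X[i]          # stochastic gradient of the squared loss
    w -= step * g
    iterates.append(w.copy())
w_avg = np.mean(iterates[n // 2:], axis=0)    # average the tail of the iterates

w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("||w_sgd_avg - w*||:", np.linalg.norm(w_avg - w_star))
print("||w_ols     - w*||:", np.linalg.norm(w_ols - w_star))
```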
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z) - Bias-Variance Trade-off and Overlearning in Dynamic Decision Problems [1.2183405753834562]
Modern Monte Carlo-type approaches to dynamic decision problems are reformulated as empirical loss minimization.
These computational methods are then analyzed in this framework to demonstrate their effectiveness as well as their susceptibility to generalization error.
arXiv Detail & Related papers (2020-11-18T15:36:22Z) - Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Asymptotics of Ridge (less) Regression under General Source Condition [26.618200633139256]
We consider the role played by the structure of the true regression parameter.
We show that the ridgeless limit (no regularisation) can be optimal even with a bounded signal-to-noise ratio (SNR).
This contrasts with previous work considering ridge regression with an isotropic prior, in which case the ridgeless limit is only optimal in the limit of infinite SNR.
arXiv Detail & Related papers (2020-06-11T13:00:21Z)
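For intuition, a small sweep of the ridge penalty down to the ridgeless (minimum-norm) limit in an over-parameterized toy problem; the isotropic design and parameter draw used here do not reproduce the general source condition studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(7)

d, n, noise = 200, 100, 0.5              # over-parameterized: d > n
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + noise * rng.normal(size=n)
X_te = rng.normal(size=(1000, d))
y_te = X_te @ w_star

for lam in [10.0, 1.0, 0.1, 0.01, 0.0]:
    if lam > 0:
        w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    else:
        w = np.linalg.pinv(X) @ y        # ridgeless (minimum-norm) limit
    print(f"lambda={lam:5.2f}  test MSE={np.mean((X_te @ w - y_te) ** 2):.4f}")
```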