Related papers: The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks

The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks

URL: http://arxiv.org/abs/2505.17958v1
Date: Fri, 23 May 2025 14:31:14 GMT
Title: The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks
Authors: Vittorio Erba, Emanuele Troiani, Lenka Zdeborová, Florent Krzakala,
Abstract summary: We study the high-dimensionality of empirical risk minimization (ERM) in over-parametrized two-layer neural networks with quadratic activations trained on synthetic data.<n>We derive sharps for both training and test errors by mapping the $ell$-regularized learning problem to a convex matrix sensing task with nuclear normization.
Score: 28.79071954165469
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the high-dimensional asymptotics of empirical risk minimization (ERM) in over-parametrized two-layer neural networks with quadratic activations trained on synthetic data. We derive sharp asymptotics for both training and test errors by mapping the $\ell_2$-regularized learning problem to a convex matrix sensing task with nuclear norm penalization. This reveals that capacity control in such networks emerges from a low-rank structure in the learned feature maps. Our results characterize the global minima of the loss and yield precise generalization thresholds, showing how the width of the target function governs learnability. This analysis bridges and extends ideas from spin-glass methods, matrix factorization, and convex optimization and emphasizes the deep link between low-rank matrix sensing and learning in quadratic neural networks.

Related papers

Compositional Curvature Bounds for Deep Neural Networks [7.373617024876726]
A key challenge that threatens the widespread use of neural networks in safety-critical applications is their vulnerability to adversarial attacks. We study the second-order behavior of continuously differentiable deep neural networks, focusing on robustness against adversarial perturbations. We introduce a novel algorithm to analytically compute provable upper bounds on the second derivative of neural networks.
arXiv Detail & Related papers (2024-06-07T17:50:15Z)
Hessian Eigenvectors and Principal Component Analysis of Neural Network Weight Matrices [0.0]
This study delves into the intricate dynamics of trained deep neural networks and their relationships with network parameters. We unveil a correlation between Hessian eigenvectors and network weights. This relationship, hinging on the magnitude of eigenvalues, allows us to discern parameter directions within the network.
arXiv Detail & Related papers (2023-11-01T11:38:31Z)
Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations. We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
neural network models that perfectly fit noisy data can generalize well to unseen test data. We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose a new approach for the regularization of neural networks by the local Rademacher complexity called LocalDrop. A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs) has been developed based on the proposed upper bound of the local Rademacher complexity.
arXiv Detail & Related papers (2021-03-01T03:10:11Z)
Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow. We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized Structural equation models (SEMs) We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using a gradient descent. For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network. Our model requires a much less number of communication rounds and still a number of communication rounds in theory. Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss [0.0]
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. We analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations.
arXiv Detail & Related papers (2020-02-11T15:42:09Z)
Avoiding Spurious Local Minima in Deep Quadratic Networks [0.0]
We characterize the landscape of the mean squared nonlinear error for networks with neural activation functions. We prove that deepized neural networks with quadratic activations benefit from similar landscape properties.
arXiv Detail & Related papers (2019-12-31T22:31:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.