Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks
- URL: http://arxiv.org/abs/2510.18435v1
- Date: Tue, 21 Oct 2025 09:08:58 GMT
- Title: Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks
- Authors: Brandon Livio Annesi, Dario Bocchi, Chiara Cammarota,
- Abstract summary: High-dimensional non-convex loss landscapes play a central role in the theory of Machine Learning. We analyze the spectrum of the Hessian at initialization and identify a Baik-Ben Arous-P\'ech\'e (BBP) transition in the amount of data that separates regimes where the initialization is informative or uninformative about a planted signal.
- Score: 0.11666234644810891
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-dimensional non-convex loss landscapes play a central role in the theory of Machine Learning. Gaining insight into how these landscapes interact with gradient-based optimization methods, even in relatively simple models, can shed light on this enigmatic feature of neural networks. In this work, we will focus on a prototypical simple learning problem, which generalizes the Phase Retrieval inference problem by allowing the exploration of overparametrized settings. Using techniques from field theory, we analyze the spectrum of the Hessian at initialization and identify a Baik-Ben Arous-P\'ech\'e (BBP) transition in the amount of data that separates regimes where the initialization is informative or uninformative about a planted signal of a teacher-student setup. Crucially, we demonstrate how overparameterization can bend the loss landscape, shifting the transition point, even reaching the information-theoretic weak-recovery threshold in the large overparameterization limit, while also altering its qualitative nature. We distinguish between continuous and discontinuous BBP transitions and support our analytical predictions with simulations, examining how they compare to the finite-N behavior. In the case of discontinuous BBP transitions strong finite-N corrections allow the retrieval of information at a signal-to-noise ratio (SNR) smaller than the predicted BBP transition. In these cases we provide estimates for a new lower SNR threshold that marks the point at which initialization becomes entirely uninformative.
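The BBP transition discussed in the abstract can be illustrated numerically in its classical setting, a spiked Wigner random matrix. The sketch below is a generic illustration of the phenomenon, not the paper's Hessian computation; the matrix size and SNR values are illustrative assumptions.

```python
import numpy as np

def top_eig_and_overlap(N, snr, seed=0):
    """Top eigenvalue and signal overlap for a spiked Wigner matrix.

    M = W + snr * v v^T, where W is normalized so its bulk spectrum fills
    [-2, 2]. The BBP transition sits at snr = 1: above it the top
    eigenvalue detaches from the bulk edge and the top eigenvector
    correlates with the planted direction v.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((N, N))
    W = (A + A.T) / np.sqrt(2 * N)           # symmetric, bulk edge at +2
    v = rng.standard_normal(N)
    v /= np.linalg.norm(v)                   # planted unit "signal"
    vals, vecs = np.linalg.eigh(W + snr * np.outer(v, v))
    return vals[-1], abs(vecs[:, -1] @ v)

# Below the transition: top eigenvalue sticks to the bulk edge, no signal.
lam_lo, ovl_lo = top_eig_and_overlap(N=1500, snr=0.5)
# Above the transition: top eigenvalue pops out to ~ snr + 1/snr.
lam_hi, ovl_hi = top_eig_and_overlap(N=1500, snr=3.0)
print(f"snr=0.5: lambda_max={lam_lo:.2f}, overlap={ovl_lo:.2f}")
print(f"snr=3.0: lambda_max={lam_hi:.2f}, overlap={ovl_hi:.2f}")
```

At large N, theory predicts lambda_max ≈ 2 with vanishing overlap below snr = 1, versus lambda_max ≈ snr + 1/snr with overlap ≈ sqrt(1 - 1/snr^2) above it; the drift of these quantities at moderate N mirrors the finite-N effects the abstract discusses near discontinuous transitions.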
Related papers
- Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data [52.737775129027575]
We show that gradient flow on a two-layer ReLU network for classifying orthogonally separable data provably exhibits Neural Collapse (NC). We reveal the role of the implicit bias of the training dynamics in facilitating the emergence of NC.
arXiv Detail & Related papers (2025-10-24T01:36:19Z) - Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on a client-server architecture. Non-smooth regularization is often incorporated into machine learning tasks. We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor program (TP) framework. We show that stochastic gradient descent (SGD) enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z) - Geometric Neural Process Fields [58.77241763774756]
Geometric Neural Process Fields (G-NPF) is a probabilistic framework for neural radiance fields that explicitly captures uncertainty. Building on these bases, we design a hierarchical latent variable model, allowing G-NPF to integrate structural information across multiple spatial levels. Experiments on novel-view synthesis for 3D scenes, as well as 2D image and 1D signal regression, demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2025-02-04T14:17:18Z) - In Search of a Data Transformation That Accelerates Neural Field Training [37.39915075581319]
We focus on how permuting pixel locations affects the convergence speed of SGD.
Counterintuitively, we find that randomly permuting the pixel locations can considerably accelerate the training.
Our analyses suggest that the random pixel permutations remove the easy-to-fit patterns, which facilitate easy optimization in the early stage but hinder capturing fine details of the signal.
arXiv Detail & Related papers (2023-11-28T06:17:49Z) - PRISTA-Net: Deep Iterative Shrinkage Thresholding Network for Coded Diffraction Patterns Phase Retrieval [6.982256124089]
Phase retrieval is a challenging nonlinear inverse problem in computational imaging and image processing.
We have developed PRISTA-Net, a deep unfolding network based on the first-order iterative shrinkage-thresholding algorithm (ISTA).
All parameters in the proposed PRISTA-Net framework, including the nonlinear transformation, threshold, and step size, are learned end-to-end instead of being set manually.
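ISTA, the algorithm PRISTA-Net unfolds, alternates a gradient step on the data-fidelity term with soft-thresholding. Below is a minimal sketch on a toy sparse linear problem; the problem sizes, step size, and regularization weight are illustrative assumptions, and PRISTA-Net itself learns these quantities rather than fixing them by hand.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, n_iter=3000):
    """Minimize 0.5*||Ax - y||^2 + lam*||x||_1 by iterative shrinkage-thresholding."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - (A.T @ (A @ x - y)) / L, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((80, 200)) / np.sqrt(80)   # underdetermined sensing matrix
x_true = np.zeros(200)
x_true[:5] = np.array([3.0, -2.0, 4.0, -1.5, 2.5])  # sparse planted signal
y = A @ x_true
x_hat = ista(A, y, lam=0.01)
err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative error: {err:.3f}")
```

Deep unfolding turns a fixed number of these iterations into network layers and trains the per-layer thresholds and step sizes end-to-end, which is the design choice PRISTA-Net makes.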
arXiv Detail & Related papers (2023-09-08T07:37:15Z) - On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural Networks with Linear Activations [0.0]
We investigate the effects of overfitting on the robustness of gradient-descent training when subject to uncertainty on the gradient estimation.
We show that the general overparametrized formulation introduces a set of spurious equilibria which lie outside the set where the loss function is minimized.
arXiv Detail & Related papers (2023-05-17T02:26:34Z) - Holomorphic Equilibrium Propagation Computes Exact Gradients Through Finite Size Oscillations [5.279475826661643]
Equilibrium propagation (EP) is an alternative to backpropagation (BP) that allows the training of deep neural networks with local learning rules.
We show analytically that this extension naturally leads to exact gradients even for finite-amplitude teaching signals.
We establish the first benchmark for EP on the ImageNet 32x32 dataset and show that it matches the performance of an equivalent network trained with BP.
arXiv Detail & Related papers (2022-09-01T15:23:49Z) - Constrained Parameter Inference as a Principle for Learning [5.080518039966762]
We propose constrained parameter inference (COPI) as a new principle for learning.
COPI allows for the estimation of network parameters under the constraints of decorrelated neural inputs and top-down perturbations of neural states.
We show that COPI not only is more biologically plausible but also provides distinct advantages for fast learning, compared with standard backpropagation of error.
arXiv Detail & Related papers (2022-03-22T13:40:57Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based training combined with nonconvexity renders learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random initialization.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.