Phase transitions reveal hierarchical structure in deep neural networks
- URL: http://arxiv.org/abs/2512.11866v1
- Date: Fri, 05 Dec 2025 15:14:09 GMT
- Title: Phase transitions reveal hierarchical structure in deep neural networks
- Authors: Ibrahim Talha Ersoy, Andrés Fernando Cardozo Licha, Karoline Wiesner
- Abstract summary: We show that phase transitions in Deep Neural Networks are governed by saddle points in the loss landscape. We introduce a simple, fast, and easy-to-implement algorithm that uses the L2 regularizer as a tool to probe the geometry of error landscapes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training Deep Neural Networks relies on the model converging on a high-dimensional, non-convex loss landscape toward a good minimum. Yet, much of the phenomenology of training remains poorly understood. We focus on three seemingly disparate observations: the occurrence of phase transitions reminiscent of statistical physics, the ubiquity of saddle points, and the phenomenon of mode connectivity relevant for model merging. We unify these within a single explanatory framework: the geometry of the loss and error landscapes. We analytically show that phase transitions in DNN learning are governed by saddle points in the loss landscape. Building on this insight, we introduce a simple, fast, and easy-to-implement algorithm that uses the L2 regularizer as a tool to probe the geometry of error landscapes. We apply it to confirm mode connectivity in DNNs trained on the MNIST dataset by efficiently finding paths that connect global minima. We then show numerically that saddle points induce transitions between models that encode distinct digit classes. Our work establishes the geometric origin of key training phenomena in DNNs and reveals a hierarchy of accuracy basins analogous to phases in statistical physics.
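The paper does not include code here; the following is a minimal sketch of how an L2-based landscape probe of this kind could look in PyTorch. The function name probe_l2_path, the schedule, and the loss choice are illustrative assumptions, not the authors' implementation: the L2 coefficient is ramped up and back down while training continues, and the unregularized error along the traversed weight path is recorded. A path whose error stays low everywhere connects its endpoint minima.

```python
import torch
import torch.nn.functional as F

def probe_l2_path(model, loader, lam_schedule, lr=1e-2):
    """Ramp the L2 coefficient lam through lam_schedule; return the weight
    path and the plain (lam-free) loss after each step of the schedule."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    path, errors = [], []
    for lam in lam_schedule:
        for x, y in loader:                  # one training pass at this lam
            loss = F.cross_entropy(model(x), y)
            loss = loss + lam * sum((p ** 2).sum() for p in model.parameters())
            opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():                # snapshot weights and plain error
            path.append(torch.cat([p.flatten() for p in model.parameters()]).clone())
            errors.append(sum(F.cross_entropy(model(x), y).item()
                              for x, y in loader) / len(loader))
    return path, errors

# Example schedule (assumed values): ramp lam 0 -> 1e-2 -> 0, e.g. on an
# MNIST classifier whose model and train_loader are defined elsewhere.
# lams = list(torch.linspace(0, 1e-2, 20)) + list(torch.linspace(1e-2, 0, 20))
# path, errors = probe_l2_path(model, train_loader, lams)
```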
Related papers
- Phase Transitions between Accuracy Regimes in L2 regularized Deep Neural Networks [0.0]
Increasing the L2 regularization of Deep Neural Networks (DNNs) causes a first-order phase transition into the under-parametrized phase. We predict new transition points as the data complexity is increased and, in accordance with the theory of phase transitions, the existence of hysteresis effects.
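A first-order transition should show up as hysteresis under a slow sweep of the regularization strength. A minimal sketch of such a check (the helper names, sweep values, and retraining budget are assumptions, not the paper's protocol):

```python
import torch
import torch.nn.functional as F

def accuracy(model, loader):
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(1) == y).sum().item()
            total += y.numel()
    return correct / total

def hysteresis_sweep(model, loader, lams, epochs_per_step=2, lr=1e-2):
    """Sweep lam up, then down, retraining briefly at each value and recording
    accuracy. Direction-dependent accuracy at the same lam indicates a
    first-order transition."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    curve = []
    for lam in list(lams) + list(reversed(list(lams))):
        for _ in range(epochs_per_step):
            for x, y in loader:
                loss = F.cross_entropy(model(x), y)
                loss = loss + lam * sum((p ** 2).sum() for p in model.parameters())
                opt.zero_grad(); loss.backward(); opt.step()
        curve.append((float(lam), accuracy(model, loader)))
    return curve
```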
arXiv Detail & Related papers (2025-05-10T11:02:30Z) - Deep Loss Convexification for Learning Iterative Models [11.36644967267829]
Iterative methods such as iterative closest point (ICP) for point cloud registration often suffer from poor local optima.
We propose learning to form a convex landscape around each ground truth.
arXiv Detail & Related papers (2024-11-16T01:13:04Z) - Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration [2.814748676983944]
We propose a graph neural network model embedded with a local spherical Euclidean 3D equivariance property through SE(3) message-passing-based propagation.
Our model is composed mainly of a descriptor module, equivariant graph layers, match similarity, and the final regression layers.
Experiments conducted on the 3DMatch and KITTI datasets exhibit the compelling and robust performance of our model compared to state-of-the-art approaches.
arXiv Detail & Related papers (2024-10-08T06:48:01Z) - Exploring Geometric Deep Learning For Precipitation Nowcasting [28.44612565923532]
We propose a geometric deep learning-based temporal Graph Convolutional Network (GCN) for precipitation nowcasting.
The adjacency matrix that simulates the interactions among grid cells is learned automatically by minimizing the L1 loss between prediction and ground truth pixel value.
We test the model on sequences of radar reflectivity maps over the Trento/Italy area.
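The summary's key technical point is an adjacency matrix learned jointly with the network by minimizing an L1 loss against ground-truth pixel values. A minimal sketch of that idea (layer sizes, the softmax row-normalization, and all names are illustrative assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedAdjGCN(nn.Module):
    """One graph-convolution step over n_cells grid cells with a freely
    learned adjacency matrix; structure is illustrative only."""
    def __init__(self, n_cells, in_feats, out_feats):
        super().__init__()
        self.adj_logits = nn.Parameter(torch.zeros(n_cells, n_cells))
        self.lin = nn.Linear(in_feats, out_feats)

    def forward(self, x):                             # x: (batch, n_cells, in_feats)
        adj = torch.softmax(self.adj_logits, dim=-1)  # row-normalized interactions
        return F.relu(self.lin(adj @ x))              # aggregate neighbors, transform

# Training signal: L1 distance between predicted and observed pixel values;
# adj_logits is an ordinary parameter, so the adjacency is learned by the
# same minimization. loss = F.l1_loss(model(frames_t), frames_t_plus_1)
```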
arXiv Detail & Related papers (2023-09-11T21:14:55Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
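A minimal numerical version of the setting described (all sizes, the learning rate, and the step count are assumptions): a fully connected depth-2 ReLU network with quadratic loss, Gaussian inputs, and arbitrary labels, trained by plain gradient descent.

```python
import torch

torch.manual_seed(0)
n, d, width = 200, 20, 512                    # samples, input dim, hidden width
X = torch.randn(n, d)                         # Gaussian data instances
y = torch.randn(n)                            # arbitrary ("adversarial") labels

W = (torch.randn(width, d) / d ** 0.5).requires_grad_()
a = (torch.randn(width) / width ** 0.5).requires_grad_()

for step in range(20000):
    pred = torch.relu(X @ W.T) @ a            # depth-2 fully connected ReLU net
    loss = ((pred - y) ** 2).mean()           # quadratic loss
    loss.backward()
    with torch.no_grad():
        for p in (W, a):
            p -= 1e-3 * p.grad                # plain gradient descent
            p.grad.zero_()
    if step % 5000 == 0:
        print(step, float(loss))
# With enough width and steps the loss is expected to approach zero even for
# arbitrary labels, matching the global-convergence claim.
```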
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
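The filtering idea can be sketched generically: score each candidate connectivity pattern with a cheap proxy and keep only promising ones before any real training. The proxy used below (number of input-to-output paths in the architecture DAG) is a hypothetical stand-in for the paper's theoretical criterion, and all names are illustrative.

```python
from itertools import combinations

def num_paths(n_nodes, edges):
    """Count directed paths from node 0 (input) to node n_nodes-1 (output) in
    an ordered DAG; an assumed, crude proxy for trainability."""
    paths = [0] * n_nodes
    paths[0] = 1
    for node in range(1, n_nodes):
        paths[node] = sum(paths[src] for src, dst in edges if dst == node)
    return paths[-1]

def filter_patterns(n_nodes, candidates, keep=5):
    """Rank candidate edge sets by the proxy score; keep only the top few."""
    ranked = sorted(candidates, key=lambda e: num_paths(n_nodes, e), reverse=True)
    return ranked[:keep]

# Candidates: every nonempty set of forward edges on 4 ordered nodes.
all_edges = [(i, j) for i, j in combinations(range(4), 2)]
candidates = [[e for k, e in enumerate(all_edges) if mask >> k & 1]
              for mask in range(1, 2 ** len(all_edges))]
print(filter_patterns(4, candidates)[0])   # most "promising" pattern
```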
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
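For a univariate two-layer ReLU network, the "knots" (breakpoints of the learned piecewise-linear function) sit where a hidden unit's pre-activation crosses zero. A short sketch of locating them and measuring their distance to the data points (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)                  # f(x) = sum_i a_i * relu(w_i * x + b_i)
b = rng.normal(size=16)
x_data = np.sort(rng.uniform(-2, 2, size=8))

# Unit i contributes a breakpoint ("knot") where w_i * x + b_i = 0.
knots = -b[w != 0] / w[w != 0]
knots = np.sort(knots[(knots > -2) & (knots < 2)])   # keep knots in data range

# Nonzero distance to the nearest data point means a knot at a location
# distinct from the data points, the phenomenon the paper reports.
dist = np.abs(knots[:, None] - x_data[None, :]).min(axis=1)
print(np.column_stack([knots, dist]))
```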
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - Unsupervised mapping of phase diagrams of 2D systems from infinite projected entangled-pair states via deep anomaly detection [0.0]
We demonstrate how to map out the phase diagram of a two-dimensional quantum many-body system with no prior physical knowledge.
As a benchmark, the phase diagram of the 2D frustrated bilayer Heisenberg model is analyzed.
We show that in order to get a good qualitative picture of the transition lines, it suffices to use data from the cost-efficient simple update optimization.
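The anomaly-detection recipe can be sketched independently of the tensor-network details: train an autoencoder on states from one corner of the phase diagram, then sweep the parameter grid and flag points with large reconstruction error as belonging to a different phase. Everything below (feature vectors, sizes, names) is an assumed stand-in for the iPEPS data.

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    """Small autoencoder; the anomaly score is the reconstruction error."""
    def __init__(self, dim, hidden=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh())
        self.dec = nn.Linear(hidden, dim)
    def forward(self, x):
        return self.dec(self.enc(x))

def phase_map(train_states, grid_states, epochs=200, lr=1e-2):
    """train_states: states from a known phase region; grid_states: one state
    per parameter-grid point. Returns a per-point anomaly score."""
    ae = TinyAE(train_states.shape[1])
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    for _ in range(epochs):
        loss = ((ae(train_states) - train_states) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        scores = ((ae(grid_states) - grid_states) ** 2).mean(dim=1)
    return scores   # ridges of high score trace the transition lines
```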
arXiv Detail & Related papers (2021-05-19T12:19:20Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the gradient flow of the loss function.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
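A crude numerical stand-in for the Morse-complex analysis: classify grid points of a 2-D loss surface as minima or saddles from the gradient and the Hessian sign pattern. A real analysis would build the Morse complex of the sampled loss via topological data analysis; this only illustrates the objects being counted.

```python
import numpy as np

# Toy 2-D surface standing in for a 2-D slice of a loss landscape:
# a double well with two minima at (+/-1, 0) separated by a saddle at (0, 0).
xs = np.linspace(-2, 2, 201)
X, Y = np.meshgrid(xs, xs)
L = (X ** 2 - 1) ** 2 + Y ** 2

Ly, Lx = np.gradient(L, xs, xs)          # d/dy along rows, d/dx along columns
Lyy, Lyx = np.gradient(Ly, xs, xs)
Lxy, Lxx = np.gradient(Lx, xs, xs)

crit = (np.abs(Lx) < 2e-2) & (np.abs(Ly) < 2e-2)      # near-zero gradient
det = Lxx * Lyy - Lxy * Lyx                           # Hessian determinant
print("saddles:", int((crit & (det < 0)).sum()))              # expect 1
print("minima :", int((crit & (det > 0) & (Lxx > 0)).sum()))  # expect 2
```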
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition.
Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
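A minimal sketch of a block that aggregates a spatial path (graph convolution over joints) and a temporal path (1-D convolution over frames); the channel sizes and fusion-by-concatenation are assumptions, not the paper's exact building block.

```python
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    """Aggregates a spatial (graph) path and a temporal (1-D conv) path;
    illustrative structure only."""
    def __init__(self, channels, adj):               # adj: (J, J) joint graph
        super().__init__()
        self.register_buffer("adj", adj)
        self.spatial = nn.Linear(channels, channels)
        self.temporal = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.fuse = nn.Linear(2 * channels, channels)

    def forward(self, x):                            # x: (batch, frames, joints, ch)
        b, t, j, c = x.shape
        s = torch.relu(self.spatial(self.adj @ x))   # spatial message passing
        xt = x.permute(0, 2, 3, 1).reshape(b * j, c, t)
        tt = torch.relu(self.temporal(xt)).reshape(b, j, c, t).permute(0, 3, 1, 2)
        return self.fuse(torch.cat([s, tt], dim=-1)) # multi-granularity fusion
```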
arXiv Detail & Related papers (2020-11-26T14:43:04Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
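The kernel-to-rich transition is commonly demonstrated by varying the initialization scale alpha of a simple overparametrized model; below is a sketch for a "diagonal" two-layer parametrization w = u*u - v*v on sparse regression. The sizes, scales, and learning-rate rule are assumptions and may need retuning.

```python
import torch

torch.manual_seed(0)
n, d, alpha = 30, 100, 1e-3        # try alpha = 1e-3 (rich) vs alpha = 10.0 (kernel)
X = torch.randn(n, d)
w_star = torch.zeros(d); w_star[:3] = 1.0        # sparse teacher
y = X @ w_star

u = (alpha * torch.ones(d)).requires_grad_()
v = (alpha * torch.ones(d)).requires_grad_()
lr = 0.01 / max(1.0, alpha ** 2)   # curvature grows like alpha**2 (assumed rule)
for _ in range(20000):
    w = u * u - v * v              # overparametrized "diagonal" two-layer model
    loss = ((X @ w - y) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        for p in (u, v):
            p -= lr * p.grad; p.grad.zero_()

w = (u * u - v * v).detach()
print(w[:6])   # small alpha: near-sparse (implicit L1, "rich" regime);
               # large alpha: dense, close to the min-L2 interpolator (kernel)
```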
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.