Topology and Geometry of the Learning Space of ReLU Networks: Connectivity and Singularities
- URL: http://arxiv.org/abs/2602.00693v1
- Date: Sat, 31 Jan 2026 12:30:31 GMT
- Title: Topology and Geometry of the Learning Space of ReLU Networks: Connectivity and Singularities
- Authors: Marco Nurisso, Pierrick Leroy, Giovanni Petri, Francesco Vaccarino
- Abstract summary: We show that singularities are intricately connected to the topology of the underlying DAG and its induced sub-networks. We discuss the reachability of these singularities and establish a principled connection with differentiable pruning.
- Score: 4.110453843035319
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the properties of the parameter space in feed-forward ReLU networks is critical for analyzing and guiding training dynamics. After initialization, training under gradient flow restricts the parameters to an algebraic variety that emerges from the homogeneity of the ReLU activation function. In this study, we examine two key challenges associated with feed-forward ReLU networks built on general directed acyclic graph (DAG) architectures: the (dis)connectedness of the parameter space and the existence of singularities within it. We extend previous results by providing a thorough characterization of connectedness, highlighting the roles of bottleneck nodes and balance conditions associated with specific subsets of the network. Our findings demonstrate that singularities are intricately tied to the topology of the underlying DAG and its induced sub-networks. We discuss the reachability of these singularities and establish a principled connection with differentiable pruning. We validate our theory with simple numerical experiments.
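For intuition about the variety in question, a classical consequence of ReLU homogeneity is that gradient flow conserves the per-unit balance a_j^2 - ||w_j||^2 in a two-layer network. The NumPy sketch below (an illustration of this standard fact under small-step gradient descent, not code from the paper) checks that the drift of the invariant stays tiny:

```python
# Minimal sketch: for f(x) = sum_j a_j * relu(w_j . x), gradient flow
# conserves a_j**2 - ||w_j||**2 per hidden unit, confining training to
# an algebraic variety; small-step gradient descent tracks it closely.
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 3, 4, 64
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
W = rng.normal(size=(h, d))            # input weights w_j (rows)
a = rng.normal(size=h)                 # output weights a_j

def invariant(W, a):
    return a**2 - (W**2).sum(axis=1)   # one conserved value per unit

i0, lr = invariant(W, a), 1e-3
for _ in range(2000):                  # gradient descent on 0.5 * MSE
    Z = X @ W.T                        # pre-activations, shape (n, h)
    H = np.maximum(Z, 0.0)
    r = H @ a - y                      # residuals
    ga = H.T @ r / n                   # dL/da
    gW = ((Z > 0) * (r[:, None] * a)).T @ X / n  # dL/dW
    a -= lr * ga
    W -= lr * gW
print("max invariant drift:", np.abs(invariant(W, a) - i0).max())
```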
Related papers
- On the Convergence of Overparameterized Problems: Inherent Properties of the Compositional Structure of Neural Networks [0.0]
This paper investigates how the compositional structure of neural networks shapes their optimization landscape and training dynamics. We show that global convergence properties can be derived for any cost function that is proper and real analytic. We discuss how these insights may generalize to neural networks with sigmoidal activations.
arXiv Detail & Related papers (2025-11-12T23:27:02Z)
- Discrete Functional Geometry of ReLU Networks via ReLU Transition Graphs [0.0]
We extend the ReLU Transition Graph (RTG) framework into a comprehensive graph-theoretic model for understanding deep ReLU networks. In this model, each node represents a linear activation region, and edges connect regions that differ by a single ReLU activation flip.
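As a toy illustration of this construction (my sketch with a made-up 3-unit, 2-D layer, not the authors' code), one can enumerate the activation patterns seen on an input grid and join patterns at Hamming distance one:

```python
# Toy ReLU Transition Graph: nodes are activation patterns (linear
# regions) observed on an input grid; edges join patterns that differ
# in exactly one ReLU's on/off state.
import itertools
import numpy as np

rng = np.random.default_rng(1)
W, b = rng.normal(size=(3, 2)), rng.normal(size=3)  # 3 units, 2-D input

xs = np.linspace(-3, 3, 200)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
signs = (grid @ W.T + b > 0).astype(int)            # pattern per input
patterns = sorted(set(map(tuple, signs)))

edges = [(p, q) for p, q in itertools.combinations(patterns, 2)
         if sum(pi != qi for pi, qi in zip(p, q)) == 1]
print(len(patterns), "regions;", len(edges), "single-flip edges")
```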
arXiv Detail & Related papers (2025-09-03T06:38:22Z)
- Constraining the outputs of ReLU neural networks [13.645092880691188]
We introduce a class of algebraic varieties naturally associated with ReLU neural networks. By analyzing the rank constraints on the network outputs within each activation region, we derive a structure that characterizes the functions representable by the network.
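One concrete face of such rank constraints (a sketch with my own toy dimensions, not the paper's setup): once the activation pattern is fixed, a two-layer ReLU map is affine in the input, and the rank of its linear part is bounded by the number of active units.

```python
# Inside one activation region a ReLU network is affine; the linear
# part A @ diag(s) @ W has rank at most the number of active units.
import numpy as np

rng = np.random.default_rng(2)
d, h, m = 5, 4, 5                      # input dim, hidden width, outputs
W, b = rng.normal(size=(h, d)), rng.normal(size=h)
A = rng.normal(size=(m, h))

x0 = rng.normal(size=d)
s = (W @ x0 + b > 0).astype(float)     # activation pattern at x0
J = A @ (s[:, None] * W)               # Jacobian on x0's whole region
print("active units:", int(s.sum()),
      "| Jacobian rank:", np.linalg.matrix_rank(J))
```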
arXiv Detail & Related papers (2025-08-05T19:30:11Z)
- Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning [73.18052192964349]
We develop a theoretical framework that explains how discrete symbolic structures can emerge naturally from continuous neural network training dynamics. By lifting neural parameters to a measure space and modeling training as Wasserstein gradient flow, we show that under geometric constraints, the parameter measure $\mu_t$ undergoes two concurrent phenomena.
arXiv Detail & Related papers (2025-06-26T22:40:30Z)
- Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor program framework. We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
- Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set.
This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure.
We prove the universal approximation property of three-layer ReLU networks using our topological approach.
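For intuition, the classical 1-D gadget behind such constructions (a hand-built example, not the paper's proof) is a trapezoid of four ReLUs that approximates the indicator of an interval; higher-dimensional indicators combine such bumps with a further layer.

```python
# Four ReLUs form a trapezoid approximating the indicator of [0, 1]:
# 1 on [eps, 1 - eps], 0 outside [-eps, 1 + eps], linear ramps between.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def soft_indicator(x, eps=0.1):
    return (relu(x + eps) - relu(x - eps)
            - relu(x - 1 + eps) + relu(x - 1 - eps)) / (2 * eps)

x = np.linspace(-1, 2, 7)
print(np.round(soft_indicator(x), 3))  # 0 outside, 1 inside, 0.5 on edges
```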
arXiv Detail & Related papers (2023-05-25T14:17:15Z)
- On the Lipschitz Constant of Deep Networks and Double Descent [9.233158826773247]
Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable. We present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent.
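A standard way to probe this quantity (a sketch with a random two-layer net standing in for the paper's trained models): compare the largest input-gradient norm over samples with the product-of-spectral-norms upper bound.

```python
# Empirical Lipschitz estimate of a two-layer ReLU net: the max input-
# gradient norm over samples, versus the loose spectral-norm product.
import numpy as np

rng = np.random.default_rng(3)
d, h = 10, 50
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)

X = rng.normal(size=(1000, d))
mask = (X @ W1.T > 0).astype(float)    # ReLU on/off per sample and unit
grads = (mask * w2) @ W1               # df/dx for each sample
empirical = np.linalg.norm(grads, axis=1).max()
bound = np.linalg.norm(W1, 2) * np.linalg.norm(w2)
print(f"empirical {empirical:.3f} <= spectral bound {bound:.3f}")
```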
arXiv Detail & Related papers (2023-01-28T23:22:49Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
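Those knots are easy to read off in the univariate case (a sketch with a randomly initialized net standing in for a trained one): the unit a_j * relu(w_j * x + b_j) kinks at x_j = -b_j / w_j, which need not coincide with any data point.

```python
# Knot locations of a univariate two-layer ReLU net: unit j bends the
# piecewise linear function at x_j = -b_j / w_j; training moves these.
import numpy as np

rng = np.random.default_rng(4)
h = 8
w, b = rng.normal(size=h), rng.normal(size=h)
knots = np.sort(-b / w)                # one knot per hidden unit
data = np.linspace(-1, 1, 5)           # toy "data point" locations
dist = np.abs(knots[:, None] - data).min(axis=1)
print("knots:", np.round(knots, 2))
print("distance to nearest data point:", np.round(dist, 2))
```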
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
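The mechanism can be sketched in a few lines (my illustration of the idea, not the authors' implementation): every edge of a complete DAG carries a learnable scalar, each node aggregates a gated weighted sum of earlier features, and gradients therefore reach the wiring itself.

```python
# Differentiable connectivity on a complete DAG over 4 nodes: edge
# logits alpha[i, j] gate how much node j reads from node i (i < j),
# so the connection strengths can be trained or pruned by gradients.
import numpy as np

rng = np.random.default_rng(5)
n_nodes, d = 4, 8
alpha = rng.normal(size=(n_nodes, n_nodes))  # edge logits, i < j used

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x0):
    feats = [x0]
    for j in range(1, n_nodes):
        agg = sum(sigmoid(alpha[i, j]) * feats[i] for i in range(j))
        feats.append(np.maximum(agg, 0.0))   # ReLU node transform
    return feats[-1]

print(forward(rng.normal(size=d)).shape)     # (8,)
```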
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Shallow Univariate ReLu Networks as Splines: Initialization, Loss Surface, Hessian, & Gradient Flow Dynamics [1.5393457051344297]
We propose reparametrizing ReLU NNs as continuous piecewise linear splines.
We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, Hessian, and Hessian spectrum.
Videos of learning dynamics using a spline-based visualization are available at http://shorturl.at/tFWZ2.
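A minimal version of the reparametrization (my sketch, covering only breakpoint and slope extraction): a shallow univariate ReLU net is identified with its sorted knots x_j = -b_j / w_j and the cumulative slopes of its linear pieces.

```python
# Read a shallow univariate ReLU net f(x) = sum_j a_j*relu(w_j*x + b_j)
# as a spline: sorted breakpoints plus the slope of each linear piece.
import numpy as np

rng = np.random.default_rng(6)
h = 5
w, b, a = rng.normal(size=h), rng.normal(size=h), rng.normal(size=h)

knots = -b / w
order = np.argsort(knots)
# Units with w_j < 0 are active as x -> -inf; crossing knot j rightward
# turns unit j on (w_j > 0, slope += a_j*w_j) or off (w_j < 0, slope -=).
slope = float(np.sum(a * w * (w < 0)))
slopes = [slope]
for j in order:
    slope += a[j] * w[j] * (1.0 if w[j] > 0 else -1.0)
    slopes.append(slope)
print("breakpoints:", np.round(knots[order], 2))
print("piece slopes:", np.round(slopes, 2))
```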
arXiv Detail & Related papers (2020-08-04T19:19:49Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent.
We provide the first tractable estimation procedure for SEMs based on NNs, with provable convergence and without the need for sample splitting.
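Underneath is simultaneous gradient descent-ascent on a saddle objective; the sketch below uses a generic bilinear-quadratic toy (not the paper's SEM objective) to show the basic loop.

```python
# Simultaneous gradient descent (theta) / ascent (phi) on the toy
# saddle L(theta, phi) = theta*phi - 0.5*phi**2, whose unique
# equilibrium is theta = phi = 0; the paper plays this game with NNs.
theta, phi, lr = 2.0, -1.5, 0.05
for _ in range(500):
    g_theta = phi                      # dL/dtheta
    g_phi = theta - phi                # dL/dphi
    theta -= lr * g_theta              # minimize over theta
    phi += lr * g_phi                  # maximize over phi
print(round(theta, 4), round(phi, 4))  # both approach 0
```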
arXiv Detail & Related papers (2020-07-02T17:55:47Z)