On Space Folds of ReLU Neural Networks
- URL: http://arxiv.org/abs/2502.09954v1
- Date: Fri, 14 Feb 2025 07:22:24 GMT
- Title: On Space Folds of ReLU Neural Networks
- Authors: Michal Lewandowski, Hamid Eghbalzadeh, Bernhard Heinzl, Raphael Pisoni, Bernhard A. Moser,
- Abstract summary: Recent findings suggest that ReLU neural networks can be understood geometrically as space folding of the input space.
We present the first quantitative analysis of this space phenomenon in ReLU models.
- Score: 6.019268056469171
- License:
- Abstract: Recent findings suggest that the consecutive layers of ReLU neural networks can be understood geometrically as space folding transformations of the input space, revealing patterns of self-similarity. In this paper, we present the first quantitative analysis of this space folding phenomenon in ReLU neural networks. Our approach focuses on examining how straight paths in the Euclidean input space are mapped to their counterparts in the Hamming activation space. In this process, the convexity of straight lines is generally lost, giving rise to non-convex folding behavior. To quantify this effect, we introduce a novel measure based on range metrics, similar to those used in the study of random walks, and provide the proof for the equivalence of convexity notions between the input and activation spaces. Furthermore, we provide empirical analysis on a geometrical analysis benchmark (CantorNet) as well as an image classification benchmark (MNIST). Our work advances the understanding of the activation space in ReLU neural networks by leveraging the phenomena of geometric folding, providing valuable insights on how these models process input information.
Related papers
- Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations [24.052411316664017]
We introduce a theoretical framework for the evolution of the kernel sequence, which measures the similarity between the hidden representation for two different inputs.
For nonlinear activations, the kernel sequence converges globally to a unique fixed point, which can correspond to similar representations depending on the activation and network architecture.
This work provides new insights into the implicit biases of deep neural networks and how architectural choices influence the evolution of representations across layers.
arXiv Detail & Related papers (2024-10-26T07:10:47Z) - Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations [54.17275171325324]
We present a counterexample to the Linear Representation Hypothesis (LRH)
When trained to repeat an input token sequence, neural networks learn to represent the token at each position with a particular order of magnitude, rather than a direction.
These findings strongly indicate that interpretability research should not be confined to the LRH.
arXiv Detail & Related papers (2024-08-20T15:04:37Z) - Joint Diffusion Processes as an Inductive Bias in Sheaf Neural Networks [14.224234978509026]
Sheaf Neural Networks (SNNs) naturally extend Graph Neural Networks (GNNs)
We propose two novel sheaf learning approaches that provide a more intuitive understanding of the involved structure maps.
In our evaluation, we show the limitations of the real-world benchmarks used so far on SNNs.
arXiv Detail & Related papers (2024-07-30T07:17:46Z) - A singular Riemannian Geometry Approach to Deep Neural Networks III. Piecewise Differentiable Layers and Random Walks on $n$-dimensional Classes [49.32130498861987]
We study the case of non-differentiable activation functions, such as ReLU.
Two recent works introduced a geometric framework to study neural networks.
We illustrate our findings with some numerical experiments on classification of images and thermodynamic problems.
arXiv Detail & Related papers (2024-04-09T08:11:46Z) - On the Computational Entanglement of Distant Features in Adversarial Machine Learning [8.87656044562629]
We introduce the concept of "computational entanglement"
Computational entanglement enables the network to achieve zero loss by fitting random noise, even on previously unseen test samples.
We present a novel application of computational entanglement in transforming a worst-case adversarial examples-inputs that are highly non-robust.
arXiv Detail & Related papers (2023-09-27T14:09:15Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Quiver neural networks [5.076419064097734]
We develop a uniform theoretical approach towards the analysis of various neural network connectivity architectures.
Inspired by quiver representation theory in mathematics, this approach gives a compact way to capture elaborate data flows.
arXiv Detail & Related papers (2022-07-26T09:42:45Z) - Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs)
Due to the complex non-linear characteristic of samples, the objective of those activation functions is to project samples from their original feature space to a linear separable feature space.
This phenomenon ignites our interest in exploring whether all features need to be transformed by all non-linear functions in current typical NNs.
arXiv Detail & Related papers (2022-03-22T13:09:17Z) - A singular Riemannian geometry approach to Deep Neural Networks II.
Reconstruction of 1-D equivalence classes [78.120734120667]
We build the preimage of a point in the output manifold in the input space.
We focus for simplicity on the case of neural networks maps from n-dimensional real spaces to (n - 1)-dimensional real spaces.
arXiv Detail & Related papers (2021-12-17T11:47:45Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - Implicit Geometric Regularization for Learning Shapes [34.052738965233445]
We offer a new paradigm for computing high fidelity implicit neural representations directly from raw data.
We show that our method leads to state of the art implicit neural representations with higher level-of-details and fidelity compared to previous methods.
arXiv Detail & Related papers (2020-02-24T07:36:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.