Deep ReLU networks -- injectivity capacity upper bounds
- URL: http://arxiv.org/abs/2412.19677v1
- Date: Fri, 27 Dec 2024 14:57:40 GMT
- Title: Deep ReLU networks -- injectivity capacity upper bounds
- Authors: Mihailo Stojnic
- Abstract summary: We study deep ReLU feed-forward neural networks (NNs) and their injectivity abilities.
For any given hidden-layer architecture, the injectivity capacity is defined as the minimal ratio between the number of a network's outputs and inputs.
Strong recent progress in precisely studying single-ReLU-layer injectivity properties is here lifted to the deep-network level.
- Score: 0.0
- Abstract: We study deep ReLU feed-forward neural networks (NNs) and their injectivity abilities. The main focus is on \emph{precisely} determining the so-called injectivity capacity. For any given hidden-layer architecture, it is defined as the minimal ratio between the number of a network's outputs and inputs that ensures unique recoverability of the input from a realizable output. Strong recent progress in precisely studying single-ReLU-layer injectivity properties is here lifted to the deep-network level. In particular, we develop a program that connects deep $l$-layer net injectivity to an $l$-extension of the $\ell_0$ spherical perceptrons, thereby massively generalizing an isomorphism between studying single-layer injectivity and the capacity of the so-called (1-extension) $\ell_0$ spherical perceptrons discussed in [82]. \emph{Random duality theory} (RDT) based machinery is then created and utilized to statistically handle properties of the extended $\ell_0$ spherical perceptrons and, implicitly, of the deep ReLU NNs. A sizeable set of numerical evaluations is conducted as well to put the entire RDT machinery to practical use. From these we observe a rapidly decreasing tendency in the needed layers' expansions, i.e., a rapid \emph{expansion saturation effect}. Only $4$ layers of depth suffice to closely approach the level of no needed expansion -- a result that fairly closely resembles observations made in practical experiments and that has so far remained out of reach of all existing mathematical methodologies.
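As a minimal numerical illustration of what layer-wise injectivity means (a hypothetical sketch, not the paper's RDT machinery): for a single ReLU layer $y = \mathrm{relu}(Wx)$ with known Gaussian $W$, the input is uniquely recoverable from a realizable output whenever the active set selects at least $m$ linearly independent rows of $W$, which is why a sufficient output/input expansion ratio matters.

```python
import numpy as np

# Hypothetical single-layer sketch (not the paper's analysis): with
# y = relu(W x) and W known, x can be recovered by least squares on the
# rows where the ReLU is not clipped, provided those rows have rank m.
rng = np.random.default_rng(0)
m, n = 20, 100                        # expansion ratio n/m = 5
W = rng.standard_normal((n, m))
x = rng.standard_normal(m)
y = np.maximum(W @ x, 0.0)

active = y > 0                        # coordinates not zeroed by the ReLU
x_hat, *_ = np.linalg.lstsq(W[active], y[active], rcond=None)
print(np.allclose(x_hat, x))          # True when rank(W[active]) == m
```

With roughly half of the $n = 100$ rows active, a Gaussian submatrix has rank $m = 20$ almost surely, so recovery succeeds; shrinking $n$ toward $m$ makes this fail, which is the regime the injectivity capacity quantifies.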
Related papers
- MLPs at the EOC: Dynamics of Feature Learning [8.430481660019451]
We propose a theory to explain the convergence of gradient descent and the learning of features along the way.
Such a theory should also cover phenomena observed by practitioners, including the Edge of Stability (EOS) and the catapult mechanism.
arXiv Detail & Related papers (2025-02-18T18:23:33Z)
- Injectivity capacity of ReLU gates [0.0]
We consider the injectivity property of ReLU network layers.
We develop a powerful program to handle the $\ell_0$ spherical perceptron and, implicitly, ReLU-layer injectivity.
The obtained results are also shown to fairly closely match the replica predictions from [40].
arXiv Detail & Related papers (2024-10-28T00:57:10Z)
- Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets [58.460298576330835]
We study Leaky ResNets, which interpolate between ResNets and Fully-Connected nets via an effective depth parameter $\tilde{L}$.
In the infinite-depth limit, we study 'representation geodesics' $A_p$: continuous paths in representation space (similar to NeuralODEs).
We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work.
arXiv Detail & Related papers (2024-05-27T18:15:05Z)
- Demystifying Lazy Training of Neural Networks from a Macroscopic Viewpoint [5.9954962391837885]
We study the gradient descent dynamics of neural networks through the lens of macroscopic limits.
Our study reveals that gradient descent can rapidly drive deep neural networks to zero training loss.
Our approach draws inspiration from the Neural Tangent Kernel (NTK) paradigm.
arXiv Detail & Related papers (2024-04-07T08:07:02Z)
- Improve Generalization Ability of Deep Wide Residual Network with A Suitable Scaling Factor [0.0]
We show that if $\alpha$ is a constant, the class of functions induced by the Residual Neural Tangent Kernel (RNTK) is not learnable as the depth goes to infinity.
We also highlight a surprising phenomenon: even if we allow $\alpha$ to decrease with increasing depth $L$, the degeneration phenomenon may still occur.
arXiv Detail & Related papers (2024-03-07T14:40:53Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Scalable Lipschitz Residual Networks with Convex Potential Flows [120.27516256281359]
We show that using convex potentials in a residual network gradient flow provides a built-in $1$-Lipschitz transformation.
A comprehensive set of experiments on CIFAR-10 demonstrates the scalability of our architecture and the benefit of our approach for $\ell$ provable defenses.
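The built-in 1-Lipschitz property above rests on a standard convex-analysis fact that can be checked numerically. A hedged sketch (the potential $\psi$, matrix $W$, and step size here are illustrative choices, not the paper's architecture): a residual step $x \mapsto x - \alpha\nabla\psi(x)$ with convex, $\beta$-smooth $\psi$ is nonexpansive whenever $\alpha \le 2/\beta$.

```python
import numpy as np

# Illustrative convex potential: psi(x) = sum(softplus(W x)), whose
# gradient is W^T sigmoid(W x) and whose smoothness constant is bounded
# by ||W||_2^2 / 4 (since sigmoid' <= 1/4). The residual step
# x - alpha * grad(psi)(x) is then 1-Lipschitz for alpha <= 2 / beta.
rng = np.random.default_rng(1)
d, h = 16, 64
W = rng.standard_normal((h, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(x, alpha):
    # one gradient step on the convex potential psi
    return x - alpha * (W.T @ sigmoid(W @ x))

beta = 0.25 * np.linalg.norm(W, 2) ** 2   # smoothness upper bound
alpha = 2.0 / beta                        # largest nonexpansive step size

ratios = []
for _ in range(200):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    num = np.linalg.norm(layer(x, alpha) - layer(y, alpha))
    ratios.append(num / np.linalg.norm(x - y))
print(max(ratios) <= 1.0 + 1e-9)          # empirically nonexpansive
```

The Lipschitz bound holds by construction rather than by regularization, which is what makes such layers attractive for provable robustness certificates.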
arXiv Detail & Related papers (2021-10-25T07:12:53Z)
- Improvising the Learning of Neural Networks on Hyperspherical Manifold [0.0]
Convolutional neural networks (CNNs) have delivered tremendous performance gains in supervised settings.
Representations learned by CNNs operating on the hyperspherical manifold have led to insightful outcomes in face recognition.
A broad range of activation functions has been developed with hypersphere intuition, performing better than softmax in Euclidean space.
arXiv Detail & Related papers (2021-09-29T22:39:07Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Rectified Linear Postsynaptic Potential Function for Backpropagation in Deep Spiking Neural Networks [55.0627904986664]
Spiking Neural Networks (SNNs) use temporal spike patterns to represent and transmit information, which is not only biologically realistic but also suitable for ultra-low-power, event-driven neuromorphic implementation.
This paper investigates the contribution of spike timing dynamics to information encoding, synaptic plasticity, and decision making, providing a new perspective on the design of future deep SNNs and neuromorphic hardware systems.
arXiv Detail & Related papers (2020-03-26T11:13:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.