Fixed Points of Deep Neural Networks: Emergence, Stability, and Applications
- URL: http://arxiv.org/abs/2501.04182v1
- Date: Tue, 07 Jan 2025 23:23:26 GMT
- Title: Fixed Points of Deep Neural Networks: Emergence, Stability, and Applications
- Authors: L. Berlyand, V. Slavin,
- Abstract summary: We present results on the formation and stability of a family of fixed points of deep neural networks (DNNs)<n>We demonstrate examples of applications of such networks in supervised, semi-supervised and unsupervised learning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present numerical and analytical results on the formation and stability of a family of fixed points of deep neural networks (DNNs). Such fixed points appear in a class of DNNs when dimensions of input and output vectors are the same. We demonstrate examples of applications of such networks in supervised, semi-supervised and unsupervised learning such as encoding/decoding of images, restoration of damaged images among others. We present several numerical and analytical results. First, we show that for untrained DNN's with weights and biases initialized by normally distributed random variables the only one fixed point exists. This result holds for DNN with any depth (number of layers) $L$, any layer width $N$, and sigmoid-type activation functions. Second, it has been shown that for a DNN whose parameters (weights and biases) are initialized by ``light-tailed'' distribution of weights (e.g. normal distribution), after training the distribution of these parameters become ``heavy-tailed''. This motivates our study of DNNs with ``heavy-tailed'' initialization. For such DNNs we show numerically %existence and stability that training leads to emergence of $Q(N,L)$ fixed points, where $Q(N,L)$ is a positive integer which depends on the number of layers $L$ and layer width $N$. We further observe numerically that for fixed $N = N_0$ the function $Q(N_0, L)$ is non-monotone, that is it initially grows as $L$ increases and then decreases to 1. This non-monotone behavior of $Q(N_0, L)$ is also obtained by analytical derivation of equation for Empirical Spectral Distribution (ESD) of input-output Jacobian followed by numerical solution of this equation.
Related papers
- Statistical Properties of Deep Neural Networks with Dependent Data [0.0]
This paper establishes statistical properties of deep neural network (DNN) estimators under dependent data.<n>The framework provided also offers potential for research into other DNN architectures and time-series applications.
arXiv Detail & Related papers (2024-10-14T21:46:57Z) - Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(boldsymbolx) = textstylesigma_*left(langleboldsymbolx,boldsymbolthetarangleright)$<n>We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs [42.551773746803946]
Vision tasks are characterized by the properties of locality and translation invariance.
The superior performance of convolutional neural networks (CNNs) on these tasks is widely attributed to the inductive bias of locality and weight sharing baked into their architecture.
Existing attempts to quantify the statistical benefits of these biases in CNNs over locally connected neural networks (LCNs) and fully connected neural networks (FCNs) fall into one of the following categories.
arXiv Detail & Related papers (2024-03-23T03:57:28Z) - Sparsifying Bayesian neural networks with latent binary variables and
normalizing flows [10.865434331546126]
We will consider two extensions to the latent binary Bayesian neural networks (LBBNN) method.
Firstly, by using the local reparametrization trick (LRT) to sample the hidden units directly, we get a more computationally efficient algorithm.
More importantly, by using normalizing flows on the variational posterior distribution of the LBBNN parameters, the network learns a more flexible variational posterior distribution than the mean field Gaussian.
arXiv Detail & Related papers (2023-05-05T09:40:28Z) - Generalization and Stability of Interpolating Neural Networks with
Minimal Width [37.908159361149835]
We investigate the generalization and optimization of shallow neural-networks trained by gradient in the interpolating regime.
We prove the training loss number minimizations $m=Omega(log4 (n))$ neurons and neurons $Tapprox n$.
With $m=Omega(log4 (n))$ neurons and $Tapprox n$, we bound the test loss training by $tildeO (1/)$.
arXiv Detail & Related papers (2023-02-18T05:06:15Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P* sim sqrtN$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Graph Neural Networks are Inherently Good Generalizers: Insights by
Bridging GNNs and MLPs [71.93227401463199]
This paper pinpoints the major source of GNNs' performance gain to their intrinsic capability, by introducing an intermediate model class dubbed as P(ropagational)MLP.
We observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient in training.
arXiv Detail & Related papers (2022-12-18T08:17:32Z) - A Tale of Two Cities: Data and Configuration Variances in Robust Deep
Learning [27.498927971861068]
Deep neural networks (DNNs) are widely used in many industries such as image recognition, supply chain, medical diagnosis, and autonomous driving.
Prior work has shown the high accuracy of a DNN model does not imply high robustness because the input data and external environment are constantly changing.
arXiv Detail & Related papers (2022-11-18T03:32:53Z) - Neural Networks Efficiently Learn Low-Dimensional Representations with
SGD [22.703825902761405]
We show that SGD-trained ReLU NNs can learn a single-index target of the form $y=f(langleboldsymbolu,boldsymbolxrangle) + epsilon$ by recovering the principal direction.
We also provide compress guarantees for NNs using the approximate low-rank structure produced by SGD.
arXiv Detail & Related papers (2022-09-29T15:29:10Z) - Bounding the Width of Neural Networks via Coupled Initialization -- A
Worst Case Analysis [121.9821494461427]
We show how to significantly reduce the number of neurons required for two-layer ReLU networks.
We also prove new lower bounds that improve upon prior work, and that under certain assumptions, are best possible.
arXiv Detail & Related papers (2022-06-26T06:51:31Z) - High-dimensional Asymptotics of Feature Learning: How One Gradient Step
Improves the Representation [89.21686761957383]
We study the first gradient descent step on the first-layer parameters $boldsymbolW$ in a two-layer network.
Our results demonstrate that even one step can lead to a considerable advantage over random features.
arXiv Detail & Related papers (2022-05-03T12:09:59Z) - Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) is to find a small subset of the input graph's features.
We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z) - Generalizing Graph Neural Networks on Out-Of-Distribution Graphs [51.33152272781324]
Graph Neural Networks (GNNs) are proposed without considering the distribution shifts between training and testing graphs.
In such a setting, GNNs tend to exploit subtle statistical correlations existing in the training set for predictions, even though it is a spurious correlation.
We propose a general causal representation framework, called StableGNN, to eliminate the impact of spurious correlations.
arXiv Detail & Related papers (2021-11-20T18:57:18Z) - Training Stable Graph Neural Networks Through Constrained Learning [116.03137405192356]
Graph Neural Networks (GNNs) rely on graph convolutions to learn features from network data.
GNNs are stable to different types of perturbations of the underlying graph, a property that they inherit from graph filters.
We propose a novel constrained learning approach by imposing a constraint on the stability condition of the GNN within a perturbation of choice.
arXiv Detail & Related papers (2021-10-07T15:54:42Z) - Disentangling deep neural networks with rectified linear units using
duality [4.683806391173103]
We propose a novel interpretable counterpart of deep neural networks (DNNs) with rectified linear units (ReLUs)
We show that convolution with global pooling and skip connection provide respectively rotational invariance and ensemble structure to the neural path kernel (NPK)
arXiv Detail & Related papers (2021-10-06T16:51:59Z) - Generalizing Neural Networks by Reflecting Deviating Data in Production [15.498447555957773]
We present a runtime approach that mitigates DNN mis-predictions caused by unexpected runtime inputs to the DNN.
We use a distribution analyzer based on the distance metric learned by a Siamese network to identify "unseen" semantically-preserving inputs.
Our approach transforms those unexpected inputs into inputs from the training set that are identified as having similar semantics.
arXiv Detail & Related papers (2021-10-06T13:05:45Z) - Shift-Robust GNNs: Overcoming the Limitations of Localized Graph
Training data [52.771780951404565]
Shift-Robust GNN (SR-GNN) is designed to account for distributional differences between biased training data and the graph's true inference distribution.
We show that SR-GNN outperforms other GNN baselines by accuracy, eliminating at least (40%) of the negative effects introduced by biased training data.
arXiv Detail & Related papers (2021-08-02T18:00:38Z) - Fundamental tradeoffs between memorization and robustness in random
features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training, then its Sobolev-seminorm is lower-bounded.
Experiments reveal for the first time, (iv) a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z) - PDO-e$\ ext{S}^\ ext{2}$CNNs: Partial Differential Operator Based
Equivariant Spherical CNNs [77.53203546732664]
We use partial differential operators to design a spherical equivariant CNN, PDO-e$textStext2$CNN, which is exactly rotation equivariant in the continuous domain.
In experiments, PDO-e$textStext2$CNNs show greater parameter efficiency and outperform other spherical CNNs significantly on several tasks.
arXiv Detail & Related papers (2021-04-08T07:54:50Z) - Approximating smooth functions by deep neural networks with sigmoid
activation function [0.0]
We study the power of deep neural networks (DNNs) with sigmoid activation function.
We show that DNNs with fixed depth and a width of order $Md$ achieve an approximation rate of $M-2p$.
arXiv Detail & Related papers (2020-10-08T07:29:31Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamic of descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network can provably learn with gradient loss at most ground with Tangent samples.
arXiv Detail & Related papers (2020-07-09T07:09:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.