Neural Network Layer Matrix Decomposition reveals Latent Manifold
Encoding and Memory Capacity
- URL: http://arxiv.org/abs/2309.05968v1
- Date: Tue, 12 Sep 2023 05:36:08 GMT
- Title: Neural Network Layer Matrix Decomposition reveals Latent Manifold
Encoding and Memory Capacity
- Authors: Ng Shyh-Chang, A-Li Luo, Bo Qiu
- Abstract summary: We show that for every stably converged NN of continuous activation functions, its weight matrix encodes a continuous function that approximates its training dataset to within a finite margin of error over a bounded domain.
Our results have implications for understanding how NNs break the curse of dimensionality by harnessing memory capacity for expressivity.
This Layer Matrix Decomposition (LMD) further suggests a close relationship between eigen-decomposition of NN layers and the latest advances in conceptualizations of Hopfield networks and Transformer NN models.
- Score: 1.2891210250935148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We prove the converse of the universal approximation theorem, i.e. a neural
network (NN) encoding theorem which shows that for every stably converged NN of
continuous activation functions, its weight matrix actually encodes a
continuous function that approximates its training dataset to within a finite
margin of error over a bounded domain. We further show that using the
Eckart-Young theorem for truncated singular value decomposition of the weight
matrix for every NN layer, we can illuminate the nature of the latent space
manifold of the training dataset encoded and represented by every NN layer, and
the geometric nature of the mathematical operations performed by each NN layer.
Our results have implications for understanding how NNs break the curse of
dimensionality by harnessing memory capacity for expressivity, and indicate
that the two are complementary. This Layer Matrix Decomposition (LMD) further
suggests a
close relationship between eigen-decomposition of NN layers and the latest
advances in conceptualizations of Hopfield networks and Transformer NN models.
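To make the Eckart-Young step more concrete, here is a minimal sketch (assuming NumPy, a single dense layer, and arbitrary placeholder sizes, rank cutoff, and random weights standing in for a trained layer) of truncating the SVD of a layer's weight matrix and reading off its dominant input directions; it illustrates the general technique, not the paper's exact LMD procedure.

```python
# Minimal sketch: truncated SVD (Eckart-Young) of one NN layer's weight matrix.
# Placeholder assumptions: layer sizes d_out, d_in, truncation rank k, and
# random weights W; in practice W would come from a trained, converged NN.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, k = 64, 128, 8            # placeholder layer sizes and rank
W = rng.normal(size=(d_out, d_in))     # stand-in for a trained layer's weights

# Full SVD: W = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Rank-k truncation W_k: by the Eckart-Young theorem this is the best rank-k
# approximation of W in both the Frobenius and the spectral norm.
W_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Relative Frobenius error of the truncation.
rel_err = np.linalg.norm(W - W_k) / np.linalg.norm(W)
print(f"rank-{k} relative Frobenius error: {rel_err:.3f}")

# The rows of Vt[:k] span the k input directions the layer responds to most
# strongly -- a crude proxy for the latent directions the layer has encoded.
x = rng.normal(size=d_in)              # placeholder input point
latent_coords = Vt[:k] @ x             # coordinates of x in those directions
print("latent coordinates:", np.round(latent_coords, 3))
```

For a trained network, sweeping the rank k and watching the truncation error gives a rough sense of the effective dimensionality of what each layer has memorized.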
Related papers
- Invariant deep neural networks under the finite group for solving partial differential equations [1.4916944282865694]
We design a symmetry-enhanced deep neural network (sDNN) which makes the architecture of neural networks invariant under the finite group.
Numerical results show that the sDNN has strong predictive abilities both in and beyond the sampling domain.
arXiv Detail & Related papers (2024-07-30T05:28:10Z)
- Universal Approximation and the Topological Neural Network [0.0]
A topological neural network (TNN) takes data from a Tychonoff topological space instead of the usual finite dimensional space.
A distributional neural network (DNN) that takes Borel measures as data is also introduced.
arXiv Detail & Related papers (2023-05-26T05:28:10Z)
- On the limits of neural network explainability via descrambling [2.5554069583567487]
We show that the principal components of the hidden layer preactivations can be characterized as the optimal explainers or descramblers for the layer weights.
We show that in typical deep learning contexts these descramblers take diverse and interesting forms; a rough illustrative sketch of the idea appears after this list.
arXiv Detail & Related papers (2023-01-18T23:16:53Z)
- A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z)
- Continuous Generative Neural Networks [0.966840768820136]
We study Continuous Generative Neural Networks (CGNNs) in the continuous setting.
The architecture is inspired by DCGAN, with one fully connected layer, several convolutional layers and nonlinear activation functions.
We present conditions on the convolutional filters and on the nonlinearity that guarantee that a CGNN is injective.
arXiv Detail & Related papers (2022-05-29T11:06:29Z)
- SymNMF-Net for The Symmetric NMF Problem [62.44067422984995]
We propose a neural network called SymNMF-Net for the Symmetric NMF problem.
We show that the inference of each block corresponds to a single iteration of the optimization.
Empirical results on real-world datasets demonstrate the superiority of our SymNMF-Net.
arXiv Detail & Related papers (2022-05-26T08:17:39Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Universal approximation property of invertible neural networks [76.95927093274392]
Invertible neural networks (INNs) are neural network architectures with invertibility by design.
Thanks to their invertibility and the tractability of Jacobian, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning.
arXiv Detail & Related papers (2022-04-15T10:45:26Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- A Chain Graph Interpretation of Real-World Neural Networks [58.78692706974121]
We propose an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure.
The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models.
We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques.
arXiv Detail & Related papers (2020-06-30T14:46:08Z)
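As a loose companion to the descrambling entry above, the sketch below runs PCA on a hidden layer's preactivations and rotates the layer's weights into those principal directions; the data, layer sizes, and random weights are placeholders, and the paper's actual descrambling operators are defined more precisely than this.

```python
# Loose sketch of the descrambling idea: principal components of a hidden
# layer's preactivations are used to re-express ("descramble") that layer's
# weights. Placeholder data and weights; not the paper's exact procedure.
import numpy as np

rng = np.random.default_rng(1)
n, d_in, d_hidden = 500, 32, 16
X = rng.normal(size=(n, d_in))           # placeholder training inputs
W1 = rng.normal(size=(d_in, d_hidden))   # placeholder first-layer weights

H = X @ W1                               # hidden-layer preactivations

# PCA of the preactivations (columns of P are principal directions).
H_centered = H - H.mean(axis=0)
cov = H_centered.T @ H_centered / (n - 1)
eigvals, P = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]        # eigh returns ascending order
eigvals, P = eigvals[order], P[:, order]

# Rotate the layer's output coordinates so each column of the weight matrix
# aligns with one principal direction of the preactivations.
W1_descrambled = W1 @ P
print("variance explained by top 3 PCs:",
      np.round(eigvals[:3] / eigvals.sum(), 3))
print("descrambled weight shape:", W1_descrambled.shape)
```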
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.