Neural Network Layer Matrix Decomposition reveals Latent Manifold
Encoding and Memory Capacity
- URL: http://arxiv.org/abs/2309.05968v1
- Date: Tue, 12 Sep 2023 05:36:08 GMT
- Title: Neural Network Layer Matrix Decomposition reveals Latent Manifold
Encoding and Memory Capacity
- Authors: Ng Shyh-Chang, A-Li Luo, Bo Qiu
- Abstract summary: We show that for every stably converged NN of continuous activation functions, its weight matrix encodes a continuous function that approximates its training dataset to within a finite margin of error over a bounded domain.
Our results have implications for understanding how NNs break the curse of dimensionality by harnessing memory capacity for expressivity.
This Layer Matrix Decomposition (LMD) further suggests a close relationship between eigen-decomposition of NN layers and the latest advances in conceptualizations of Hopfield networks and Transformer NN models.
- Score: 1.2891210250935148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We prove the converse of the universal approximation theorem, i.e. a neural
network (NN) encoding theorem which shows that for every stably converged NN of
continuous activation functions, its weight matrix actually encodes a
continuous function that approximates its training dataset to within a finite
margin of error over a bounded domain. We further show that using the
Eckart-Young theorem for truncated singular value decomposition of the weight
matrix for every NN layer, we can illuminate the nature of the latent space
manifold of the training dataset encoded and represented by every NN layer, and
the geometric nature of the mathematical operations performed by each NN layer.
Our results have implications for understanding how NNs break the curse of
dimensionality by harnessing memory capacity for expressivity, and indicate
that the two are complementary. This Layer Matrix Decomposition (LMD) further
suggests a
close relationship between eigen-decomposition of NN layers and the latest
advances in conceptualizations of Hopfield networks and Transformer NN models.
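To make the Eckart-Young step more concrete, here is a minimal sketch (assuming NumPy, a single dense layer, and arbitrary placeholder sizes, rank cutoff, and random weights standing in for a trained layer) of truncating the SVD of a layer's weight matrix and reading off its dominant input directions; it illustrates the general technique, not the paper's exact LMD procedure.

```python
# Minimal sketch: truncated SVD (Eckart-Young) of one NN layer's weight matrix.
# Placeholder assumptions: layer sizes d_out, d_in, truncation rank k, and
# random weights W; in practice W would come from a trained, converged NN.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, k = 64, 128, 8            # placeholder layer sizes and rank
W = rng.normal(size=(d_out, d_in))     # stand-in for a trained layer's weights

# Full SVD: W = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Rank-k truncation W_k: by the Eckart-Young theorem this is the best rank-k
# approximation of W in both the Frobenius and the spectral norm.
W_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Relative Frobenius error of the truncation.
rel_err = np.linalg.norm(W - W_k) / np.linalg.norm(W)
print(f"rank-{k} relative Frobenius error: {rel_err:.3f}")

# The rows of Vt[:k] span the k input directions the layer responds to most
# strongly -- a crude proxy for the latent directions the layer has encoded.
x = rng.normal(size=d_in)              # placeholder input point
latent_coords = Vt[:k] @ x             # coordinates of x in those directions
print("latent coordinates:", np.round(latent_coords, 3))
```

For a trained network, sweeping the rank k and watching the truncation error gives a rough sense of the effective dimensionality of what each layer has memorized.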
Related papers
- Invariant deep neural networks under the finite group for solving partial differential equations [1.4916944282865694]
We design a symmetry-enhanced deep neural network (sDNN) which makes the architecture of neural networks invariant under the finite group.
Numerical results show that the sDNN has strong predictive abilities both in and beyond the sampling domain.
arXiv Detail & Related papers (2024-07-30T05:28:10Z)
- Universal Approximation and the Topological Neural Network [0.0]
A topological neural network (TNN) takes data from a Tychonoff topological space instead of the usual finite dimensional space.
A distributional neural network (DNN) that takes Borel measures as data is also introduced.
arXiv Detail & Related papers (2023-05-26T05:28:10Z)
- On the limits of neural network explainability via descrambling [2.5554069583567487]
We show that the principal components of the hidden layer preactivations can be characterized as the optimal explainers or descramblers for the layer weights.
We show that in typical deep learning contexts these descramblers take diverse and interesting forms; a rough illustrative sketch of the idea appears after this list.
arXiv Detail & Related papers (2023-01-18T23:16:53Z)
- A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z)
- Continuous Generative Neural Networks [0.966840768820136]
We study Continuous Generative Neural Networks (CGNNs) in the continuous setting.
The architecture is inspired by DCGAN, with one fully connected layer, several convolutional layers and nonlinear activation functions.
We present conditions on the convolutional filters and on the nonlinearity that guarantee that a CGNN is injective.
arXiv Detail & Related papers (2022-05-29T11:06:29Z)
- SymNMF-Net for The Symmetric NMF Problem [62.44067422984995]
We propose a neural network called SymNMF-Net for the Symmetric NMF problem.
We show that the inference of each block corresponds to a single iteration of the optimization.
Empirical results on real-world datasets demonstrate the superiority of our SymNMF-Net.
arXiv Detail & Related papers (2022-05-26T08:17:39Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Universal approximation property of invertible neural networks [76.95927093274392]
Invertible neural networks (INNs) are neural network architectures with invertibility by design.
Thanks to their invertibility and the tractability of Jacobian, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning.
arXiv Detail & Related papers (2022-04-15T10:45:26Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- A Chain Graph Interpretation of Real-World Neural Networks [58.78692706974121]
We propose an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure.
The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models.
We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques.
arXiv Detail & Related papers (2020-06-30T14:46:08Z)
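As a loose companion to the descrambling entry above, the sketch below runs PCA on a hidden layer's preactivations and rotates the layer's weights into those principal directions; the data, layer sizes, and random weights are placeholders, and the paper's actual descrambling operators are defined more precisely than this.

```python
# Loose sketch of the descrambling idea: principal components of a hidden
# layer's preactivations are used to re-express ("descramble") that layer's
# weights. Placeholder data and weights; not the paper's exact procedure.
import numpy as np

rng = np.random.default_rng(1)
n, d_in, d_hidden = 500, 32, 16
X = rng.normal(size=(n, d_in))           # placeholder training inputs
W1 = rng.normal(size=(d_in, d_hidden))   # placeholder first-layer weights

H = X @ W1                               # hidden-layer preactivations

# PCA of the preactivations (columns of P are principal directions).
H_centered = H - H.mean(axis=0)
cov = H_centered.T @ H_centered / (n - 1)
eigvals, P = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]        # eigh returns ascending order
eigvals, P = eigvals[order], P[:, order]

# Rotate the layer's output coordinates so each column of the weight matrix
# aligns with one principal direction of the preactivations.
W1_descrambled = W1 @ P
print("variance explained by top 3 PCs:",
      np.round(eigvals[:3] / eigvals.sum(), 3))
print("descrambled weight shape:", W1_descrambled.shape)
```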
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.