The Proof of Kolmogorov-Arnold May Illuminate Neural Network Learning
- URL: http://arxiv.org/abs/2410.08451v1
- Date: Fri, 11 Oct 2024 01:43:14 GMT
- Title: The Proof of Kolmogorov-Arnold May Illuminate Neural Network Learning
- Authors: Michael H. Freedman
- Abstract summary: Kolmogorov and Arnold laid the foundations for the modern theory of Neural Networks (NNs).
Minor concentration amounts to sparsity for higher exterior powers of the Jacobians.
We present a conceptual argument for how such sparsity may set the stage for the emergence of successively higher order concepts in today's deep NNs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kolmogorov and Arnold, in answering Hilbert's 13th problem (in the context of continuous functions), laid the foundations for the modern theory of Neural Networks (NNs). Their proof divides the representation of a multivariate function into two steps: The first (non-linear) inter-layer map gives a universal embedding of the data manifold into a single hidden layer whose image is patterned in such a way that a subsequent dynamic can then be defined to solve for the second inter-layer map. I interpret this pattern as "minor concentration" of the almost everywhere defined Jacobians of the interlayer map. Minor concentration amounts to sparsity for higher exterior powers of the Jacobians. We present a conceptual argument for how such sparsity may set the stage for the emergence of successively higher order concepts in today's deep NNs and suggest two classes of experiments to test this hypothesis.
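For reference, the Kolmogorov-Arnold superposition theorem writes any continuous f on [0,1]^n as

    f(x_1, ..., x_n) = \sum_{q=0}^{2n} \Phi_q\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),

with continuous univariate inner functions \phi_{q,p} (giving the first inter-layer map) and outer functions \Phi_q (giving the second). To make "minor concentration" concrete, the following is a small numerical sketch, my own illustration rather than an experiment from the paper; the random two-layer tanh map, its dimensions, and the top-mass statistic are all assumptions. It estimates the Jacobian of an inter-layer map, enumerates its k-by-k minors (the entries of the k-th exterior power), and reports how much of their squared mass the single largest minor carries.

import numpy as np
from itertools import combinations

def jacobian(f, x, eps=1e-6):
    """Forward-difference Jacobian of f: R^n -> R^m at x."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros(x.size)
        dx[j] = eps
        J[:, j] = (f(x + dx) - fx) / eps
    return J

def kth_minors(J, k):
    """All k-by-k minors of J, i.e. the entries of its k-th exterior power."""
    m, n = J.shape
    return np.array([np.linalg.det(J[np.ix_(r, c)])
                     for r in combinations(range(m), k)
                     for c in combinations(range(n), k)])

def top_mass(minors, top=1):
    """Fraction of the squared mass carried by the largest-magnitude minors."""
    v = np.sort(np.abs(minors))[::-1]
    return float((v[:top] ** 2).sum() / (v ** 2).sum())

# Hypothetical stand-in for one nonlinear inter-layer map: x -> W2 @ tanh(W1 @ x).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4)) / np.sqrt(4)
W2 = rng.normal(size=(6, 8)) / np.sqrt(8)
f = lambda x: W2 @ np.tanh(W1 @ x)

J = jacobian(f, rng.normal(size=4))
for k in range(1, 5):
    mins = kth_minors(J, k)
    print(f"k={k}: {mins.size} minors, top-1 squared-mass fraction = {top_mass(mins):.3f}")

Minor concentration in the paper's sense would presumably appear as this fraction being close to 1 for trained inter-layer maps while remaining diffuse for random ones.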
Related papers
- Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations [24.052411316664017]
We introduce a theoretical framework for the evolution of the kernel sequence, which measures the similarity between the hidden representations of two different inputs.
For nonlinear activations, the kernel sequence converges globally to a unique fixed point, which can correspond to similar representations depending on the activation and network architecture.
This work provides new insights into the implicit biases of deep neural networks and how architectural choices influence the evolution of representations across layers.
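A toy numerical illustration of this picture (my own sketch, not the paper's framework; the tanh activation, width, weight scale, and cosine-similarity proxy for the kernel are assumptions): propagate two inputs through a deep random network and track the similarity of their hidden representations layer by layer, which settles toward an input-independent fixed value.

import numpy as np

def layerwise_similarity(x, y, depth=30, width=512, sigma_w=1.5, seed=0):
    """Cosine similarity of the hidden representations of x and y after each layer
    of a random fully connected tanh network (a crude proxy for the kernel sequence)."""
    rng = np.random.default_rng(seed)
    hx, hy, sims = x, y, []
    for _ in range(depth):
        W = rng.normal(scale=sigma_w / np.sqrt(hx.size), size=(width, hx.size))
        hx, hy = np.tanh(W @ hx), np.tanh(W @ hy)
        sims.append(float(hx @ hy / (np.linalg.norm(hx) * np.linalg.norm(hy))))
    return sims

rng = np.random.default_rng(1)
x, y = rng.normal(size=64), rng.normal(size=64)
sims = layerwise_similarity(x, y)
print([round(s, 3) for s in sims[:5]], "...", round(sims[-1], 3))  # similarity converges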
arXiv Detail & Related papers (2024-10-26T07:10:47Z)
- Data Representations' Study of Latent Image Manifolds [5.801621787540268]
We find that state-of-the-art trained convolutional neural networks for image classification have a characteristic curvature profile along layers.
We also show that the curvature gap between the last two layers has a strong correlation with the generalization capability of the network.
arXiv Detail & Related papers (2023-05-31T10:49:16Z)
- Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set.
This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure.
We prove the universal approximation property of three-layer ReLU networks using our topological approach.
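A minimal one-dimensional sketch of the underlying ramp idea (my own toy, not the construction in the paper; in one dimension a single hidden ReLU layer already suffices, whereas the paper's result for general compact sets uses three layers): four ReLU units form a trapezoid approximating the indicator of an interval.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def interval_indicator(x, a, b, delta=0.05):
    """One-hidden-layer ReLU approximation of the indicator of [a, b]: a trapezoid
    equal to 1 on [a, b], 0 outside [a - delta, b + delta], linear in between."""
    return (relu(x - a + delta) - relu(x - a) - relu(x - b) + relu(x - b - delta)) / delta

x = np.linspace(-1.0, 2.0, 7)
print(np.round(interval_indicator(x, a=0.0, b=1.0), 3))  # [0. 0. 1. 1. 1. 0. 0.]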
arXiv Detail & Related papers (2023-05-25T14:17:15Z)
- Unwrapping All ReLU Networks [1.370633147306388]
Deep ReLU Networks can be decomposed into a collection of linear models.
We extend this decomposition to Graph Neural networks and tensor convolutional networks.
We show how this decomposition yields cheap and exact SHAP values.
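A small sketch of the "collection of linear models" view (my own toy code, not the paper's procedure; the network sizes and the masking construction are assumptions): recover the exact affine map that a ReLU MLP computes on the activation region containing a given input by zeroing the rows of inactive units, then verify that it reproduces the network there.

import numpy as np

def forward(x, weights, biases):
    """Run the ReLU MLP and record which hidden units are active at x."""
    h, pattern = x, []
    for W, b in zip(weights[:-1], biases[:-1]):
        pre = W @ h + b
        mask = (pre > 0).astype(pre.dtype)
        pattern.append(mask)
        h = pre * mask
    return weights[-1] @ h + biases[-1], pattern

def local_affine_model(x, weights, biases):
    """Exact affine map A @ z + c realized by the network on the activation
    region containing x: fold the layers, zeroing rows of inactive units."""
    _, pattern = forward(x, weights, biases)
    A, c = np.eye(x.size), np.zeros(x.size)
    for W, b, mask in zip(weights[:-1], biases[:-1], pattern):
        A = mask[:, None] * (W @ A)
        c = mask * (W @ c + b)
    return weights[-1] @ A, weights[-1] @ c + biases[-1]

# Tiny random net 4 -> 16 -> 16 -> 3 with ReLU hidden layers and a linear output.
rng = np.random.default_rng(0)
dims = [4, 16, 16, 3]
weights = [rng.normal(size=(dims[i + 1], dims[i])) / np.sqrt(dims[i]) for i in range(3)]
biases = [0.1 * rng.normal(size=dims[i + 1]) for i in range(3)]

x = rng.normal(size=4)
A, c = local_affine_model(x, weights, biases)
y, _ = forward(x, weights, biases)
print(np.allclose(A @ x + c, y))  # True: the affine model reproduces the net at x

Since the network is exactly affine on that region, attributions such as SHAP values can be computed in closed form from A, which is presumably what makes the exact SHAP computation mentioned above cheap.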
arXiv Detail & Related papers (2023-05-16T13:30:15Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with a quadratic loss function, a fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
These results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z)
- A singular Riemannian geometry approach to Deep Neural Networks II. Reconstruction of 1-D equivalence classes [78.120734120667]
We construct, in the input space, the preimage of a point of the output manifold.
For simplicity, we focus on the case of neural network maps from n-dimensional real space to (n - 1)-dimensional real space.
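A toy numerical stand-in for this reconstruction (my own sketch, not the paper's singular Riemannian geometry machinery; the map F, the step size, and the predictor-corrector scheme are assumptions), shown for the smallest case n = 2: trace the 1-D preimage of an output value by stepping along the kernel of the differential and correcting back onto the fiber.

import numpy as np

def trace_preimage(F, x_start, steps=200, h=0.02):
    """Trace the 1-D preimage (fiber) of F through x_start for a smooth F: R^2 -> R,
    stepping along the kernel of dF and Newton-correcting back onto the level set."""
    def grad(x, eps=1e-6):
        e = np.eye(2) * eps
        return np.array([(F(x + e[i]) - F(x - e[i])) / (2 * eps) for i in range(2)])

    y0, x, path = F(x_start), x_start.astype(float).copy(), [x_start.astype(float).copy()]
    for _ in range(steps):
        g = grad(x)
        t = np.array([-g[1], g[0]])              # tangent: orthogonal to the gradient
        x = x + h * t / np.linalg.norm(t)        # predictor step along the fiber
        g = grad(x)
        x = x - (F(x) - y0) * g / (g @ g)        # corrector: back onto F(x) = y0
        path.append(x.copy())
    return np.array(path)

# Hypothetical smooth "network" from R^2 to R (the n -> n - 1 case with n = 2).
F = lambda x: np.tanh(x[0]) ** 2 + 0.5 * x[1] ** 2
x_start = np.array([0.5, 0.7])
path = trace_preimage(F, x_start)
print(max(abs(F(p) - F(x_start)) for p in path))  # small: the path stays on one fiber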
arXiv Detail & Related papers (2021-12-17T11:47:45Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a manner that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and transfer to new tasks in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Towards Lower Bounds on the Depth of ReLU Neural Networks [7.355977594790584]
We investigate whether the class of exactly representable functions strictly increases by adding more layers.
We settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative.
We present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.
arXiv Detail & Related papers (2021-05-31T09:49:14Z)
- Hierarchical nucleation in deep neural networks [67.85373725288136]
We study the evolution of the probability density of the ImageNet dataset across the hidden layers in some state-of-the-art DCNs.
We find that the initial layers generate a unimodal probability density, discarding any structure irrelevant for classification.
In subsequent layers density peaks arise in a hierarchical fashion that mirrors the semantic hierarchy of the concepts.
arXiv Detail & Related papers (2020-07-07T14:42:18Z)
- A Note on the Global Convergence of Multilayer Neural Networks in the Mean Field Regime [9.89901717499058]
We introduce a rigorous framework to describe the mean field limit of gradient-based learning dynamics of multilayer neural networks.
We prove a global convergence guarantee for multilayer networks of any depths.
arXiv Detail & Related papers (2020-06-16T17:50:34Z)