The Kolmogorov-Arnold representation theorem revisited
- URL: http://arxiv.org/abs/2007.15884v2
- Date: Sat, 2 Jan 2021 16:42:55 GMT
- Title: The Kolmogorov-Arnold representation theorem revisited
- Authors: Johannes Schmidt-Hieber
- Abstract summary: There is a longstanding debate about whether the Kolmogorov-Arnold representation theorem can explain the use of more than one hidden layer in neural networks.
We derive modifications of the Kolmogorov-Arnold representation that transfer smoothness properties of the represented function to the outer function and can be well approximated by ReLU networks.
It appears that instead of two hidden layers, a more natural interpretation of the Kolmogorov-Arnold representation is that of a deep neural network where most of the layers are required to approximate the interior function.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a longstanding debate about whether the Kolmogorov-Arnold
representation theorem can explain the use of more than one hidden layer in
neural networks. The Kolmogorov-Arnold representation decomposes a multivariate
function into an interior and an outer function and therefore indeed has a
structure similar to that of a neural network with two hidden layers. But there
are distinctive differences.
One of the main obstacles is that the outer function depends on the represented
function and can be wildly varying even if the represented function is smooth.
We derive modifications of the Kolmogorov-Arnold representation that transfer
smoothness properties of the represented function to the outer function and can
be well approximated by ReLU networks. It appears that instead of two hidden
layers, a more natural interpretation of the Kolmogorov-Arnold representation
is that of a deep neural network where most of the layers are required to
approximate the interior function.
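For reference, the classical Kolmogorov-Arnold representation expresses any continuous function on the unit cube through univariate interior and outer functions; this is the standard statement of the theorem (notation ours, not taken from the paper):

```latex
% Classical Kolmogorov-Arnold representation (standard statement; notation ours)
\[
  f(x_1,\dots,x_d) \;=\; \sum_{q=0}^{2d} \Phi_q\!\Big(\sum_{p=1}^{d} \psi_{p,q}(x_p)\Big),
  \qquad (x_1,\dots,x_d) \in [0,1]^d .
\]
```

The inner sums play the role of a first hidden layer and the outer functions Φ_q that of a second, which is the two-hidden-layer analogy discussed in the abstract; the paper's point is that approximating the interior map with ReLU units already accounts for most of the depth.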
Related papers
- The Proof of Kolmogorov-Arnold May Illuminate Neural Network Learning [0.0]
Kolmogorov and Arnold laid the foundations for the modern theory of Neural Networks (NNs).
Minor concentration amounts to sparsity for higher exterior powers of the Jacobians.
We present a conceptual argument for how such sparsity may set the stage for the emergence of successively higher order concepts in today's deep NNs.
arXiv Detail & Related papers (2024-10-11T01:43:14Z) - Functional Diffusion [55.251174506648454]
We propose a new class of generative diffusion models, called functional diffusion.
Functional diffusion can be seen as an extension of classical diffusion models to an infinite-dimensional domain.
We show generative results on complicated signed distance functions and deformation functions defined on 3D surfaces.
arXiv Detail & Related papers (2023-11-26T21:35:34Z) - Going Beyond Neural Network Feature Similarity: The Network Feature
Complexity and Its Interpretation Using Category Theory [64.06519549649495]
We provide the definition of what we call functionally equivalent features.
These features produce equivalent output under certain transformations.
We propose an efficient algorithm named Iterative Feature Merging.
arXiv Detail & Related papers (2023-10-10T16:27:12Z) - ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Brain Cortical Functional Gradients Predict Cortical Folding Patterns
via Attention Mesh Convolution [51.333918985340425]
We develop a novel attention mesh convolution model to predict cortical gyro-sulcal segmentation maps on individual brains.
Experiments show that the prediction performance via our model outperforms other state-of-the-art models.
arXiv Detail & Related papers (2022-05-21T14:08:53Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
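As a loose illustration of the routing idea (a learned soft assignment of inputs to a small set of function modules), here is a minimal NumPy sketch; the module definitions, softmax routing scores, and all names are our own assumptions and do not reproduce the actual Neural Interpreters architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

d, n_modules, n_tokens = 8, 3, 5

# Each "function" module is a tiny MLP (hypothetical stand-in for a learned module).
W1 = rng.normal(size=(n_modules, d, d))
W2 = rng.normal(size=(n_modules, d, d))

# Routing parameters: a signature per module, compared against each token.
signatures = rng.normal(size=(n_modules, d))

x = rng.normal(size=(n_tokens, d))                # token representations

scores = softmax(x @ signatures.T, axis=-1)       # (n_tokens, n_modules) soft routing weights
module_out = np.stack([relu(x @ W1[m]) @ W2[m] for m in range(n_modules)], axis=1)
y = (scores[..., None] * module_out).sum(axis=1)  # weighted mix of module outputs

print(y.shape)  # (5, 8); in the real model, routing and modules are trained end-to-end
```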
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Towards Lower Bounds on the Depth of ReLU Neural Networks [7.355977594790584]
We investigate whether the class of exactly representable functions strictly increases by adding more layers.
We settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative.
We present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.
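As a concrete instance of exact (not approximate) representation by a shallow ReLU network, unrelated to the paper's new bounds: max(a, b) is computed exactly by a one-hidden-layer ReLU network, and whether analogous piecewise linear functions of more variables require additional depth is the kind of question studied in the entry above. A minimal check:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def max_two(a, b):
    # One-hidden-layer ReLU network with three units computing max(a, b) exactly:
    # max(a, b) = relu(a - b) + relu(b) - relu(-b), since relu(b) - relu(-b) = b.
    return relu(a - b) + relu(b) - relu(-b)

rng = np.random.default_rng(0)
a, b = rng.normal(size=1000), rng.normal(size=1000)
assert np.allclose(max_two(a, b), np.maximum(a, b))
print("max(a, b) is exactly representable with a single hidden ReLU layer")
```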
arXiv Detail & Related papers (2021-05-31T09:49:14Z) - Representation formulas and pointwise properties for Barron functions [8.160343645537106]
We study the natural function space for infinitely wide two-layer neural networks with ReLU activation (Barron space).
We show that functions whose singular set is fractal or curved cannot be represented by infinitely wide two-layer networks with finite path-norm.
This result indicates that infinitely wide two-layer networks are more limited in the functions they can represent than is sometimes assumed.
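For orientation, one common formulation of Barron space (following the general literature on infinitely wide two-layer ReLU networks; notation ours, not taken from this paper) represents f as an expectation over ReLU ridge functions and measures its size by a path-norm-type quantity:

```latex
% One common formulation of Barron space for two-layer ReLU networks (notation ours)
\begin{align*}
  f(x) &= \mathbb{E}_{(a,b,c)\sim\mu}\big[\, a\,\sigma(b^{\top}x + c)\,\big],
          \qquad \sigma(t) = \max(t, 0), \\
  \|f\|_{\mathcal{B}} &= \inf_{\mu}\ \mathbb{E}_{(a,b,c)\sim\mu}\big[\, |a|\,(\|b\|_{1} + |c|)\,\big],
\end{align*}
```

where the infimum runs over all probability measures μ realizing the first identity (some authors use the Euclidean norm of b instead of the ℓ1 norm); a finite value of this norm is the "finite path-norm" condition mentioned in the summary.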
arXiv Detail & Related papers (2020-06-10T17:55:31Z) - Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory [110.99247009159726]
Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks.
In particular, temporal-difference learning converges when the function approximator is linear in a feature representation, which is fixed throughout learning, and possibly diverges otherwise.
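For concreteness, the textbook TD(0) update with a linear function approximator, the setting in which convergence is classically guaranteed as the summary notes, keeps the feature map φ fixed and only adjusts the weight vector. A minimal sketch in the generic textbook form (synthetic data, not the paper's mean-field analysis):

```python
import numpy as np

def td0_linear(features, rewards, next_features, alpha=0.05, gamma=0.99, sweeps=50):
    """TD(0) with a fixed linear feature map: V(s) ~= phi(s) @ theta."""
    theta = np.zeros(features.shape[1])
    for _ in range(sweeps):
        for phi_s, r, phi_s_next in zip(features, rewards, next_features):
            td_error = r + gamma * phi_s_next @ theta - phi_s @ theta
            theta += alpha * td_error * phi_s   # only theta is learned; phi stays fixed
    return theta

# Tiny synthetic batch of transitions: phi(s), reward, phi(s').
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
next_features = rng.normal(size=(200, 4))
rewards = rng.normal(size=200)
print(td0_linear(features, rewards, next_features))
```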
arXiv Detail & Related papers (2020-06-08T17:25:22Z) - PDE constraints on smooth hierarchical functions computed by neural
networks [0.0]
An important problem in the theory of deep neural networks is expressivity.
We study real infinitely differentiable (smooth) hierarchical functions implemented by feedforward neural networks.
We conjecture that such PDE constraints, once accompanied by appropriate non-singularity conditions, guarantee that the smooth function under consideration can be represented by the network.
arXiv Detail & Related papers (2020-05-18T16:34:11Z) - Space of Functions Computed by Deep-Layered Machines [74.13735716675987]
We study the space of functions computed by random-layered machines, including deep neural networks and Boolean circuits.
Investigating the distribution of Boolean functions computed on the recurrent and layer-dependent architectures, we find that it is the same in both models.
arXiv Detail & Related papers (2020-04-19T18:31:03Z)