Do ideas have shape? Idea registration as the continuous limit of
artificial neural networks
- URL: http://arxiv.org/abs/2008.03920v3
- Date: Thu, 27 Oct 2022 18:01:18 GMT
- Title: Do ideas have shape? Idea registration as the continuous limit of
artificial neural networks
- Authors: Houman Owhadi
- Abstract summary: We show that ResNets converge, in the infinite depth limit, to a generalization of image registration variational algorithms.
We present the first rigorous proof of convergence of ResNets with trained weights and biases towards a Hamiltonian-dynamics-driven flow.
- Score: 0.609170287691728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a Gaussian process (GP) generalization of ResNets (including ResNets as a
particular case). We show that ResNets (and their GP generalization) converge,
in the infinite depth limit, to a generalization of image registration
variational algorithms. Whereas computational anatomy aligns images via warping
of the material space, this generalization aligns ideas (or abstract shapes as
in Plato's theory of forms) via the warping of the RKHS of functions mapping
the input space to the output space. While the Hamiltonian interpretation of
ResNets is not new, it was based on an Ansatz. We do not rely on this Ansatz
and present the first rigorous proof of convergence of ResNets with trained
weights and biases towards a Hamiltonian-dynamics-driven flow. Our constructive
proof reveals several remarkable properties of ResNets and their GP
generalization. ResNet regressors are kernel regressors with data-dependent
warping kernels. Minimizers of $L^2$-regularized ResNets satisfy a discrete
least action principle implying the near preservation of the norm of weights
and biases across layers. The trained weights of ResNets with $L^2$
regularization can be identified by solving an autonomous Hamiltonian system.
The trained ResNet parameters are unique up to the initial momentum whose
representation is generally sparse. The kernel regularization strategy provides
a provably robust alternative to Dropout for ANNs. We introduce a functional
generalization of GPs leading to error estimates for ResNets. We identify the
(EPDiff) mean-field limit of trained ResNet parameters. We show that the
composition of warping regression blocks with reduced equivariant multichannel
kernels (introduced here) recovers and generalizes CNNs to arbitrary spaces and
groups of transformations.
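To make the abstract's claim that ResNet regressors act as kernel regressors with data-dependent warping kernels more concrete, here is a minimal sketch (not the paper's algorithm): residual blocks with step size 1/depth warp the inputs, approximating a depth-continuous flow, and kernel ridge regression is then run on the warped points. The function names (`residual_step`, `warp`, `krr_fit_predict`) and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_step(x, W, b, h):
    # One residual block: x_{l+1} = x_l + h * tanh(x_l W + b).
    return x + h * np.tanh(x @ W + b)

def warp(x, params, depth):
    # Compose `depth` residual blocks with step size h = 1/depth, so the
    # composition approximates a continuous-time flow as depth grows.
    h = 1.0 / depth
    for W, b in params:
        x = residual_step(x, W, b, h)
    return x

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian kernel between two point clouds.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit_predict(x_train, y_train, x_test, params, depth, lam=1e-3):
    # Kernel ridge regression with the "warped" kernel K(phi(x), phi(x')).
    z_train, z_test = warp(x_train, params, depth), warp(x_test, params, depth)
    K = rbf_kernel(z_train, z_train)
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), y_train)
    return rbf_kernel(z_test, z_train) @ alpha

# Toy usage: random residual blocks stand in for trained weights.
d, depth = 3, 32
params = [(0.1 * rng.standard_normal((d, d)), np.zeros(d)) for _ in range(depth)]
x_tr, x_te = rng.standard_normal((20, d)), rng.standard_normal((5, d))
y_tr = np.sin(x_tr).sum(axis=1)
print(krr_fit_predict(x_tr, y_tr, x_te, params, depth))
```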
Related papers
- Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets [58.460298576330835]
We study Leaky ResNets, which interpolate between ResNets and fully-connected nets as a function of an effective-depth parameter $\tilde{L}$.
In the infinite depth limit, we study 'representation geodesics' $A_p$: continuous paths in representation space (similar to NeuralODEs).
We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work.
arXiv Detail & Related papers (2024-05-27T18:15:05Z)
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNets in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Global convergence of ResNets: From finite to infinite width using linear parameterization [0.0]
We study Residual Networks (ResNets) in which the residual block has linear parametrization while still being nonlinear.
In this limit, we prove a local Polyak-Lojasiewicz inequality, retrieving the lazy regime.
Our analysis leads to a practical and quantified recipe.
arXiv Detail & Related papers (2021-12-10T13:38:08Z)
- Approximation properties of Residual Neural Networks for Kolmogorov PDEs [0.0]
We show that ResNets are able to approximate Kolmogorov partial differential equations with constant diffusion and possibly nonlinear gradient coefficients.
In contrast to FNNs, the Euler-Maruyama approximation structure of ResNets simplifies the construction of the approximating ResNets substantially.
arXiv Detail & Related papers (2021-10-30T09:28:49Z)
- Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient [62.660451283548724]
ResiNet is a reinforcement learning framework to discover resilient network topologies against various disasters and attacks.
We show that ResiNet achieves a near-optimal resilience gain on multiple graphs while balancing utility, outperforming existing approaches by a large margin.
arXiv Detail & Related papers (2021-10-18T06:14:28Z)
- The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization [18.613475245655806]
We study ReLU ResNets in the infinite-depth-and-width limit, where both depth and width tend to infinity as their ratio, $d/n$, remains constant.
Using Monte Carlo simulations, we demonstrate that even basic properties of standard ResNet architectures are poorly captured by the Gaussian limit.
arXiv Detail & Related papers (2021-06-07T23:47:37Z)
- Momentum Residual Neural Networks [22.32840998053339]
We propose to change the forward rule of a ResNet by adding a momentum term (a minimal sketch of such an update appears after this list).
MomentumNets can be used as a drop-in replacement for any existing ResNet block.
We show that MomentumNets have the same accuracy as ResNets, while having a much smaller memory footprint.
arXiv Detail & Related papers (2021-02-15T22:24:52Z)
- On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets.
Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity.
In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
arXiv Detail & Related papers (2020-01-28T16:47:53Z)
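As a companion to the Momentum Residual Neural Networks entry above, the sketch below shows one common way to write a momentum-style residual update and why it can reduce memory: the update is exactly invertible, so activations can be recomputed during the backward pass instead of stored. The specific parameterization (the `gamma` value and the tanh block) is an illustrative assumption, not the paper's exact architecture.

```python
import numpy as np

def momentum_step(x, v, f, gamma=0.9):
    # Momentum residual update: v <- gamma * v + (1 - gamma) * f(x); x <- x + v.
    v = gamma * v + (1.0 - gamma) * f(x)
    return x + v, v

def momentum_step_inverse(x_next, v_next, f, gamma=0.9):
    # The update is exactly invertible, which is what allows activations to be
    # recomputed rather than stored during backpropagation.
    x = x_next - v_next
    v = (v_next - (1.0 - gamma) * f(x)) / gamma
    return x, v

# Toy check that the inverse recovers the input state of a block.
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 4))
f = lambda x: np.tanh(x @ W)
x0, v0 = rng.standard_normal(4), np.zeros(4)
x1, v1 = momentum_step(x0, v0, f)
x0_rec, v0_rec = momentum_step_inverse(x1, v1, f)
print(np.allclose(x0, x0_rec), np.allclose(v0, v0_rec))
```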