What Can Be Learnt With Wide Convolutional Neural Networks?
- URL: http://arxiv.org/abs/2208.01003v5
- Date: Wed, 31 May 2023 15:39:31 GMT
- Title: What Can Be Learnt With Wide Convolutional Neural Networks?
- Authors: Francesco Cagnetta, Alessandro Favero and Matthieu Wyart
- Abstract summary: We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
- Score: 69.55323565255631
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding how convolutional neural networks (CNNs) can efficiently learn
high-dimensional functions remains a fundamental challenge. A popular belief is
that these models harness the local and hierarchical structure of natural data
such as images. Yet, we lack a quantitative understanding of how such structure
affects performance, e.g., the rate of decay of the generalisation error with
the number of training samples. In this paper, we study infinitely-wide deep
CNNs in the kernel regime. First, we show that the spectrum of the
corresponding kernel inherits the hierarchical structure of the network, and we
characterise its asymptotics. Then, we use this result together with
generalisation bounds to prove that deep CNNs adapt to the spatial scale of the
target function. In particular, we find that if the target function depends on
low-dimensional subsets of adjacent input variables, then the decay of the
error is controlled by the effective dimensionality of these subsets.
Conversely, if the target function depends on the full set of input variables,
then the error decay is controlled by the input dimension. We conclude by
computing the generalisation error of a deep CNN trained on the output of
another deep CNN with randomly-initialised parameters. Interestingly, we find
that, despite their hierarchical structure, the functions generated by
infinitely-wide deep CNNs are too rich to be efficiently learnable in high
dimension.
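To make the setting of the abstract concrete, below is a minimal numerical sketch of the kind of teacher-student experiment it describes: a toy hierarchical kernel is built by composing ReLU arc-cosine kernels over adjacent patches (mimicking the layer-wise structure of an infinitely-wide deep CNN in the kernel regime), a teacher function is sampled from the corresponding Gaussian process, and kernel ridge regression tracks how the test error decays with the number of training samples. The specific kernel composition, patch size, hyperparameters and function names are illustrative assumptions, not the authors' construction or code.

```python
# Illustrative sketch only: the kernel composition, patch size and function
# names (hierarchical_kernel, relu_arccos_kernel, krr_learning_curve) are
# assumptions for this toy example, not the paper's exact construction.
import numpy as np

rng = np.random.default_rng(0)


def relu_arccos_kernel(kxx, kxy, kyy):
    """First-order arc-cosine kernel of a wide ReLU layer, from the Gram
    entries k(x,x), k(x,y), k(y,y) of the previous layer."""
    cos = np.clip(kxy / np.sqrt(kxx * kyy), -1.0, 1.0)
    theta = np.arccos(cos)
    return np.sqrt(kxx * kyy) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi


def hierarchical_kernel(X, Y, patch=2):
    """Toy hierarchical kernel: repeatedly sum kernel values over adjacent
    patches and pass them through the ReLU arc-cosine map, mimicking the
    layer-wise structure of a deep CNN kernel. The input dimension is assumed
    to be a power of `patch`."""
    kxy = X[:, None, :] * Y[None, :, :]      # (n, m, d) coordinate-wise products
    kxx = X * X                              # (n, d) diagonal entries for X
    kyy = Y * Y                              # (m, d) diagonal entries for Y

    def pool(k):                             # sum kernel values over adjacent patches
        return k.reshape(*k.shape[:-1], k.shape[-1] // patch, patch).sum(-1)

    while kxy.shape[-1] > 1:
        kxy, kxx, kyy = pool(kxy), pool(kxx), pool(kyy)
        kxy = relu_arccos_kernel(kxx[:, None, :], kxy, kyy[None, :, :])
        kxx = relu_arccos_kernel(kxx, kxx, kxx)   # equals kxx for this kernel
        kyy = relu_arccos_kernel(kyy, kyy, kyy)
    return kxy[..., 0]


def krr_learning_curve(d=16, n_train=(32, 64, 128, 256), n_test=256, ridge=1e-6):
    """Teacher-student setup: the teacher is a Gaussian-process sample with the
    same hierarchical kernel (the analogue of the output of another
    infinitely-wide deep CNN); the student is kernel ridge regression."""
    X = rng.standard_normal((max(n_train) + n_test, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)        # put inputs on the sphere
    K = hierarchical_kernel(X, X)
    w, V = np.linalg.eigh(K)                              # sample the teacher from N(0, K)
    y = V @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(len(w)))
    y_te = y[-n_test:]
    for n in n_train:
        alpha = np.linalg.solve(K[:n, :n] + ridge * np.eye(n), y[:n])
        pred = K[-n_test:, :n] @ alpha
        err = np.mean((pred - y_te) ** 2) / np.mean(y_te ** 2)
        print(f"n = {n:4d}   relative test error = {err:.3f}")


if __name__ == "__main__":
    krr_learning_curve()
```

With larger sample sizes one would fit a power law to the printed errors to estimate the decay exponent; in the spirit of the abstract, that exponent is governed by the effective dimensionality of the subsets of adjacent input variables the target depends on.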
Related papers
- Visualising Feature Learning in Deep Neural Networks by Diagonalizing the Forward Feature Map [4.776836972093627]
We present a method for analysing feature learning by diagonalising the forward feature map of deep neural networks (DNNs).
We find that DNNs converge to a minimal feature (MF) regime dominated by a number of eigenfunctions equal to the number of classes.
We recast the phenomenon of neural collapse into a kernel picture which can be extended to broader tasks such as regression.
arXiv Detail & Related papers (2024-10-05T18:53:48Z)
- Average gradient outer product as a mechanism for deep neural collapse [26.939895223897572]
Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs).
In this work, we introduce a data-dependent setting where DNC forms due to feature learning through the average gradient outer product (AGOP).
We show that the right singular vectors and values of the weights can be responsible for the majority of within-class variability collapse for neural networks trained in the feature learning regime.
arXiv Detail & Related papers (2024-02-21T11:40:27Z)
- Do deep neural networks have an inbuilt Occam's razor? [1.1470070927586016]
We show that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov-)simple functions strong enough to counteract the exponential growth of the number of functions with complexity, is key to the success of DNNs.
arXiv Detail & Related papers (2023-04-13T16:58:21Z)
- Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
arXiv Detail & Related papers (2023-03-09T10:08:34Z)
- Universal Approximation Property of Fully Convolutional Neural Networks with Zero Padding [10.295288663157393]
CNNs function as tensor-to-tensor mappings, preserving the spatial structure of input data.
We show that CNNs can approximate arbitrary continuous functions in cases where both the input and output values exhibit the same spatial shape.
We also verify that deep, narrow CNNs possess the universal approximation property (UAP) as tensor-to-tensor functions.
arXiv Detail & Related papers (2022-11-18T02:04:16Z)
- Towards a General Purpose CNN for Long Range Dependencies in $\mathrm{N}$D [49.57261544331683]
We propose a single CNN architecture equipped with continuous convolutional kernels for tasks on data of arbitrary resolution, dimensionality and length without structural changes.
We show the generality of our approach by applying the same CCNN to a wide set of tasks on sequential (1$\mathrm{D}$) and visual (2$\mathrm{D}$) data.
Our CCNN performs competitively and often outperforms the current state-of-the-art across all tasks considered.
arXiv Detail & Related papers (2022-06-07T15:48:02Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs with this regularization maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
- Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)