Local Kernel Renormalization as a mechanism for feature learning in
overparametrized Convolutional Neural Networks
- URL: http://arxiv.org/abs/2307.11807v1
- Date: Fri, 21 Jul 2023 17:22:04 GMT
- Title: Local Kernel Renormalization as a mechanism for feature learning in
overparametrized Convolutional Neural Networks
- Authors: R. Aiudi, R. Pacelli, A. Vezzani, R. Burioni, P. Rotondo
- Abstract summary: Empirical evidence shows that fully-connected neural networks in the infinite-width limit eventually outperform their finite-width counterparts.
State-of-the-art architectures with convolutional layers achieve optimal performance in the finite-width regime.
We show that the generalization performance of a finite-width FC network can be obtained by an infinite-width network, with a suitable choice of the Gaussian priors.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature learning, or the ability of deep neural networks to automatically
learn relevant features from raw data, underlies their exceptional capability
to solve complex tasks. However, feature learning seems to be realized in
different ways in fully-connected (FC) or convolutional architectures (CNNs).
Empirical evidence shows that FC neural networks in the infinite-width limit
eventually outperform their finite-width counterparts. Since the kernel that
describes infinite-width networks does not evolve during training, whatever
form of feature learning occurs in deep FC architectures is not very helpful in
improving generalization. On the other hand, state-of-the-art architectures
with convolutional layers achieve optimal performance in the finite-width
regime, suggesting that an effective form of feature learning emerges in this
case. In this work, we present a simple theoretical framework that provides a
rationale for these differences in one-hidden-layer networks. First, we show
that the generalization performance of a finite-width FC network can be
obtained by an infinite-width network, with a suitable choice of the Gaussian
priors. Second, we derive a finite-width effective action for an architecture
with one convolutional hidden layer and compare it with the result available
for FC networks. Remarkably, we identify a completely different form of kernel
renormalization: whereas the kernel of the FC architecture is just globally
renormalized by a single scalar parameter, the CNN kernel undergoes a local
renormalization, meaning that the network can select the local components that
will contribute to the final prediction in a data-dependent way. This finding
highlights a simple mechanism for feature learning that can take place in
overparametrized shallow CNNs, but not in shallow FC architectures or in
locally connected neural networks without weight sharing.
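To make the contrast concrete, the following is a minimal NumPy sketch, not taken from the paper: an FC-style kernel is rescaled by a single scalar, while a CNN-style kernel is assembled from per-patch local kernels, each carrying its own weight. The sizes, the random data, and the numerical renormalization values are illustrative assumptions; in the actual theory the renormalization factors are fixed self-consistently by the training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: P inputs of dimension D (sizes are illustrative, not from the paper).
P, D, patch = 8, 12, 4                  # D must be divisible by the patch size
X = rng.standard_normal((P, D))

# FC-style kernel: the whole Gram matrix is rescaled by one scalar.
K_fc = X @ X.T / D
Q_global = 1.7                          # placeholder value for the single scalar
K_fc_renorm = Q_global * K_fc

# CNN-style kernel: one local kernel per input patch, each with its own weight.
n_patches = D // patch
K_local = np.stack([
    X[:, i * patch:(i + 1) * patch] @ X[:, i * patch:(i + 1) * patch].T / patch
    for i in range(n_patches)
])                                      # shape (n_patches, P, P)
Q_local = rng.uniform(0.5, 2.0, size=n_patches)     # placeholder per-patch weights
K_cnn_renorm = np.tensordot(Q_local, K_local, axes=1) / n_patches

# A global rescaling never changes the relative structure of the kernel ...
print(np.allclose(K_fc_renorm / Q_global, K_fc))
# ... while the local weights reweight the contribution of each input patch.
print(K_cnn_renorm.shape)
```

The structural point is that a single scalar cannot change which parts of the input the FC kernel emphasizes, whereas the per-patch weights in the convolutional case can, in the full theory, be adapted to the data; this is the feature-learning mechanism the abstract describes.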
Related papers
- Globally Gated Deep Linear Networks [3.04585143845864]
We introduce Globally Gated Deep Linear Networks (GGDLNs) where gating units are shared among all processing units in each layer.
We derive exact equations for the generalization properties in these networks in the finite-width thermodynamic limit.
Our work provides the first exact theoretical solution of learning in a family of nonlinear networks with finite width.
arXiv Detail & Related papers (2022-10-31T16:21:56Z)
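Based only on the one-sentence description above, here is a schematic sketch of such a layer: a deep linear network whose layer outputs are multiplied by an input-dependent gate shared by every processing unit in that layer. The specific gating form (a fixed random projection of the input passed through a sigmoid) and the single scalar gate per layer are assumptions made for illustration and may differ from the construction in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ggdln_forward(x, weights, gate_vectors):
    """Schematic 'globally gated' deep linear network.

    Each layer applies a purely linear map W @ h and multiplies the result by a
    scalar gate shared by all units of that layer; the gate depends nonlinearly
    on the input x but introduces no trainable nonlinearity inside the layer.
    """
    h = x
    for W, u in zip(weights, gate_vectors):
        gate = sigmoid(u @ x)           # one input-dependent gate per layer
        h = gate * (W @ h)              # linear processing units, globally gated
    return h

d_in, widths = 10, [32, 32, 1]
weights, gate_vectors, prev = [], [], d_in
for n in widths:
    weights.append(rng.standard_normal((n, prev)) / np.sqrt(prev))
    gate_vectors.append(rng.standard_normal(d_in) / np.sqrt(d_in))  # fixed gate weights
    prev = n

x = rng.standard_normal(d_in)
print("output:", ggdln_forward(x, weights, gate_vectors))
```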
- Towards Disentangling Information Paths with Coded ResNeXt [11.884259630414515]
We take a novel approach to enhance the transparency of the function of the whole network.
We propose a neural network architecture for classification, in which the information that is relevant to each class flows through specific paths.
arXiv Detail & Related papers (2022-02-10T21:45:49Z)
- Neural networks with linear threshold activations: structure and algorithms [1.795561427808824]
We show that 2 hidden layers are necessary and sufficient to represent any function representable in the class.
We also give precise bounds on the sizes of the neural networks required to represent any function in the class.
We propose a new class of neural networks that we call shortcut linear threshold networks.
arXiv Detail & Related papers (2021-11-15T22:33:52Z)
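For reference, here is a small sketch of the function class the entry above refers to: a feed-forward network whose hidden units use the linear threshold activation (output 1 when the affine pre-activation is positive, 0 otherwise), with two hidden layers as in the representability statement. Widths and weights are arbitrary, and the shortcut variant proposed in the paper is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)

def linear_threshold(z):
    """Linear threshold activation: 1 where the pre-activation is positive, else 0."""
    return (z > 0).astype(float)

def threshold_net(x, params):
    """Network with hidden layers of linear threshold units and a linear output layer."""
    h = x
    for W, b in params[:-1]:
        h = linear_threshold(W @ h + b)
    W_out, b_out = params[-1]
    return W_out @ h + b_out

sizes = [3, 8, 8, 1]                     # input, two hidden layers, output
params = [(rng.standard_normal((m, n)), rng.standard_normal(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.standard_normal(sizes[0])
print(threshold_net(x, params))
```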
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
arXiv Detail & Related papers (2021-02-17T04:47:18Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
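A minimal sketch of the mechanism summarized above: the network is viewed as a complete DAG over its nodes, every edge carries a learnable logit squashed to (0, 1), and each node aggregates its predecessors weighted by these values, so connectivity can be trained by gradient descent. The aggregation rule and the toy node operations are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dag_forward(x, node_weights, edge_logits):
    """Forward pass over a complete DAG with a learnable strength on every edge.

    Node j aggregates the outputs of all earlier nodes i < j, each scaled by
    sigmoid(edge_logits[i, j]); because this scaling is smooth, the connectivity
    pattern can be optimized by ordinary gradient descent.
    """
    outputs = [x]                        # node 0 is the input
    for j, W in enumerate(node_weights, start=1):
        agg = sum(sigmoid(edge_logits[i, j]) * outputs[i] for i in range(j))
        outputs.append(np.tanh(W @ agg))
    return outputs[-1]

d, n_nodes = 6, 4
node_weights = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_nodes - 1)]
edge_logits = rng.standard_normal((n_nodes, n_nodes))   # one logit per ordered pair

x = rng.standard_normal(d)
print(dag_forward(x, node_weights, edge_logits))
```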
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network Architectures [179.66117325866585]
We investigate a design space that is usually overlooked, i.e. adjusting the channel configurations of predefined networks.
We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance.
Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration.
arXiv Detail & Related papers (2020-06-29T17:59:26Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Disentangling Trainability and Generalization in Deep Neural Networks [45.15453323967438]
We analyze the spectrum of the Neural Tangent Kernel (NTK) for trainability and generalization across a range of networks.
We find that CNNs without global average pooling behave almost identically to FCNs, but that CNNs with pooling have markedly different and often better generalization performance.
arXiv Detail & Related papers (2019-12-30T18:53:24Z)
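As a self-contained illustration of what analyzing "the spectrum of the NTK" involves, here is a sketch that computes the empirical NTK Gram matrix and its eigenvalues for a toy one-hidden-layer network; the CNN/FCN and pooling comparison carried out in the paper is not reproduced, and all sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy one-hidden-layer network f(x) = a . tanh(W x / sqrt(d)) / sqrt(N)
# (not the CNN/FCN pair studied in the paper; sizes are arbitrary).
P, d, N = 10, 5, 64
X = rng.standard_normal((P, d))
W = rng.standard_normal((N, d))
a = rng.standard_normal(N)

def output_gradient(x):
    """Gradient of the scalar network output with respect to all parameters."""
    h = np.tanh(W @ x / np.sqrt(d))
    grad_a = h / np.sqrt(N)                                    # df / da
    grad_W = np.outer(a * (1.0 - h ** 2), x) / np.sqrt(N * d)  # df / dW
    return np.concatenate([grad_a, grad_W.ravel()])

# Empirical NTK Gram matrix: Theta[p, q] = <grad f(x_p), grad f(x_q)>.
grads = np.stack([output_gradient(x) for x in X])
Theta = grads @ grads.T

print("empirical NTK eigenvalues (descending):",
      np.round(np.sort(np.linalg.eigvalsh(Theta))[::-1], 3))
```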