Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature
Learning
- URL: http://arxiv.org/abs/2402.08010v1
- Date: Mon, 12 Feb 2024 19:18:50 GMT
- Title: Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature
Learning
- Authors: Yuxiao Wen, Arthur Jacot
- Abstract summary: We describe the emergence of a Convolution Bottleneck structure in CNNs.
We define the CBN rank, which describes the number and type of frequencies that are kept inside the bottleneck.
We show that any network with almost optimal parameter norm will exhibit a CBN structure in both the weights and the activations.
- Score: 12.351756386062291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe the emergence of a Convolution Bottleneck (CBN) structure in
CNNs, where the network uses its first few layers to transform the input
representation into a representation that is supported only along a few
frequencies and channels, before using the last few layers to map back to the
outputs. We define the CBN rank, which describes the number and type of
frequencies that are kept inside the bottleneck, and partially prove that the
parameter norm required to represent a function $f$ scales as depth times the
CBN rank of $f$. We also show that the parameter norm depends at next order on the
regularity of $f$. We show that any network with almost optimal parameter norm
will exhibit a CBN structure in both the weights and - under the assumption
that the network is stable under large learning rate - the activations, which
motivates the common practice of down-sampling; and we verify that the CBN
results still hold with down-sampling. Finally we use the CBN structure to
interpret the functions learned by CNNs on a number of tasks.
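To make the bottleneck claim concrete, the sketch below (a rough illustration, not the authors' code) passes random inputs through a small circularly padded CNN and measures how the activation energy is distributed over spatial frequencies at each layer; a network exhibiting a CBN structure would concentrate this energy on a few frequencies in its middle layers. The architecture, the random inputs, and the low-frequency energy measure are illustrative assumptions.

```python
# A rough sketch (not the authors' code) of a CBN-style frequency probe:
# push a batch through a small circularly padded CNN and track how the
# activation energy spreads over spatial frequencies at each layer.
# The architecture, the random inputs and the "low-frequency" energy
# measure are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy CNN with circular padding, matching the cyclic-convolution setting
# in which the frequency analysis is cleanest.
layers = [
    nn.Conv2d(3, 16, 3, padding=1, padding_mode="circular"),
    nn.Conv2d(16, 16, 3, padding=1, padding_mode="circular"),
    nn.Conv2d(16, 16, 3, padding=1, padding_mode="circular"),
    nn.Conv2d(16, 10, 3, padding=1, padding_mode="circular"),
]

x = torch.randn(8, 3, 32, 32)  # stand-in batch of "images"

def freq_energy(act: torch.Tensor) -> torch.Tensor:
    """Energy per spatial frequency, summed over batch and channels."""
    spec = torch.fft.fft2(act, norm="ortho")   # FFT over the H, W axes
    return spec.abs().pow(2).sum(dim=(0, 1))   # -> (H, W) energy map

h = x
for i, conv in enumerate(layers):
    h = torch.relu(conv(h))
    e = freq_energy(h)
    e = e / e.sum()
    # Fraction of energy at frequencies with |k| <= 4 along both axes
    # (the four corners of the FFT grid, where the low frequencies sit).
    low = e[:5, :5].sum() + e[:5, -4:].sum() + e[-4:, :5].sum() + e[-4:, -4:].sum()
    print(f"layer {i}: {low.item():.2%} of activation energy at |k| <= 4")
```

With untrained weights this only demonstrates the measurement; the paper's statement concerns networks whose parameter norm is close to optimal, e.g. after training with weight decay, for which the energy should concentrate on a few frequencies in the middle layers.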
Related papers
- Chebyshev Feature Neural Network for Accurate Function Approximation [3.8769921482808116]
We present a new Deep Neural Network architecture capable of approximating functions up to machine accuracy.
Termed Chebyshev Feature Neural Network (CFNN), the new structure employs Chebyshev functions with learnable frequencies as the first hidden layer.
arXiv Detail & Related papers (2024-09-27T20:41:17Z)
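The CFNN abstract above only states that the first hidden layer uses Chebyshev functions with learnable frequencies; one plausible reading, sketched below as an assumption rather than the paper's architecture, is a feature map $\cos(w_k \arccos x)$, which reduces to the Chebyshev polynomial $T_n$ when $w_k$ is an integer $n$.

```python
# A speculative sketch of a "Chebyshev feature" first layer: features of
# the form cos(w_k * arccos(x)) with learnable frequencies w_k, which
# equal the Chebyshev polynomials T_n(x) whenever w_k is an integer n.
# This is a guess based on the abstract, not the CFNN reference code.
import torch
import torch.nn as nn

class ChebyshevFeatures(nn.Module):
    def __init__(self, num_features: int = 64, w_max: float = 32.0):
        super().__init__()
        # Learnable frequencies, initialised uniformly in (0, w_max).
        self.freqs = nn.Parameter(torch.rand(num_features) * w_max)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1) with values assumed to lie in [-1, 1].
        theta = torch.arccos(x.clamp(-1.0, 1.0))   # (batch, 1)
        return torch.cos(theta * self.freqs)       # (batch, num_features)

# Tiny regression model: Chebyshev features followed by a linear readout.
model = nn.Sequential(ChebyshevFeatures(64), nn.Linear(64, 1))
x = torch.linspace(-1, 1, 256).unsqueeze(1)
print(model(x).shape)  # torch.Size([256, 1])
```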
- Multiscale Hodge Scattering Networks for Data Analysis [0.5243460995467895]
We propose new scattering networks for signals measured on simplicial complexes, which we call Multiscale Hodge Scattering Networks (MHSNs).
Our construction is based on multiscale basis dictionaries on simplicial complexes, i.e., the $\kappa$-GHWT and $\kappa$-HGLET.
arXiv Detail & Related papers (2023-11-17T01:30:43Z)
- What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z)
- Towards a General Purpose CNN for Long Range Dependencies in $\mathrm{N}$D [49.57261544331683]
We propose a single CNN architecture equipped with continuous convolutional kernels for tasks on arbitrary resolution, dimensionality and length without structural changes.
We show the generality of our approach by applying the same CCNN to a wide set of tasks on sequential ($1\mathrm{D}$) and visual ($2\mathrm{D}$) data.
Our CCNN performs competitively and often outperforms the current state-of-the-art across all tasks considered.
arXiv Detail & Related papers (2022-06-07T15:48:02Z)
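The CCNN entry above rests on continuous convolutional kernels: the kernel is not a fixed weight tensor but a small network evaluated on relative positions, so the same parameters can be sampled at any kernel length or resolution. The sketch below illustrates that idea in 1D; the kernel-generating MLP and all sizes are assumptions for illustration, not the CCNN architecture.

```python
# A minimal sketch of a "continuous convolutional kernel": the kernel is
# produced by a small MLP evaluated on relative positions, so the same
# parameters can be sampled at any kernel size. The MLP and the sizes
# below are assumptions for illustration, not the CCNN architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContinuousConv1d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, hidden: int = 32):
        super().__init__()
        self.in_ch, self.out_ch = in_ch, out_ch
        # Maps a scalar position in [-1, 1] to one column of kernel weights.
        self.kernel_net = nn.Sequential(
            nn.Linear(1, hidden), nn.GELU(),
            nn.Linear(hidden, out_ch * in_ch),
        )

    def forward(self, x: torch.Tensor, kernel_size: int) -> torch.Tensor:
        # x: (batch, in_ch, length). Sample the kernel at kernel_size points.
        pos = torch.linspace(-1.0, 1.0, kernel_size).unsqueeze(1)  # (K, 1)
        w = self.kernel_net(pos)                                   # (K, out*in)
        w = w.t().reshape(self.out_ch, self.in_ch, kernel_size)    # conv1d layout
        return F.conv1d(x, w, padding=kernel_size // 2)

conv = ContinuousConv1d(in_ch=3, out_ch=8)
x = torch.randn(4, 3, 100)
print(conv(x, kernel_size=9).shape)   # same parameters, different kernel sizes
print(conv(x, kernel_size=33).shape)
```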
- Quantized convolutional neural networks through the lens of partial differential equations [6.88204255655161]
Quantization of Convolutional Neural Networks (CNNs) is a common approach to ease the computational burden involved in the deployment of CNNs.
In this work, we explore ways to improve quantized CNNs using a PDE-based perspective and analysis.
arXiv Detail & Related papers (2021-08-31T22:18:52Z)
- Sandwich Batch Normalization [96.2529041037824]
We present Sandwich Batch Normalization (SaBN), an easy improvement of Batch Normalization (BN) with only a few lines of code changes.
Our SaBN factorizes the BN affine layer into one shared sandwich affine layer, cascaded by several parallel independent affine layers.
We demonstrate the prevailing effectiveness of SaBN as a drop-in replacement in four tasks.
arXiv Detail & Related papers (2021-02-22T22:09:43Z)
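The SaBN abstract above describes factorizing the BN affine transform into one shared affine layer followed by one of several parallel, independent affine layers. A hedged sketch of that factorization is below; details such as how the branch index is chosen (e.g. a domain or architecture id) are assumptions, not the paper's exact design.

```python
# A hedged sketch of the Sandwich BN idea as stated in the abstract: BN
# normalisation, then one shared affine transform, then one of several
# parallel independent affine transforms selected by an integer index.
# The branch-selection mechanism is an assumption, not the paper's design.
import torch
import torch.nn as nn

class SandwichBN2d(nn.Module):
    def __init__(self, num_features: int, num_branches: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)  # normalisation only
        # Shared "sandwich" affine parameters (per channel).
        self.shared_gamma = nn.Parameter(torch.ones(1, num_features, 1, 1))
        self.shared_beta = nn.Parameter(torch.zeros(1, num_features, 1, 1))
        # Parallel independent affine layers, one per branch.
        self.gammas = nn.Parameter(torch.ones(num_branches, 1, num_features, 1, 1))
        self.betas = nn.Parameter(torch.zeros(num_branches, 1, num_features, 1, 1))

    def forward(self, x: torch.Tensor, branch: int) -> torch.Tensor:
        h = self.bn(x)
        h = h * self.shared_gamma + self.shared_beta           # shared affine
        return h * self.gammas[branch] + self.betas[branch]    # branch affine

sabn = SandwichBN2d(num_features=16, num_branches=4)
x = torch.randn(8, 16, 32, 32)
print(sabn(x, branch=2).shape)  # torch.Size([8, 16, 32, 32])
```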
- MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and drives the network into the chaotic regime, as a BN layer does.
MimicNorm achieves similar accuracy across various network structures, including ResNets and lightweight networks like ShuffleNet, with about a 20% reduction in memory consumption.
arXiv Detail & Related papers (2020-10-19T07:42:41Z)
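The MimicNorm entry above combines a weight-mean operation with a single last BN layer. The sketch below is one loose reading of those two ingredients, centring each convolution's filters and keeping one BN layer before the classifier; the toy network and the exact form of the weight-mean operation are assumptions.

```python
# A loose sketch of the two ingredients the MimicNorm abstract mentions:
# (i) centring each convolution's weights (a "weight mean" operation) and
# (ii) keeping a single BN layer just before the classifier instead of BN
# after every convolution. The toy network itself is a stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanCenteredConv2d(nn.Conv2d):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Remove the mean from each filter before convolving, so the layer
        # output is centred much as a BN layer would centre it.
        w = self.weight - self.weight.mean(dim=(1, 2, 3), keepdim=True)
        return F.conv2d(x, w, self.bias, self.stride, self.padding)

net = nn.Sequential(
    MeanCenteredConv2d(3, 16, 3, padding=1), nn.ReLU(),
    MeanCenteredConv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.BatchNorm1d(32),          # the single "last BN" layer
    nn.Linear(32, 10),
)
print(net(torch.randn(4, 3, 32, 32)).shape)  # torch.Size([4, 10])
```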
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
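The ACDC title above suggests that each convolution kernel is decomposed into a small shared dictionary of spatial atoms combined with per-filter coefficients. The sketch below implements one such decomposition as an assumption inferred from the title; the paper's actual sharing scheme across layers may differ.

```python
# A sketch of an atom-coefficient decomposed convolution: every 2D filter
# is a linear combination of a small shared dictionary of kernel "atoms",
# so only per-filter coefficients (and the atoms) are learned. The sharing
# scheme shown here is an assumption inferred from the ACDC title.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtomCoefficientConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, num_atoms: int = 6):
        super().__init__()
        # Shared dictionary of spatial atoms: (num_atoms, k, k).
        self.atoms = nn.Parameter(torch.randn(num_atoms, k, k) * 0.1)
        # Coefficients over the atoms for every (out, in) channel pair.
        self.coeffs = nn.Parameter(torch.randn(out_ch, in_ch, num_atoms) * 0.1)
        self.padding = k // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Build the full (out_ch, in_ch, k, k) weight tensor on the fly.
        w = torch.einsum("oin,nkl->oikl", self.coeffs, self.atoms)
        return F.conv2d(x, w, padding=self.padding)

conv = AtomCoefficientConv2d(3, 16)
print(conv(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 16, 32, 32])
```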
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- How Does BN Increase Collapsed Neural Network Filters? [34.886702335022015]
Filter collapse is common in deep neural networks (DNNs) with batch normalization (BN) and rectified linear activation functions (e.g., ReLU, Leaky ReLU).
We propose a simple yet effective approach named post-shifted BN (psBN), which has the same representation ability as BN while being able to automatically make BN parameters trainable again as they saturate during training.
arXiv Detail & Related papers (2020-01-30T09:00:08Z)