Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature
Learning
- URL: http://arxiv.org/abs/2402.08010v1
- Date: Mon, 12 Feb 2024 19:18:50 GMT
- Title: Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature
Learning
- Authors: Yuxiao Wen, Arthur Jacot
- Abstract summary: We describe the emergence of a Convolution Bottleneck structure in CNNs.
We define the CBN rank, which describes the number and type of frequencies that are kept inside the bottleneck.
We show that any network with almost optimal parameter norm will exhibit a CBN structure in both the weights and the activations.
- Score: 12.351756386062291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe the emergence of a Convolution Bottleneck (CBN) structure in
CNNs, where the network uses its first few layers to transform the input
representation into a representation that is supported only along a few
frequencies and channels, before using the last few layers to map back to the
outputs. We define the CBN rank, which describes the number and type of
frequencies that are kept inside the bottleneck, and partially prove that the
parameter norm required to represent a function $f$ scales as depth times the
CBN rank of $f$. We also show that the parameter norm depends at next order on the
regularity of $f$. We show that any network with almost optimal parameter norm
will exhibit a CBN structure in both the weights and - under the assumption
that the network is stable under large learning rate - the activations, which
motivates the common practice of down-sampling; and we verify that the CBN
results still hold with down-sampling. Finally we use the CBN structure to
interpret the functions learned by CNNs on a number of tasks.
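To make the bottleneck claim concrete, the sketch below (a rough illustration, not the authors' code) passes random inputs through a small circularly padded CNN and measures how the activation energy is distributed over spatial frequencies at each layer; a network exhibiting a CBN structure would concentrate this energy on a few frequencies in its middle layers. The architecture, the random inputs, and the low-frequency energy measure are illustrative assumptions.

```python
# A rough sketch (not the authors' code) of a CBN-style frequency probe:
# push a batch through a small circularly padded CNN and track how the
# activation energy spreads over spatial frequencies at each layer.
# The architecture, the random inputs and the "low-frequency" energy
# measure are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy CNN with circular padding, matching the cyclic-convolution setting
# in which the frequency analysis is cleanest.
layers = [
    nn.Conv2d(3, 16, 3, padding=1, padding_mode="circular"),
    nn.Conv2d(16, 16, 3, padding=1, padding_mode="circular"),
    nn.Conv2d(16, 16, 3, padding=1, padding_mode="circular"),
    nn.Conv2d(16, 10, 3, padding=1, padding_mode="circular"),
]

x = torch.randn(8, 3, 32, 32)  # stand-in batch of "images"

def freq_energy(act: torch.Tensor) -> torch.Tensor:
    """Energy per spatial frequency, summed over batch and channels."""
    spec = torch.fft.fft2(act, norm="ortho")   # FFT over the H, W axes
    return spec.abs().pow(2).sum(dim=(0, 1))   # -> (H, W) energy map

h = x
for i, conv in enumerate(layers):
    h = torch.relu(conv(h))
    e = freq_energy(h)
    e = e / e.sum()
    # Fraction of energy at frequencies with |k| <= 4 along both axes
    # (the four corners of the FFT grid, where the low frequencies sit).
    low = e[:5, :5].sum() + e[:5, -4:].sum() + e[-4:, :5].sum() + e[-4:, -4:].sum()
    print(f"layer {i}: {low.item():.2%} of activation energy at |k| <= 4")
```

With untrained weights this only demonstrates the measurement; the paper's statement concerns networks whose parameter norm is close to optimal, e.g. after training with weight decay, for which the energy should concentrate on a few frequencies in the middle layers.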
Related papers
- Chebyshev Feature Neural Network for Accurate Function Approximation [3.8769921482808116]
We present a new Deep Neural Network architecture capable of approximating functions up to machine accuracy.
Termed Chebyshev Feature Neural Network (CFNN), the new structure employs Chebyshev functions with learnable frequencies as the first hidden layer.
arXiv Detail & Related papers (2024-09-27T20:41:17Z)
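The CFNN abstract above only states that the first hidden layer uses Chebyshev functions with learnable frequencies; one plausible reading, sketched below as an assumption rather than the paper's architecture, is a feature map $\cos(w_k \arccos x)$, which reduces to the Chebyshev polynomial $T_n$ when $w_k$ is an integer $n$.

```python
# A speculative sketch of a "Chebyshev feature" first layer: features of
# the form cos(w_k * arccos(x)) with learnable frequencies w_k, which
# equal the Chebyshev polynomials T_n(x) whenever w_k is an integer n.
# This is a guess based on the abstract, not the CFNN reference code.
import torch
import torch.nn as nn

class ChebyshevFeatures(nn.Module):
    def __init__(self, num_features: int = 64, w_max: float = 32.0):
        super().__init__()
        # Learnable frequencies, initialised uniformly in (0, w_max).
        self.freqs = nn.Parameter(torch.rand(num_features) * w_max)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1) with values assumed to lie in [-1, 1].
        theta = torch.arccos(x.clamp(-1.0, 1.0))   # (batch, 1)
        return torch.cos(theta * self.freqs)       # (batch, num_features)

# Tiny regression model: Chebyshev features followed by a linear readout.
model = nn.Sequential(ChebyshevFeatures(64), nn.Linear(64, 1))
x = torch.linspace(-1, 1, 256).unsqueeze(1)
print(model(x).shape)  # torch.Size([256, 1])
```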
- Multiscale Hodge Scattering Networks for Data Analysis [0.5243460995467895]
We propose new scattering networks for signals measured on simplicial complexes, which we call Multiscale Hodge Scattering Networks (MHSNs).
Our construction is based on multiscale basis dictionaries on simplicial complexes, i.e., the $\kappa$-GHWT and $\kappa$-HGLET.
arXiv Detail & Related papers (2023-11-17T01:30:43Z)
- What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z)
- Towards a General Purpose CNN for Long Range Dependencies in $\mathrm{N}$D [49.57261544331683]
We propose a single CNN architecture equipped with continuous convolutional kernels for tasks on arbitrary resolution, dimensionality and length without structural changes.
We show the generality of our approach by applying the same CCNN to a wide set of tasks on sequential ($1\mathrm{D}$) and visual ($2\mathrm{D}$) data.
Our CCNN performs competitively and often outperforms the current state-of-the-art across all tasks considered.
arXiv Detail & Related papers (2022-06-07T15:48:02Z)
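The CCNN entry above rests on continuous convolutional kernels: the kernel is not a fixed weight tensor but a small network evaluated on relative positions, so the same parameters can be sampled at any kernel length or resolution. The sketch below illustrates that idea in 1D; the kernel-generating MLP and all sizes are assumptions for illustration, not the CCNN architecture.

```python
# A minimal sketch of a "continuous convolutional kernel": the kernel is
# produced by a small MLP evaluated on relative positions, so the same
# parameters can be sampled at any kernel size. The MLP and the sizes
# below are assumptions for illustration, not the CCNN architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContinuousConv1d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, hidden: int = 32):
        super().__init__()
        self.in_ch, self.out_ch = in_ch, out_ch
        # Maps a scalar position in [-1, 1] to one column of kernel weights.
        self.kernel_net = nn.Sequential(
            nn.Linear(1, hidden), nn.GELU(),
            nn.Linear(hidden, out_ch * in_ch),
        )

    def forward(self, x: torch.Tensor, kernel_size: int) -> torch.Tensor:
        # x: (batch, in_ch, length). Sample the kernel at kernel_size points.
        pos = torch.linspace(-1.0, 1.0, kernel_size).unsqueeze(1)  # (K, 1)
        w = self.kernel_net(pos)                                   # (K, out*in)
        w = w.t().reshape(self.out_ch, self.in_ch, kernel_size)    # conv1d layout
        return F.conv1d(x, w, padding=kernel_size // 2)

conv = ContinuousConv1d(in_ch=3, out_ch=8)
x = torch.randn(4, 3, 100)
print(conv(x, kernel_size=9).shape)   # same parameters, different kernel sizes
print(conv(x, kernel_size=33).shape)
```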
- Quantized convolutional neural networks through the lens of partial differential equations [6.88204255655161]
Quantization of Convolutional Neural Networks (CNNs) is a common approach to ease the computational burden involved in the deployment of CNNs.
In this work, we explore ways to improve quantized CNNs using a PDE-based perspective and analysis.
arXiv Detail & Related papers (2021-08-31T22:18:52Z)
- Sandwich Batch Normalization [96.2529041037824]
We present Sandwich Batch Normalization (SaBN), an easy improvement of Batch Normalization (BN) with only a few lines of code changes.
Our SaBN factorizes the BN affine layer into one shared sandwich affine layer, cascaded by several parallel independent affine layers.
We demonstrate the prevailing effectiveness of SaBN as a drop-in replacement in four tasks.
arXiv Detail & Related papers (2021-02-22T22:09:43Z)
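The SaBN abstract above describes factorizing the BN affine transform into one shared affine layer followed by one of several parallel, independent affine layers. A hedged sketch of that factorization is below; details such as how the branch index is chosen (e.g. a domain or architecture id) are assumptions, not the paper's exact design.

```python
# A hedged sketch of the Sandwich BN idea as stated in the abstract: BN
# normalisation, then one shared affine transform, then one of several
# parallel independent affine transforms selected by an integer index.
# The branch-selection mechanism is an assumption, not the paper's design.
import torch
import torch.nn as nn

class SandwichBN2d(nn.Module):
    def __init__(self, num_features: int, num_branches: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)  # normalisation only
        # Shared "sandwich" affine parameters (per channel).
        self.shared_gamma = nn.Parameter(torch.ones(1, num_features, 1, 1))
        self.shared_beta = nn.Parameter(torch.zeros(1, num_features, 1, 1))
        # Parallel independent affine layers, one per branch.
        self.gammas = nn.Parameter(torch.ones(num_branches, 1, num_features, 1, 1))
        self.betas = nn.Parameter(torch.zeros(num_branches, 1, num_features, 1, 1))

    def forward(self, x: torch.Tensor, branch: int) -> torch.Tensor:
        h = self.bn(x)
        h = h * self.shared_gamma + self.shared_beta           # shared affine
        return h * self.gammas[branch] + self.betas[branch]    # branch affine

sabn = SandwichBN2d(num_features=16, num_branches=4)
x = torch.randn(8, 16, 32, 32)
print(sabn(x, branch=2).shape)  # torch.Size([8, 16, 32, 32])
```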
- MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and drives the network into the chaotic regime, as a BN layer does.
MimicNorm achieves similar accuracy across various network structures, including ResNets and lightweight networks like ShuffleNet, with about a 20% reduction in memory consumption.
arXiv Detail & Related papers (2020-10-19T07:42:41Z)
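The MimicNorm entry above combines a weight-mean operation with a single last BN layer. The sketch below is one loose reading of those two ingredients, centring each convolution's filters and keeping one BN layer before the classifier; the toy network and the exact form of the weight-mean operation are assumptions.

```python
# A loose sketch of the two ingredients the MimicNorm abstract mentions:
# (i) centring each convolution's weights (a "weight mean" operation) and
# (ii) keeping a single BN layer just before the classifier instead of BN
# after every convolution. The toy network itself is a stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanCenteredConv2d(nn.Conv2d):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Remove the mean from each filter before convolving, so the layer
        # output is centred much as a BN layer would centre it.
        w = self.weight - self.weight.mean(dim=(1, 2, 3), keepdim=True)
        return F.conv2d(x, w, self.bias, self.stride, self.padding)

net = nn.Sequential(
    MeanCenteredConv2d(3, 16, 3, padding=1), nn.ReLU(),
    MeanCenteredConv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.BatchNorm1d(32),          # the single "last BN" layer
    nn.Linear(32, 10),
)
print(net(torch.randn(4, 3, 32, 32)).shape)  # torch.Size([4, 10])
```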
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
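The ACDC title above suggests that each convolution kernel is decomposed into a small shared dictionary of spatial atoms combined with per-filter coefficients. The sketch below implements one such decomposition as an assumption inferred from the title; the paper's actual sharing scheme across layers may differ.

```python
# A sketch of an atom-coefficient decomposed convolution: every 2D filter
# is a linear combination of a small shared dictionary of kernel "atoms",
# so only per-filter coefficients (and the atoms) are learned. The sharing
# scheme shown here is an assumption inferred from the ACDC title.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtomCoefficientConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, num_atoms: int = 6):
        super().__init__()
        # Shared dictionary of spatial atoms: (num_atoms, k, k).
        self.atoms = nn.Parameter(torch.randn(num_atoms, k, k) * 0.1)
        # Coefficients over the atoms for every (out, in) channel pair.
        self.coeffs = nn.Parameter(torch.randn(out_ch, in_ch, num_atoms) * 0.1)
        self.padding = k // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Build the full (out_ch, in_ch, k, k) weight tensor on the fly.
        w = torch.einsum("oin,nkl->oikl", self.coeffs, self.atoms)
        return F.conv2d(x, w, padding=self.padding)

conv = AtomCoefficientConv2d(3, 16)
print(conv(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 16, 32, 32])
```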
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- How Does BN Increase Collapsed Neural Network Filters? [34.886702335022015]
Filter collapse is common in deep neural networks (DNNs) with batch normalization (BN) and rectified linear activation functions (e.g., ReLU, Leaky ReLU).
We propose a simple yet effective approach named post-shifted BN (psBN), which has the same representation ability as BN while being able to automatically make BN parameters trainable again as they saturate during training.
arXiv Detail & Related papers (2020-01-30T09:00:08Z)