Learning strides in convolutional neural networks
- URL: http://arxiv.org/abs/2202.01653v1
- Date: Thu, 3 Feb 2022 16:03:36 GMT
- Title: Learning strides in convolutional neural networks
- Authors: Rachid Riad, Olivier Teboul, David Grangier, Neil Zeghidour
- Abstract summary: This work introduces DiffStride, the first downsampling layer with learnable strides.
Experiments on audio and image classification show the generality and effectiveness of our solution.
- Score: 34.20666933112202
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolutional neural networks typically contain several downsampling
operators, such as strided convolutions or pooling layers, that progressively
reduce the resolution of intermediate representations. This provides some
shift-invariance while reducing the computational complexity of the whole
architecture. A critical hyperparameter of such layers is their stride: the
integer factor of downsampling. As strides are not differentiable, finding the
best configuration either requires cross-validation or discrete optimization
(e.g. architecture search), which rapidly become prohibitive as the search
space grows exponentially with the number of downsampling layers. Hence,
exploring this search space by gradient descent would allow finding better
configurations at a lower computational cost. This work introduces DiffStride,
the first downsampling layer with learnable strides. Our layer learns the size
of a cropping mask in the Fourier domain, which effectively performs resizing in
a differentiable way. Experiments on audio and image classification show the
generality and effectiveness of our solution: we use DiffStride as a drop-in
replacement for standard downsampling layers and outperform them. In particular,
we show that introducing our layer into a ResNet-18 architecture maintains
consistently high performance on CIFAR10, CIFAR100 and ImageNet even when
training starts from poor random stride configurations. Moreover, formulating
strides as learnable variables allows us to introduce a regularization term
that controls the computational complexity of the architecture. We show how
this regularization allows trading off accuracy for efficiency on ImageNet.
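The page carries no code, so here is a minimal PyTorch sketch of the mechanism as the abstract describes it: downsampling by cropping in the Fourier domain, with a smoothed mask edge so that the stride receives gradients. Everything below (class name, a single stride shared by both spatial axes, the sigmoid mask shape) is my own simplification, not the authors' implementation.

```python
import torch
import torch.nn as nn


class FourierCropDownsample(nn.Module):
    """Downsampling with a learnable, fractional stride, in the spirit of
    DiffStride (hypothetical simplification, not the authors' code). A
    smoothed low-pass mask in the Fourier domain makes the crop size --
    and hence the stride -- differentiable."""

    def __init__(self, init_stride: float = 2.0, smoothness: float = 4.0):
        super().__init__()
        self.stride = nn.Parameter(torch.tensor(float(init_stride)))
        self.smoothness = smoothness  # sharpness of the soft mask edge

    def _soft_mask(self, n: int, keep: torch.Tensor, device) -> torch.Tensor:
        # 1-D smoothed box: ~1 for frequencies below `keep`, ~0 above.
        # The soft edge is what lets gradients reach the stride.
        k = torch.arange(n, device=device)
        k = torch.minimum(k, n - k).float()  # symmetric frequency index
        return torch.sigmoid(self.smoothness * (keep - k))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        stride = self.stride.clamp(min=1.0)
        keep_h, keep_w = h / (2 * stride), w / (2 * stride)

        X = torch.fft.fft2(x)
        X = X * (self._soft_mask(h, keep_h, x.device)[:, None]
                 * self._soft_mask(w, keep_w, x.device)[None, :])

        # Hard crop to the (non-differentiable) integer output size; the
        # gradient path to the stride runs through the soft mask above.
        new_h = max(int(h / float(stride.detach())), 1)
        new_w = max(int(w / float(stride.detach())), 1)
        X = torch.fft.fftshift(X, dim=(-2, -1))
        top, left = (h - new_h) // 2, (w - new_w) // 2
        X = X[..., top:top + new_h, left:left + new_w]
        X = torch.fft.ifftshift(X, dim=(-2, -1))
        # Rescale so amplitudes match the smaller inverse transform.
        return torch.fft.ifft2(X).real * (new_h * new_w) / (h * w)
```

With the stride exposed as a plain parameter, the complexity regularizer the abstract mentions could be as simple as adding a penalty proportional to (h * w) / stride**2 to the training loss.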
Related papers
- On the effectiveness of partial variance reduction in federated learning with heterogeneous data [27.527995694042506]
We show that the diversity of the final classification layers across clients impedes the performance of the FedAvg algorithm.
Motivated by this, we propose to correct the model by applying variance reduction only to the final layers.
We demonstrate that this significantly outperforms existing benchmarks at a similar or lower communication cost.
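As a rough illustration of the idea (not the paper's algorithm), a SCAFFOLD-style control-variate correction can be restricted to the classifier parameters; the "fc." name prefix and the control-variate dictionaries below are assumptions.

```python
import torch

def correct_final_layer_grads(model, client_cv, server_cv):
    """Apply a variance-reduction correction, g <- g - c_client + c_server,
    to the final classification layer only; all other gradients stay as
    plain FedAvg gradients. (Hypothetical sketch; the "fc." prefix and the
    control-variate dicts are assumptions, not the paper's API.)"""
    for name, p in model.named_parameters():
        if name.startswith("fc.") and p.grad is not None:
            p.grad.add_(server_cv[name] - client_cv[name])
```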
arXiv Detail & Related papers (2022-12-05T11:56:35Z) - Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
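For context on the special cases the summary mentions, depthwise, groupwise and pointwise convolutions are all grouped convolutions with particular settings; a sketch in standard PyTorch (not the SSC layer itself):

```python
import torch.nn as nn

# The layers SSC is said to generalize, as standard grouped convolutions:
depthwise = nn.Conv2d(64, 64, kernel_size=3, groups=64, padding=1)  # one filter per channel
groupwise = nn.Conv2d(64, 64, kernel_size=3, groups=8, padding=1)   # channels split into 8 groups
pointwise = nn.Conv2d(64, 64, kernel_size=1)                        # 1x1 channel mixing
```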
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- CUF: Continuous Upsampling Filters [25.584630142930123]
In this paper, we consider one of the most important operations in image processing: upsampling.
We propose to parameterize upsampling kernels as neural fields.
This parameterization leads to a compact architecture that obtains a 40-fold reduction in the number of parameters when compared with competing arbitrary-scale super-resolution architectures.
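A minimal picture of "upsampling kernels as neural fields" follows; this is a hypothetical re-implementation in PyTorch, not the authors' architecture, and the MLP shape is my own choice.

```python
import torch
import torch.nn as nn


class ContinuousKernelField(nn.Module):
    """A tiny MLP maps a continuous sub-pixel offset and the target scale
    to a kernel weight, so one compact network serves arbitrary upsampling
    factors (hypothetical sketch in the spirit of CUF)."""

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, dx: torch.Tensor, dy: torch.Tensor, scale: float) -> torch.Tensor:
        # dx, dy: sub-pixel offsets in [-0.5, 0.5); scale: upsampling factor.
        s = torch.full_like(dx, scale)
        return self.mlp(torch.stack([dx, dy, s], dim=-1)).squeeze(-1)
```

Querying such a field at the fractional offsets of, say, a 3x upsampling grid yields the nine per-position kernels, replacing a separate learned kernel per scale.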
arXiv Detail & Related papers (2022-10-13T12:45:51Z)
- Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INN) are able to increase upscaling accuracy significantly by optimizing the downscaling and upscaling cycle jointly.
In this work, a simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model.
It is shown to achieve a state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in LR outputs.
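The invertibility at the heart of INN-based rescaling can be illustrated with an additive coupling layer, a standard normalizing-flow building block (generic sketch, not IARN itself):

```python
import torch
import torch.nn as nn


class AdditiveCoupling(nn.Module):
    """Splitting channels and adding a learned function of one half to the
    other is exactly invertible (generic flow block, not IARN itself).
    Requires an even channel count."""

    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Conv2d(channels // 2, channels // 2, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = x.chunk(2, dim=1)
        return torch.cat([a, b + self.f(a)], dim=1)   # exactly invertible

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        a, b_shifted = y.chunk(2, dim=1)
        return torch.cat([a, b_shifted - self.f(a)], dim=1)
```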
arXiv Detail & Related papers (2022-09-26T22:22:30Z)
- Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization [50.50023451369742]
Pruning-as-Search (PaS) is an end-to-end channel pruning method that automatically and efficiently searches out the desired sub-network.
Our proposed architecture outperforms prior art by around $1.0\%$ top-1 accuracy on the ImageNet-1000 classification task.
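One generic way to make channel pruning searchable by gradient descent, shown here as a sketch rather than the PaS method, is to attach trainable gates to output channels and prune those the optimizer drives toward zero:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(32, 64, kernel_size=3, padding=1)
gates = nn.Parameter(torch.ones(64))          # one trainable gate per output channel

x = torch.randn(1, 32, 8, 8)
y = conv(x) * gates.view(1, -1, 1, 1)         # gates scale channels during the search

keep = gates.detach().abs() > 0.05            # channels that survive pruning
print(f"{int(keep.sum())}/64 channels kept")
```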
arXiv Detail & Related papers (2022-06-02T17:58:54Z)
- Learning with convolution and pooling operations in kernel methods [8.528384027684192]
Recent empirical work has shown that hierarchical convolutional kernels improve the performance of kernel methods in image classification tasks.
We study the precise interplay between approximation and generalization in convolutional architectures.
Our results quantify how choosing an architecture adapted to the target function leads to a large improvement in the sample complexity.
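To make "convolution and pooling operations in kernel methods" concrete, here is a toy one-layer convolutional kernel with global average pooling; it is illustrative only, not the paper's exact construction.

```python
import numpy as np

def conv_pool_kernel(x: np.ndarray, y: np.ndarray, patch: int = 4) -> float:
    """Toy convolutional kernel on 1-D signals: evaluate a base kernel
    (1 + <u, v>)^2 on all pairs of patches and average, mirroring weight
    sharing (convolution) plus global average pooling."""
    def patches(v):
        return np.stack([v[i:i + patch] for i in range(len(v) - patch + 1)])
    px, py = patches(x), patches(y)
    return float(np.mean((1.0 + px @ py.T) ** 2))
```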
arXiv Detail & Related papers (2021-11-16T09:00:44Z)
- SDWNet: A Straight Dilated Network with Wavelet Transformation for Image Deblurring [23.86692375792203]
Image deblurring is a computer vision problem that aims to recover a sharp image from a blurred image.
Our model uses dilated convolution to obtain a large receptive field at high spatial resolution.
We propose a novel module using the wavelet transform, which effectively helps the network to recover clear high-frequency texture details.
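The receptive-field property referred to above is easy to see with a standard dilated convolution (plain PyTorch, not SDWNet's module): dilation d inflates a k x k kernel's footprint to d*(k-1)+1 per axis while padding preserves the spatial resolution.

```python
import torch
import torch.nn as nn

# A 3x3 kernel with dilation 4 covers a 9x9 footprint (4*(3-1)+1), while
# padding=4 keeps the input's spatial resolution unchanged.
dilated = nn.Conv2d(64, 64, kernel_size=3, dilation=4, padding=4)
x = torch.randn(1, 64, 56, 56)
assert dilated(x).shape == x.shape
```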
arXiv Detail & Related papers (2021-10-12T07:58:10Z)
- Scalable Visual Transformers with Hierarchical Pooling [61.05787583247392]
We propose a Hierarchical Visual Transformer (HVT) which progressively pools visual tokens to shrink the sequence length.
This brings a great benefit by allowing the depth/width/resolution/patch-size dimensions to be scaled without introducing extra computational complexity.
Our HVT outperforms the competitive baselines on ImageNet and CIFAR-100 datasets.
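As a concrete picture of what progressively pooling visual tokens means, here is a generic token-pooling step; the shapes and pooling settings are illustrative, not HVT's.

```python
import torch
import torch.nn as nn

tokens = torch.randn(8, 196, 384)                 # (batch, tokens, embed dim)
pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

# Pool along the token axis between transformer stages, roughly halving
# the sequence length (196 -> 98) and thus the attention cost.
shorter = pool(tokens.transpose(1, 2)).transpose(1, 2)
assert shorter.shape == (8, 98, 384)
```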
arXiv Detail & Related papers (2021-03-19T03:55:58Z)
- GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the norm of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
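A rough sketch of this objective follows; it is a hypothetical simplification, not the authors' code, using per-parameter scale factors and a virtual SGD step built with torch.func.

```python
import torch
from torch.func import functional_call

def gradinit_objective(model, loss_fn, x, y, scales, lr=0.1):
    """Rescale each parameter tensor by a learnable scalar, take one
    *virtual* SGD step, and return the loss at the stepped parameters.
    Minimizing this w.r.t. `scales` tunes the initialization so the first
    real optimizer step is maximally effective (hypothetical sketch)."""
    params = {n: p * scales[n] for n, p in model.named_parameters()}
    loss = loss_fn(functional_call(model, params, (x,)), y)
    grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
    stepped = {n: q - lr * g for (n, q), g in zip(params.items(), grads)}
    return loss_fn(functional_call(model, stepped, (x,)), y)

# scales would be e.g. {n: torch.ones((), requires_grad=True)
#                       for n, _ in model.named_parameters()},
# optimized for a few Adam steps before regular training begins.
```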
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
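A minimal version of such a performance predictor, hypothetical rather than the paper's model, is a two-layer graph convolution over the sub-network's architecture graph followed by a regression head:

```python
import torch
import torch.nn as nn


class GCNPredictor(nn.Module):
    """Predict a sampled sub-network's performance from its architecture
    graph (hypothetical sketch). `adj` is a normalized adjacency matrix
    over the sub-network's operations; `feats` are per-node operation
    encodings."""

    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden)
        self.w2 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, adj: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.w1(adj @ feats))   # one GCN layer: f(A X W)
        h = torch.relu(self.w2(adj @ h))
        return self.head(h.mean(dim=0))        # graph-level score
```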
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.