Dilated convolution with learnable spacings
- URL: http://arxiv.org/abs/2112.03740v4
- Date: Thu, 11 May 2023 11:13:39 GMT
- Title: Dilated convolution with learnable spacings
- Authors: Ismail Khalfaoui-Hassani, Thomas Pellegrini and Timothée Masquelier
- Abstract summary: CNNs need large receptive fields (RF) to compete with visual transformers.
RFs can simply be enlarged by increasing the convolution kernel sizes.
The number of trainable parameters, which scales quadratically with the kernel's size in the 2D case, rapidly becomes prohibitive.
This paper presents a new method to increase the RF size without increasing the number of parameters.
- Score: 6.6389732792316005
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent works indicate that convolutional neural networks (CNN) need large
receptive fields (RF) to compete with visual transformers and their attention
mechanism. In CNNs, RFs can simply be enlarged by increasing the convolution
kernel sizes. Yet the number of trainable parameters, which scales
quadratically with the kernel's size in the 2D case, rapidly becomes
prohibitive, and the training is notoriously difficult. This paper presents a
new method to increase the RF size without increasing the number of parameters.
The dilated convolution (DC) has already been proposed for the same purpose. DC
can be seen as a convolution with a kernel that contains only a few non-zero
elements placed on a regular grid. Here we present a new version of the DC in
which the spacings between the non-zero elements, or equivalently their
positions, are no longer fixed but learnable via backpropagation thanks to an
interpolation technique. We call this method "Dilated Convolution with
Learnable Spacings" (DCLS) and generalize it to the n-dimensional convolution
case. However, our main focus here will be on the 2D case. We first tried our
approach on ResNet50: we drop-in replaced the standard convolutions with DCLS
ones, which increased the accuracy of ImageNet1k classification at
iso-parameters, but at the expense of the throughput. Next, we used the recent
ConvNeXt state-of-the-art convolutional architecture and drop-in replaced the
depthwise convolutions with DCLS ones. This not only increased the accuracy of
ImageNet1k classification but also of typical downstream and robustness tasks,
again at iso-parameters but this time with negligible cost on throughput, as
ConvNeXt uses separable convolutions. Conversely, classic DC led to poor
performance with both ResNet50 and ConvNeXt. The code of the method is
available at:
https://github.com/K-H-Ismail/Dilated-Convolution-with-Learnable-Spacings-PyTorch.
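The linked repository contains the reference implementation. As a rough illustration of the core idea only, the minimal PyTorch sketch below (the class name, shapes, and initialization are invented here and are not the official DCLS API) builds a depthwise kernel from a few learnable weights whose real-valued positions are scattered onto the dense kernel grid by bilinear interpolation, which is what keeps the positions differentiable:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DCLSDepthwise2d(nn.Module):
    """Toy depthwise 'dilated convolution with learnable spacings'.

    Each channel owns n_elem weights whose real-valued positions inside a
    dense kernel_size x kernel_size grid are learned. Bilinear interpolation
    scatters every weight onto its 4 nearest grid cells, so positions stay
    differentiable and a plain F.conv2d can run on the constructed kernel.
    Illustrative only: names and shapes are not the official DCLS API.
    """

    def __init__(self, channels, kernel_size=17, n_elem=34):
        super().__init__()
        self.channels, self.ks, self.n = channels, kernel_size, n_elem
        self.weight = nn.Parameter(torch.randn(channels, n_elem) * 0.02)
        # real-valued positions in [0, kernel_size - 1], random initialization
        self.pos = nn.Parameter(torch.rand(channels, n_elem, 2) * (kernel_size - 1))

    def build_kernel(self):
        ks = self.ks
        p = self.pos.clamp(0, ks - 1 - 1e-4)
        p0 = p.floor().long()                 # lower grid corner, (C, n, 2)
        frac = p - p0.float()                 # fractional offsets in [0, 1)
        kernel = self.weight.new_zeros(self.channels, ks * ks)
        for dy in (0, 1):                     # scatter onto the 4 neighbours
            for dx in (0, 1):
                iy = (p0[..., 0] + dy).clamp(max=ks - 1)
                ix = (p0[..., 1] + dx).clamp(max=ks - 1)
                wy = frac[..., 0] if dy else 1 - frac[..., 0]
                wx = frac[..., 1] if dx else 1 - frac[..., 1]
                kernel = kernel.scatter_add(1, iy * ks + ix, self.weight * wy * wx)
        return kernel.view(self.channels, 1, ks, ks)   # depthwise layout

    def forward(self, x):
        return F.conv2d(x, self.build_kernel(), padding=self.ks // 2,
                        groups=self.channels)
```

Used as a drop-in depthwise layer, e.g. `DCLSDepthwise2d(64)(torch.randn(1, 64, 56, 56))`, this trains only the `n_elem` weights and their 2D positions per channel, so the parameter count stays well below that of a dense 17 x 17 kernel while covering the same receptive field.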
Related papers
- LDConv: Linear deformable convolution for improving convolutional neural networks [18.814748446649627]
Linear Deformable Convolution (LDConv) is a plug-and-play operation that can replace standard convolution to improve network performance.
LDConv reduces the parameter growth of standard convolution and Deformable Conv from quadratic to linear in the kernel size.
arXiv Detail & Related papers (2023-11-20T07:54:54Z)
- Audio classification with Dilated Convolution with Learnable Spacings [10.89964981012741]
Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation.
Here we show that DCLS is also useful for audio tagging using the AudioSet classification benchmark.
arXiv Detail & Related papers (2023-09-25T09:09:54Z)
- Dilated Convolution with Learnable Spacings: beyond bilinear interpolation [10.89964981012741]
Dilated Convolution with Learnable Spacings is a proposed variation of the dilated convolution.
Non-integer positions are handled via gradients.
The method code is based on PyTorch.
arXiv Detail & Related papers (2023-06-01T15:42:08Z)
- GMConv: Modulating Effective Receptive Fields for Convolutional Kernels [52.50351140755224]
In convolutional neural networks, the convolutions are performed using a square kernel with a fixed N × N receptive field (RF).
Inspired by the property that effective receptive fields (ERFs) typically exhibit a Gaussian distribution, we propose a Gaussian Mask convolutional kernel (GMConv) in this work.
Our GMConv can directly replace the standard convolutions in existing CNNs and can be easily trained end-to-end by standard back-propagation (a toy sketch of the masking idea is given after this list).
arXiv Detail & Related papers (2023-02-09T10:17:17Z)
- An Improved Normed-Deformable Convolution for Crowd Counting [70.02434289611566]
Deformable convolution has been proposed to exploit the scale-adaptive capabilities of CNN features on heads in crowd scenes.
An improved Normed-Deformable Convolution (i.e., NDConv) is proposed in this paper.
Our method outperforms state-of-the-art methods on the ShanghaiTech A, ShanghaiTech B, UCF_QNRF, and UCF_CC_50 datasets.
arXiv Detail & Related papers (2022-06-16T10:56:26Z)
- Adaptive Split-Fusion Transformer [90.04885335911729]
We propose an Adaptive Split-Fusion Transformer (ASF-former) to treat convolutional and attention branches differently with adaptive weights.
Experiments on standard benchmarks, such as ImageNet-1K, show that our ASF-former outperforms its CNN, transformer counterparts, and hybrid pilots in terms of accuracy.
arXiv Detail & Related papers (2022-04-26T10:00:28Z)
- Hyper-Convolutions via Implicit Kernels for Medical Imaging [18.98078260974008]
We present the hyper-convolution, a novel building block that implicitly encodes the convolutional kernel using spatial coordinates.
We demonstrate in our experiments that replacing regular convolutions with hyper-convolutions can improve performance with fewer parameters and increase robustness against noise.
arXiv Detail & Related papers (2022-02-06T03:56:19Z)
- Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel.
arXiv Detail & Related papers (2021-06-30T03:54:35Z)
- DO-Conv: Depthwise Over-parameterized Convolutional Layer [66.46704754669169]
We propose to augment a convolutional layer with an additional depthwise convolution, where each input channel is convolved with a different 2D kernel.
We show with extensive experiments that the mere replacement of conventional convolutional layers with DO-Conv layers boosts the performance of CNNs.
arXiv Detail & Related papers (2020-06-22T06:57:10Z)
- XSepConv: Extremely Separated Convolution [60.90871656244126]
We propose a novel extremely separated convolutional block (XSepConv).
It fuses spatially separable convolutions into depthwise convolution to reduce both the computational cost and parameter size of large kernels.
XSepConv is designed to be an efficient alternative to vanilla depthwise convolution with large kernel sizes.
arXiv Detail & Related papers (2020-02-27T11:46:17Z)
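As a counterpoint to learning kernel positions, the Gaussian-mask idea summarized in the GMConv entry above can be caricatured in a few lines. The sketch below is an illustrative reading of that abstract, not the authors' implementation; the class name, the per-output-channel sigma, and the isotropic mask are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianMaskedConv2d(nn.Module):
    """Toy Gaussian-masked convolution (illustrative reading of GMConv).

    A learnable per-output-channel sigma generates an isotropic Gaussian mask
    over the kernel grid; the mask multiplies the ordinary conv weights, so
    the effective receptive field can shrink or grow during training.
    """

    def __init__(self, in_ch, out_ch, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
        self.log_sigma = nn.Parameter(torch.zeros(out_ch))        # sigma = 1 at init
        r = torch.arange(kernel_size) - (kernel_size - 1) / 2
        yy, xx = torch.meshgrid(r, r, indexing="ij")
        self.register_buffer("dist2", xx ** 2 + yy ** 2)          # squared distances

    def forward(self, x):
        sigma = self.log_sigma.exp().view(-1, 1, 1, 1)
        mask = torch.exp(-self.dist2 / (2 * sigma ** 2))          # (out_ch, 1, k, k)
        return F.conv2d(x, self.conv.weight * mask, self.conv.bias,
                        padding=self.conv.padding[0])
```

Because the mask multiplies the weights before the convolution, the layer stays a standard convolution at inference time, and the masking adds only one scalar parameter per output channel.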
This list is automatically generated from the titles and abstracts of the papers on this site.