Dilated Convolution with Learnable Spacings: beyond bilinear interpolation
- URL: http://arxiv.org/abs/2306.00817v2
- Date: Fri, 22 Sep 2023 20:08:13 GMT
- Title: Dilated Convolution with Learnable Spacings: beyond bilinear interpolation
- Authors: Ismail Khalfaoui-Hassani, Thomas Pellegrini, Timothée Masquelier
- Abstract summary: Dilated Convolution with Learnable Spacings is a proposed variation of the dilated convolution.
Non-integer positions are handled via interpolation, which gives the positions well-defined gradients.
The method code is based on PyTorch.
- Score: 10.89964981012741
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dilated Convolution with Learnable Spacings (DCLS) is a recently proposed
variation of the dilated convolution in which the spacings between the non-zero
elements in the kernel, or equivalently their positions, are learnable.
Non-integer positions are handled via interpolation. Thanks to this trick,
positions have well-defined gradients. The original DCLS used bilinear
interpolation, and thus only considered the four nearest pixels. Yet here we
show that longer range interpolations, and in particular a Gaussian
interpolation, allow improving performance on ImageNet1k classification on two
state-of-the-art convolutional architectures (ConvNeXt and ConvFormer),
without increasing the number of parameters. The method code is based on
PyTorch and is available at
https://github.com/K-H-Ismail/Dilated-Convolution-with-Learnable-Spacings-PyTorch
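The interpolation trick described in the abstract can be sketched in isolation. Below is a minimal NumPy sketch, not the authors' PyTorch implementation; the function names and the fixed `sigma` are illustrative assumptions. A unit kernel weight at a non-integer position is spread over the grid: bilinear interpolation touches only the four nearest cells, while a normalized Gaussian gives every cell a share, so the position gradient has longer range.

```python
import numpy as np

def bilinear_weights(px, py, size):
    """Spread a unit weight at non-integer (px, py) over the four
    nearest cells of a size x size grid (original DCLS interpolation)."""
    w = np.zeros((size, size))
    x0, y0 = int(np.floor(px)), int(np.floor(py))
    fx, fy = px - x0, py - y0
    for dx, wx in ((0, 1 - fx), (1, fx)):
        for dy, wy in ((0, 1 - fy), (1, fy)):
            if 0 <= x0 + dx < size and 0 <= y0 + dy < size:
                w[y0 + dy, x0 + dx] = wx * wy
    return w

def gaussian_weights(px, py, size, sigma=0.5):
    """Longer-range alternative: every cell gets a share of the weight,
    normalized so the shares sum to one."""
    ys, xs = np.mgrid[0:size, 0:size]
    w = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
    return w / w.sum()
```

Both spreads sum to one for interior positions and are smooth in (px, py), which is what makes the element positions learnable by backpropagation.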
Related papers
- LDConv: Linear deformable convolution for improving convolutional neural networks [18.814748446649627]
Linear Deformable Convolution (LDConv) is a plug-and-play convolutional operation that can replace the convolutional operation to improve network performance.
LDConv corrects the growth trend of the number of parameters for standard convolution and Deformable Conv to a linear growth.
arXiv Detail & Related papers (2023-11-20T07:54:54Z)
- Audio classification with Dilated Convolution with Learnable Spacings [10.89964981012741]
Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation.
Here we show that DCLS is also useful for audio tagging using the AudioSet classification benchmark.
arXiv Detail & Related papers (2023-09-25T09:09:54Z)
- Learning Implicit Feature Alignment Function for Semantic Segmentation [51.36809814890326]
Implicit Feature Alignment function (IFA) is inspired by the rapidly expanding topic of implicit neural representations.
We show that IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps in arbitrary resolutions.
Our method can be combined with various architectures for further improvement, and it achieves a state-of-the-art accuracy trade-off on common benchmarks.
arXiv Detail & Related papers (2022-06-17T09:40:14Z)
- Focal Sparse Convolutional Networks for 3D Object Detection [121.45950754511021]
We introduce two new modules to enhance the capability of Sparse CNNs.
They are focal sparse convolution (Focals Conv) and its multi-modal variant of focal sparse convolution with fusion.
For the first time, we show that spatially learnable sparsity in sparse convolution is essential for sophisticated 3D object detection.
arXiv Detail & Related papers (2022-04-26T17:34:10Z)
- Dilated convolution with learnable spacings [6.6389732792316005]
CNNs need large receptive fields (RF) to compete with vision transformers.
RFs can simply be enlarged by increasing the convolution kernel sizes.
Yet the number of trainable parameters, which scales quadratically with the kernel size in the 2D case, rapidly becomes prohibitive.
This paper presents a new method to increase the RF size without increasing the number of parameters.
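The quadratic scaling argument above can be made concrete with a short sketch. This is illustrative only: counting three parameters per DCLS element (one weight plus two learnable coordinates) is an assumption about the parameterization, not the paper's exact accounting.

```python
def dense_kernel_params(k: int) -> int:
    """Trainable weights of a dense k x k 2D kernel (one channel):
    grows quadratically with the kernel size."""
    return k * k

def dcls_kernel_params(n_elements: int) -> int:
    """DCLS-style kernel: one weight plus a learnable (x, y) position
    per non-zero element, independent of how large the RF grows."""
    return 3 * n_elements

# Enlarging the receptive field from 3x3 to 31x31:
growth = {k: dense_kernel_params(k) for k in (3, 7, 17, 31)}
# quadratic growth: {3: 9, 7: 49, 17: 289, 31: 961}
fixed = dcls_kernel_params(9)  # 27, regardless of the RF size
```

The dictionary makes the contrast explicit: a dense 31x31 kernel needs 961 weights per channel, while a sparse learnable-position kernel keeps a constant budget.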
arXiv Detail & Related papers (2021-12-07T14:54:24Z)
- Interpolating Points on a Non-Uniform Grid using a Mixture of Gaussians [0.0]
We propose an approach to perform non-uniform image interpolation based on a Gaussian Mixture Model.
Traditional image interpolation methods assume that the coordinates to interpolate from are positioned on a uniform grid.
arXiv Detail & Related papers (2020-12-24T13:59:39Z)
- PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer [76.44375136492827]
Convolutional Neural Networks (CNNs) are often scale-sensitive.
We bridge this gap by exploiting multi-scale features at a finer granularity.
The proposed convolution operation, named Poly-Scale Convolution (PSConv), mixes up a spectrum of dilation rates.
arXiv Detail & Related papers (2020-07-13T05:14:11Z)
- DO-Conv: Depthwise Over-parameterized Convolutional Layer [66.46704754669169]
We propose to augment a convolutional layer with an additional depthwise convolution, where each input channel is convolved with a different 2D kernel.
We show with extensive experiments that the mere replacement of conventional convolutional layers with DO-Conv layers boosts the performance of CNNs.
arXiv Detail & Related papers (2020-06-22T06:57:10Z)
- Multiple Video Frame Interpolation via Enhanced Deformable Separable Convolution [67.83074893311218]
Kernel-based methods predict pixels with a single convolution process that convolves source frames with spatially adaptive local kernels.
We propose enhanced deformable separable convolution (EDSC) to estimate not only adaptive kernels, but also offsets, masks and biases.
We show that our method performs favorably against the state-of-the-art methods across a broad range of datasets.
arXiv Detail & Related papers (2020-06-15T01:10:59Z)
- Region Adaptive Graph Fourier Transform for 3D Point Clouds [51.193111325231165]
We introduce the Region Adaptive Graph Fourier Transform (RA-GFT) for compression of 3D point cloud attributes.
The RA-GFT achieves better complexity-performance trade-offs than previous approaches.
arXiv Detail & Related papers (2020-03-04T02:47:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.