FlexConv: Continuous Kernel Convolutions with Differentiable Kernel
Sizes
- URL: http://arxiv.org/abs/2110.08059v2
- Date: Mon, 18 Oct 2021 08:53:39 GMT
- Title: FlexConv: Continuous Kernel Convolutions with Differentiable Kernel
Sizes
- Authors: David W. Romero, Robert-Jan Bruintjes, Jakub M. Tomczak, Erik J.
Bekkers, Mark Hoogendoorn, Jan C. van Gemert
- Abstract summary: Recent works show CNNs benefit from different kernel sizes at different layers, but exploring all possible combinations is unfeasible in practice.
We propose FlexConv, a novel convolutional operation with which high bandwidth convolutional kernels of learnable kernel size can be learned at a fixed parameter cost.
- Score: 34.90912459206022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When designing Convolutional Neural Networks (CNNs), one must select the size
of the convolutional kernels before training. Recent works show CNNs benefit
from different kernel sizes at different layers, but exploring all possible
combinations is unfeasible in practice. A more efficient approach is to learn
the kernel size during training. However, existing works that learn the kernel
size have a limited bandwidth. These approaches scale kernels by dilation, and
thus the detail they can describe is limited. In this work, we propose
FlexConv, a novel convolutional operation with which high bandwidth
convolutional kernels of learnable kernel size can be learned at a fixed
parameter cost. FlexNets model long-term dependencies without the use of
pooling, achieve state-of-the-art performance on several sequential datasets,
outperform recent works with learned kernel sizes, and are competitive with
much deeper ResNets on image benchmark datasets. Additionally, FlexNets can be
deployed at higher resolutions than those seen during training. To avoid
aliasing, we propose a novel kernel parameterization with which the frequency
of the kernels can be analytically controlled. Our novel kernel
parameterization shows higher descriptive power and faster convergence speed
than existing parameterizations. This leads to important improvements in
classification accuracy.
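As a rough illustration of the operation described in the abstract, the sketch below generates a convolutional kernel from continuous coordinates with a small MLP and multiplies it by a Gaussian mask whose learnable width acts as the differentiable kernel size. The MLP, the layer sizes, and the 1D setting are our own simplifications; the paper's MAGNet kernel generator and its analytic frequency control for anti-aliasing are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlexConv1d(nn.Module):
    """Minimal sketch of a continuous-kernel convolution with a learnable
    Gaussian mask controlling the effective kernel size. A plain MLP stands
    in for the paper's kernel generator (an assumption, not the exact setup)."""

    def __init__(self, in_ch, out_ch, max_kernel_size=33, hidden=32):
        super().__init__()
        self.in_ch, self.out_ch = in_ch, out_ch
        self.padding = max_kernel_size // 2
        # Relative coordinates in [-1, 1] at which the kernel is sampled.
        coords = torch.linspace(-1.0, 1.0, max_kernel_size).unsqueeze(-1)
        self.register_buffer("coords", coords)
        # Kernel generator: coordinate -> (out_ch * in_ch) kernel values.
        self.kernel_net = nn.Sequential(
            nn.Linear(1, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, out_ch * in_ch),
        )
        # Learnable (log-)width of the Gaussian mask = differentiable kernel size.
        self.log_sigma = nn.Parameter(torch.zeros(1))

    def forward(self, x):                      # x: (batch, in_ch, length)
        k = self.kernel_net(self.coords)       # (max_kernel_size, out_ch * in_ch)
        mask = torch.exp(-0.5 * (self.coords / self.log_sigma.exp()) ** 2)
        k = (k * mask).t().reshape(self.out_ch, self.in_ch, -1)
        return F.conv1d(x, k, padding=self.padding)

# Usage: the mask width (and hence the effective kernel size) is trained
# jointly with all other parameters, at a fixed parameter cost.
y = FlexConv1d(in_ch=3, out_ch=16)(torch.randn(8, 3, 128))
```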
Related papers
- Kernel-U-Net: Multivariate Time Series Forecasting using Custom Kernels [1.8816077341295625]
We introduce Kernel-U-Net, a flexible and kernel-customizable U-shaped neural network architecture.
Specifically, Kernel-U-Net separates the procedure of partitioning input time series into patches from kernel manipulation.
Our method offers two primary advantages: 1) Flexibility in kernel customization to adapt to specific datasets; and 2) Enhanced computational efficiency, with the complexity of the Transformer layer reduced to linear.
arXiv Detail & Related papers (2024-01-03T00:49:51Z)
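A minimal sketch of the patching step described in the Kernel-U-Net entry above, assuming a simple non-overlapping layout; the patch length and stride are our own choices, and the custom kernel modules applied to these patches inside the U-shaped encoder-decoder are not shown.

```python
import torch

def patchify(series: torch.Tensor, patch_len: int, stride: int) -> torch.Tensor:
    """Split a multivariate time series (batch, length, channels) into patches
    (batch, num_patches, patch_len, channels). Illustrative only."""
    # unfold over the time dimension, then move the patch axis to the end-but-one.
    patches = series.unfold(1, patch_len, stride)   # (batch, num_patches, channels, patch_len)
    return patches.permute(0, 1, 3, 2)

x = torch.randn(4, 96, 7)                            # e.g. 96 time steps, 7 variables
print(patchify(x, patch_len=16, stride=16).shape)    # (4, 6, 16, 7)
```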
- Amortized Inference for Gaussian Process Hyperparameters of Structured Kernels [5.1672267755831705]
Amortizing parameter inference over different datasets is a promising approach to dramatically speed up training time.
We propose amortizing kernel parameter inference over a complete kernel-structure-family rather than a fixed kernel structure.
We show drastically reduced inference time combined with competitive test performance for a large set of kernels and datasets.
arXiv Detail & Related papers (2023-06-16T13:02:57Z)
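As a rough illustration of the amortization idea in the entry above (not the paper's architecture, which amortizes over a whole family of kernel structures), the sketch below uses a permutation-invariant set encoder to map a dataset directly to RBF kernel hyperparameters, skipping per-dataset marginal-likelihood optimization. All module and variable names are ours.

```python
import torch
import torch.nn as nn

class AmortizedRBFInference(nn.Module):
    """Hypothetical set encoder mapping a dataset (X, y) to RBF kernel
    hyperparameters (lengthscale, outputscale, noise), sketching amortized
    kernel-parameter inference."""
    def __init__(self, x_dim: int, hidden: int = 64):
        super().__init__()
        self.point_enc = nn.Sequential(nn.Linear(x_dim + 1, hidden), nn.ReLU(),
                                       nn.Linear(hidden, hidden))
        self.head = nn.Linear(hidden, 3)   # log-lengthscale, log-outputscale, log-noise

    def forward(self, X, y):
        # Permutation-invariant summary of the dataset (DeepSets-style mean pooling).
        h = self.point_enc(torch.cat([X, y.unsqueeze(-1)], dim=-1)).mean(dim=0)
        log_ls, log_os, log_noise = self.head(h)
        return log_ls.exp(), log_os.exp(), log_noise.exp()

def rbf_kernel(X1, X2, lengthscale, outputscale):
    d2 = torch.cdist(X1, X2).pow(2)
    return outputscale * torch.exp(-0.5 * d2 / lengthscale ** 2)

# Usage: the predicted hyperparameters plug directly into standard GP regression.
X, y = torch.randn(50, 2), torch.randn(50)
ls, os_, noise = AmortizedRBFInference(x_dim=2)(X, y)
K = rbf_kernel(X, X, ls, os_) + noise * torch.eye(50)
```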
- Canvas: End-to-End Kernel Architecture Search in Neural Networks [1.1612831901050744]
We build an end-to-end framework, Canvas, to find high-quality kernels as convolution replacements.
We show that Canvas achieves an average 1.5x speedup over the previous state-of-the-art with acceptable accuracy loss and search efficiency.
arXiv Detail & Related papers (2023-04-16T10:05:42Z)
- Drastically Reducing the Number of Trainable Parameters in Deep CNNs by Inter-layer Kernel-sharing [0.4129225533930965]
Deep convolutional neural networks (DCNNs) have become the state-of-the-art (SOTA) approach for many computer vision tasks.
Here, we suggest a simple way to reduce the number of trainable parameters and thus the memory footprint: sharing kernels between multiple convolutional layers.
arXiv Detail & Related papers (2022-10-23T18:14:30Z)
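The sharing idea above is simple enough to show directly; below is a minimal sketch (our own toy network, not the paper's architecture) in which a single convolution's weights are reused by several layers, so the trainable parameter count does not grow with depth.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedKernelCNN(nn.Module):
    """Toy CNN in which one 3x3 convolution is applied repeatedly, so all
    'layers' share the same trainable kernel (and hence the same parameters)."""
    def __init__(self, channels=32, depth=4, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.shared = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.depth = depth
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = F.relu(self.stem(x))
        for _ in range(self.depth):              # same weights at every step
            x = F.relu(self.shared(x))
        return self.head(x.mean(dim=(2, 3)))     # global average pooling

model = SharedKernelCNN(depth=8)                 # deeper, but no extra conv parameters
print(sum(p.numel() for p in model.parameters()))
```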
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
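The random-feature idea in the RFAD entry above can be sketched in a few lines. Below is a generic random-feature approximation of the one-hidden-layer ReLU NNGP (arc-cosine) kernel; it is a simplification of what RFAD actually uses, and the feature count is our own choice.

```python
import torch

def relu_random_features(X: torch.Tensor, num_features: int = 4096) -> torch.Tensor:
    """Random features phi(X) such that phi(X) @ phi(X').T approximates the
    one-hidden-layer ReLU NNGP kernel as num_features grows. Sketch only;
    RFAD applies this idea to a deep convolutional NNGP."""
    W = torch.randn(X.shape[1], num_features)               # random weights ~ N(0, 1)
    return torch.sqrt(torch.tensor(2.0 / num_features)) * torch.relu(X @ W)

X = torch.randn(128, 64)
phi = relu_random_features(X)
K_approx = phi @ phi.T        # (128, 128) kernel matrix, O(n*m) instead of the exact NNGP
```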
- Hyper-Convolutions via Implicit Kernels for Medical Imaging [18.98078260974008]
We present the hyper-convolution, a novel building block that implicitly encodes the convolutional kernel using spatial coordinates.
We demonstrate in our experiments that replacing regular convolutions with hyper-convolutions can improve performance with fewer parameters, and increase robustness against noise.
arXiv Detail & Related papers (2022-02-06T03:56:19Z)
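The mechanism above is easy to sketch: a small MLP maps a grid of spatial coordinates to the convolution weights, so the parameter count depends on the MLP rather than the kernel size. Layer sizes and activations below are illustrative assumptions, not the authors' exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperConv2d(nn.Module):
    """Sketch of a hyper-convolution: the kernel is generated from (x, y)
    coordinates by a small MLP, making its parameter count independent of
    the kernel size."""
    def __init__(self, in_ch, out_ch, kernel_size=7, hidden=16):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, kernel_size
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, kernel_size),
            torch.linspace(-1, 1, kernel_size),
            indexing="ij",
        )
        self.register_buffer("coords", torch.stack([xs, ys], dim=-1).reshape(-1, 2))
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, in_ch * out_ch),
        )

    def forward(self, x):
        w = self.mlp(self.coords)                            # (k*k, in_ch*out_ch)
        w = w.t().reshape(self.out_ch, self.in_ch, self.k, self.k)
        return F.conv2d(x, w, padding=self.k // 2)

y = HyperConv2d(1, 8, kernel_size=9)(torch.randn(2, 1, 64, 64))
```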
- Kernel Continual Learning [117.79080100313722]
Kernel continual learning is a simple but effective variant of continual learning that tackles catastrophic forgetting.
An episodic memory unit stores a subset of samples for each task to learn task-specific classifiers based on kernel ridge regression.
Variational random features are used to learn a data-driven kernel for each task.
arXiv Detail & Related papers (2021-07-12T22:09:30Z)
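The per-task classifier described above reduces to kernel ridge regression on the stored samples. Here is a minimal sketch with a fixed RBF kernel; the paper instead learns the kernel via variational random features, and the kernel choice and regularization strength are our assumptions.

```python
import torch

def rbf(A, B, lengthscale=1.0):
    return torch.exp(-torch.cdist(A, B) ** 2 / (2 * lengthscale ** 2))

def fit_task_classifier(mem_x, mem_y, num_classes, lam=1e-3):
    """Kernel ridge regression on one task's episodic memory.
    Returns dual coefficients alpha for the stored samples."""
    K = rbf(mem_x, mem_x)
    Y = torch.nn.functional.one_hot(mem_y, num_classes).float()
    return torch.linalg.solve(K + lam * torch.eye(len(mem_x)), Y)

def predict(x, mem_x, alpha):
    return rbf(x, mem_x) @ alpha     # (n, num_classes) scores; argmax gives the class

# Usage: a small memory of (features, labels) per task is enough.
mem_x, mem_y = torch.randn(100, 32), torch.randint(0, 5, (100,))
alpha = fit_task_classifier(mem_x, mem_y, num_classes=5)
scores = predict(torch.randn(8, 32), mem_x, alpha)
```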
- Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
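For context on the entry above: one standard way to get explicit NTK features is to use per-example parameter gradients at initialization, since the empirical NTK is exactly their inner product. The sketch below computes this for a small fully-connected ReLU network; it is the generic (high-dimensional) construction, not the paper's more compact random-feature map.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

def ntk_feature(x: torch.Tensor) -> torch.Tensor:
    """Flattened gradient of the scalar output w.r.t. all parameters.
    The empirical NTK is K(x, x') = <ntk_feature(x), ntk_feature(x')>."""
    out = net(x.unsqueeze(0)).squeeze()
    grads = torch.autograd.grad(out, list(net.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

X = torch.randn(10, 16)
feats = torch.stack([ntk_feature(x) for x in X])   # (10, num_params) feature matrix
K_ntk = feats @ feats.T                            # empirical NTK Gram matrix
```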
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.