Rethinking Depthwise Separable Convolutions: How Intra-Kernel
Correlations Lead to Improved MobileNets
- URL: http://arxiv.org/abs/2003.13549v3
- Date: Mon, 13 Jul 2020 14:57:16 GMT
- Title: Rethinking Depthwise Separable Convolutions: How Intra-Kernel
Correlations Lead to Improved MobileNets
- Authors: Daniel Haase and Manuel Amthor
- Abstract summary: We introduce blueprint separable convolutions (BSConv) as highly efficient building blocks for CNNs.
They are motivated by quantitative analyses of kernel properties from trained models.
Our approach provides a thorough theoretical derivation, interpretation, and justification for the application of depthwise separable convolutions.
- Score: 6.09170287691728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce blueprint separable convolutions (BSConv) as highly efficient
building blocks for CNNs. They are motivated by quantitative analyses of kernel
properties from trained models, which show the dominance of correlations along
the depth axis. Based on our findings, we formulate a theoretical foundation
from which we derive efficient implementations using only standard layers.
Moreover, our approach provides a thorough theoretical derivation,
interpretation, and justification for the application of depthwise separable
convolutions (DSCs) in general, which have become the basis of many modern
network architectures. Ultimately, we reveal that DSC-based architectures such
as MobileNets implicitly rely on cross-kernel correlations, while our BSConv
formulation is based on intra-kernel correlations and thus allows for a more
efficient separation of regular convolutions. Extensive experiments on
large-scale and fine-grained classification datasets show that BSConvs clearly
and consistently improve MobileNets and other DSC-based architectures without
introducing any further complexity. For fine-grained datasets, we achieve an
improvement of up to 13.7 percentage points. In addition, if used as a drop-in
replacement for standard architectures such as ResNets, BSConv variants also
outperform their vanilla counterparts by up to 9.5 percentage points on
ImageNet. Code and models are available at
https://github.com/zeiss-microscopy/BSConv.
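As a reading aid, here is a minimal PyTorch sketch of the unconstrained BSConv variant suggested by the abstract: a 1x1 pointwise convolution followed by a KxK depthwise convolution, i.e. the reverse ordering of the depthwise separable convolutions used in MobileNets. Module and parameter names are our own; see the linked repository for the authors' reference implementation.

```python
import torch
import torch.nn as nn

class BSConvU(nn.Module):
    """Sketch of an unconstrained blueprint separable convolution.

    Each filter kernel is modeled as a single 2D blueprint scaled along the
    depth axis (intra-kernel correlation), which factorizes into a pointwise
    convolution followed by a depthwise convolution using standard layers.
    """
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super().__init__()
        # 1x1 pointwise convolution: learns the per-channel blueprint weights
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        # KxK depthwise convolution: learns one 2D blueprint per output channel
        self.depthwise = nn.Conv2d(out_channels, out_channels, kernel_size,
                                   stride=stride, padding=padding,
                                   groups=out_channels, bias=False)

    def forward(self, x):
        return self.depthwise(self.pointwise(x))

x = torch.randn(1, 32, 56, 56)
print(BSConvU(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])
```

By contrast, a MobileNet-style DSC applies the depthwise convolution first and the pointwise convolution second; the abstract's argument is that the BSConv ordering follows from intra-kernel rather than cross-kernel correlations.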
Related papers
- DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs [30.412909498409192]
This paper revives Densely Connected Convolutional Networks (DenseNets).
We believe DenseNets' potential was overlooked because dated training methods and traditional design elements did not fully reveal their capabilities.
We provide empirical analyses that uncover the merits of the concatenation over additive shortcuts, steering a renewed preference towards DenseNet-style designs.
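To make the concatenation-versus-addition distinction concrete, a toy comparison of the two shortcut styles (illustrative only, not the paper's code):

```python
import torch

def additive_shortcut(x, fx):
    # ResNet-style: features are summed, channel count stays fixed
    return x + fx

def dense_shortcut(x, fx):
    # DenseNet-style: features are concatenated along the channel axis,
    # so later layers see all earlier feature maps unchanged
    return torch.cat([x, fx], dim=1)

x, fx = torch.randn(1, 16, 8, 8), torch.randn(1, 16, 8, 8)
print(additive_shortcut(x, fx).shape)  # torch.Size([1, 16, 8, 8])
print(dense_shortcut(x, fx).shape)     # torch.Size([1, 32, 8, 8])
```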
arXiv Detail & Related papers (2024-03-28T17:12:39Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers in efficient architectures (depthwise, groupwise, and pointwise convolutions).
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
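The claimed special cases are easy to verify: in standard frameworks, depthwise, groupwise, and pointwise convolutions are all instances of a grouped convolution. The sketch below shows only these special cases, not the SSC layer itself:

```python
import torch.nn as nn

# Pointwise: 1x1 kernel, full channel mixing (groups=1)
pointwise = nn.Conv2d(64, 128, kernel_size=1)
# Groupwise: channels split into 4 groups, each convolved independently
groupwise = nn.Conv2d(64, 128, kernel_size=3, padding=1, groups=4)
# Depthwise: one group per channel, no channel mixing (groups=in_channels)
depthwise = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64)
```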
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- SmoothNets: Optimizing CNN architecture design for differentially private deep learning [69.10072367807095]
DP-SGD requires clipping and noising of per-sample gradients.
This reduces model utility compared to non-private training.
We distilled a new model architecture termed SmoothNet, which is characterised by increased robustness to the challenges of DP-SGD training.
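For reference, the DP-SGD mechanics mentioned above (per-sample gradient clipping plus Gaussian noise) look roughly like the following naive sketch; production implementations such as Opacus vectorize the per-sample step, and the hyperparameter names here are placeholders:

```python
import torch

def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, clip_norm=1.0, noise_mult=1.1):
    """One naive DP-SGD step: clip each sample's gradient to clip_norm,
    sum, add Gaussian noise, then take an averaged gradient step."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):  # per-sample gradients via a plain loop
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach() for p in params]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total_norm + 1e-12)).clamp(max=1.0)  # clip to L2 ball
        for s, g in zip(summed, grads):
            s += g * scale
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(p) * noise_mult * clip_norm  # Gaussian mechanism
            p -= lr * (s + noise) / len(xs)
```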
arXiv Detail & Related papers (2022-05-09T07:51:54Z)
- Comparison Analysis of Traditional Machine Learning and Deep Learning Techniques for Data and Image Classification [62.997667081978825]
The purpose of the study is to analyse and compare the most common machine learning and deep learning techniques used for computer vision 2D object classification tasks.
Firstly, we present the theoretical background of the Bag of Visual Words model and Deep Convolutional Neural Networks (DCNNs).
Secondly, we implement a Bag of Visual Words model and the VGG16 CNN architecture.
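For readers unfamiliar with the Bag of Visual Words model, a minimal sketch of the pipeline (cluster local descriptors into a visual vocabulary, then encode each image as a histogram of visual words); descriptor extraction, e.g. SIFT, is stubbed here with random data:

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_vocabulary(all_descriptors, n_words=100):
    # Cluster the pooled local descriptors into n_words visual words
    return KMeans(n_clusters=n_words, n_init=10).fit(all_descriptors)

def encode(image_descriptors, vocabulary):
    # Assign each descriptor to its nearest word and histogram the counts
    words = vocabulary.predict(image_descriptors)
    hist, _ = np.histogram(words, bins=np.arange(vocabulary.n_clusters + 1))
    return hist / max(hist.sum(), 1)  # normalized word histogram

rng = np.random.default_rng(0)
vocab = fit_vocabulary(rng.normal(size=(1000, 128)), n_words=50)  # 128-D like SIFT
print(encode(rng.normal(size=(40, 128)), vocab).shape)  # (50,)
```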
arXiv Detail & Related papers (2022-04-11T11:34:43Z)
- Learning Target-aware Representation for Visual Tracking via Informative Interactions [49.552877881662475]
We introduce a novel backbone architecture to improve the target-perception ability of feature representations for tracking.
The proposed GIM module and InBN mechanism are general and applicable to different backbone types including CNN and Transformer.
arXiv Detail & Related papers (2022-01-07T16:22:27Z)
- Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping [46.083745557823164]
We identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data.
We show how these can be avoided by carefully controlling the "shape" of the network's kernel function.
arXiv Detail & Related papers (2021-10-05T00:49:36Z)
- Single-stream CNN with Learnable Architecture for Multi-source Remote Sensing Data [16.810239678639288]
We propose an efficient framework based on deep convolutional neural networks (CNNs) for joint classification of multi-source remote sensing data.
The proposed method can theoretically adapt any modern CNN model to any multi-source remote sensing dataset.
Experimental results demonstrate the effectiveness of the proposed single-stream CNNs.
arXiv Detail & Related papers (2021-09-13T16:10:41Z)
- Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization [16.85167651136133]
We take a broader view of training sparse networks and consider the role of regularization, optimization and architecture choices on sparse models.
We show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime.
arXiv Detail & Related papers (2021-02-02T18:40:26Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
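A hypothetical sketch of the atom-coefficient idea as we read it from this summary: each filter is a linear combination of a small dictionary of shared "atoms", so reusing the atoms across layers cuts parameters. All names and shapes below are illustrative, not the paper's actual formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtomCoefficientConv2d(nn.Module):
    """Each KxK filter is a linear combination of shared 'atoms';
    only the mixing coefficients are layer-specific (sketch)."""
    def __init__(self, in_channels, out_channels, atoms):
        super().__init__()
        self.atoms = atoms  # (n_atoms, K, K), shared across layers
        self.coeff = nn.Parameter(
            torch.randn(out_channels, in_channels, atoms.shape[0]) * 0.1)

    def forward(self, x):
        # weight[o, i] = sum_a coeff[o, i, a] * atoms[a]
        weight = torch.einsum('oia,ahw->oihw', self.coeff, self.atoms)
        return F.conv2d(x, weight, padding=self.atoms.shape[-1] // 2)

shared_atoms = nn.Parameter(torch.randn(6, 3, 3))  # one dictionary, reused
layer1 = AtomCoefficientConv2d(16, 32, shared_atoms)
layer2 = AtomCoefficientConv2d(32, 32, shared_atoms)
x = torch.randn(1, 16, 8, 8)
print(layer2(layer1(x)).shape)  # torch.Size([1, 32, 8, 8])
```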
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
- The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network Architectures [179.66117325866585]
We investigate a design space that is usually overlooked, i.e. adjusting the channel configurations of predefined networks.
We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance.
Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration.
arXiv Detail & Related papers (2020-06-29T17:59:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.