Inception Convolution with Efficient Dilation Search
- URL: http://arxiv.org/abs/2012.13587v1
- Date: Fri, 25 Dec 2020 14:58:35 GMT
- Title: Inception Convolution with Efficient Dilation Search
- Authors: Jie Liu, Chuming Li, Feng Liang, Chen Lin, Ming Sun, Junjie Yan, Wanli
Ouyang, Dong Xu
- Abstract summary: Dilated convolution is a critical variant of the standard convolution in neural networks, used to control effective receptive fields and handle large scale variance of objects.
We propose a new variant of dilated convolution, namely inception (dilated) convolution, where the convolutions have independent dilations among different axes, channels and layers.
To fit the complex inception convolution to the data in practice, we develop a simple yet effective dilation search algorithm (EDO) based on statistical optimization.
- Score: 121.41030859447487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dilated convolution is a critical variant of the standard
convolution in neural networks, used to control effective receptive fields and
handle large scale variance of objects without introducing additional
computation. However, fitting the effective receptive field to the data with
dilated convolution is rarely discussed in the literature. To fully explore its
potential, we propose a new variant of dilated convolution, namely inception
(dilated) convolution, where the convolutions have independent dilations among
different axes, channels and layers. To explore a practical method for fitting
the complex inception convolution to the data, we develop a simple yet
effective dilation search algorithm (EDO) based on statistical optimization.
The search method operates in a zero-cost manner and is extremely fast to apply
to large scale datasets. Empirical results reveal that our method obtains
consistent performance gains across an extensive range of benchmarks. For
instance, by simply replacing the 3 x 3 standard convolutions in the ResNet-50
backbone with inception convolution, we improve the mAP of Faster-RCNN on
MS-COCO from 36.4% to 39.2%. Furthermore, with the same replacement in the
ResNet-101 backbone, we achieve a large improvement in AP, from 60.2% to 68.5%,
on COCO val2017 for bottom-up human pose estimation.
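To make the layer described above concrete, here is a minimal PyTorch sketch of an inception (dilated) convolution, assuming one simple grouping scheme: the output channels are split into equal groups and each group applies a 3 x 3 convolution with its own (height, width) dilation pair. The class name InceptionConv2d and the fixed dilation pattern are illustrative assumptions; in the paper, the per-axis, per-channel dilations are selected by the EDO search rather than fixed by hand.

```python
import torch
import torch.nn as nn


class InceptionConv2d(nn.Module):
    """Sketch: 3x3 convolution whose channel groups use independent axis-wise dilations."""

    def __init__(self, in_channels, out_channels,
                 dilations=((1, 1), (1, 2), (2, 1), (2, 2))):
        super().__init__()
        assert out_channels % len(dilations) == 0, "channels must split evenly"
        group_out = out_channels // len(dilations)
        # One branch per (dilation_h, dilation_w) pair; for a 3x3 kernel,
        # padding = dilation keeps the output spatially aligned with the input.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_channels, group_out, kernel_size=3,
                      padding=(dh, dw), dilation=(dh, dw), bias=False)
            for dh, dw in dilations
        )

    def forward(self, x):
        # Concatenate the per-dilation channel groups along the channel axis.
        return torch.cat([branch(x) for branch in self.branches], dim=1)


# Usage: a drop-in stand-in for a standard 3x3 convolution in a backbone.
layer = InceptionConv2d(64, 256)
print(layer(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 256, 56, 56])
```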
Related papers
- D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement [37.78880948551719]
D-FINE is a powerful real-time object detector that achieves outstanding localization precision.
D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD).
When pretrained on Objects365, D-FINE-L / X attains 57.1% / 59.3% AP, surpassing all existing real-time detectors.
arXiv Detail & Related papers (2024-10-17T17:57:01Z)
- E$^3$-Net: Efficient E(3)-Equivariant Normal Estimation Network [47.77270862087191]
We propose E$^3$-Net to achieve equivariance for normal estimation.
We introduce an efficient random frame method, which significantly reduces the training resources required for this task to just 1/8 of previous work.
Our method achieves superior results on both synthetic and real-world datasets, and outperforms current state-of-the-art techniques by a substantial margin.
arXiv Detail & Related papers (2024-06-01T07:53:36Z)
- DeformUX-Net: Exploring a 3D Foundation Backbone for Medical Image Segmentation with Depthwise Deformable Convolution [26.746489317083352]
We introduce 3D DeformUX-Net, a pioneering volumetric CNN model.
We revisit volumetric deformable convolution in a depth-wise setting to capture long-range dependency with computational efficiency.
Our empirical evaluations reveal that the 3D DeformUX-Net consistently outperforms existing state-of-the-art ViTs and large kernel convolution models.
arXiv Detail & Related papers (2023-09-30T00:33:41Z)
- CNN-transformer mixed model for object detection [3.5897534810405403]
In this paper, I propose a convolutional module with a transformer.
It aims to improve the recognition accuracy of the model by fusing the detailed features extracted by CNN with the global features extracted by a transformer.
After 100 rounds of training on the Pascal VOC dataset, the accuracy reached 81%, which is 4.6 points better than Faster RCNN [4] using ResNet-101 [5] as the backbone.
arXiv Detail & Related papers (2022-12-13T16:35:35Z)
- Focal Sparse Convolutional Networks for 3D Object Detection [121.45950754511021]
We introduce two new modules to enhance the capability of Sparse CNNs.
They are focal sparse convolution (Focals Conv) and its multi-modal variant of focal sparse convolution with fusion.
For the first time, we show that spatially learnable sparsity in sparse convolution is essential for sophisticated 3D object detection.
arXiv Detail & Related papers (2022-04-26T17:34:10Z)
- Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z)
- Involution: Inverting the Inherence of Convolution for Visual Recognition [72.88582255910835]
We present a novel atomic operation for deep neural networks by inverting the principles of convolution, coined as involution.
The proposed involution operator can be leveraged as a fundamental building block for a new generation of neural networks for visual recognition (a minimal sketch of the operator appears after this list).
Our involution-based models improve the performance of convolutional baselines using ResNet-50 by up to 1.6% top-1 accuracy, 2.5% and 2.4% bounding box AP, and 4.7% mean IoU absolutely.
arXiv Detail & Related papers (2021-03-10T18:40:46Z)
- Attention-based Convolutional Autoencoders for 3D-Variational Data Assimilation [11.143409762586638]
We propose a new 'Bi-Reduced Space' approach to solving 3D Variational Data Assimilation using Convolutional Autoencoders.
We prove that our approach has the same solution as previous methods but has significantly lower computational complexity.
arXiv Detail & Related papers (2021-01-06T16:23:58Z)
- DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be optimized directly with gradient-based optimizers (see the Dirichlet sampling sketch after this list).
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
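For the Involution entry above, here is a minimal PyTorch sketch of the involution operator, assuming a stride-1, 3 x 3 configuration: a kernel is generated at every spatial position from the input features and shared across the channels of each group, inverting the channel-specific, spatially-shared design of convolution. The class name Involution2d and the reduction/group settings are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Involution2d(nn.Module):
    """Sketch: per-pixel kernels generated from the input, shared per channel group."""

    def __init__(self, channels, kernel_size=3, groups=4, reduction=4):
        super().__init__()
        assert channels % groups == 0 and channels % reduction == 0
        self.k, self.g = kernel_size, groups
        # Two 1x1 convolutions generate a kernel at every spatial position.
        self.reduce = nn.Conv2d(channels, channels // reduction, 1)
        self.span = nn.Conv2d(channels // reduction, groups * kernel_size ** 2, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Kernel generation: (b, g * k*k, h, w) -> (b, g, 1, k*k, h, w).
        kernel = self.span(F.relu(self.reduce(x)))
        kernel = kernel.view(b, self.g, 1, self.k ** 2, h, w)
        # Unfold k x k neighborhoods: (b, c * k*k, h*w) -> (b, g, c/g, k*k, h, w).
        patches = F.unfold(x, self.k, padding=self.k // 2)
        patches = patches.view(b, self.g, c // self.g, self.k ** 2, h, w)
        # Weight each neighborhood by its generated kernel and sum over it.
        return (kernel * patches).sum(dim=3).view(b, c, h, w)


# Usage: shape-preserving, like a 3x3 convolution with padding 1.
inv = Involution2d(64)
print(inv(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```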
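And for the DrNAS entry, a minimal sketch of the Dirichlet sampling step, assuming PyTorch: the architecture mixing weights are drawn from a Dirichlet distribution whose concentration parameters receive pathwise (reparameterized) gradients through rsample(). The toy two-operation search space and the regression objective are illustrative assumptions, not DrNAS's actual setup.

```python
import torch
from torch.distributions import Dirichlet

torch.manual_seed(0)
log_conc = torch.zeros(2, requires_grad=True)  # learnable log-concentrations
opt = torch.optim.Adam([log_conc], lr=0.05)

x = torch.randn(256, 8)
target = torch.tanh(x)            # here the "right" operation is tanh
ops = [torch.relu, torch.tanh]    # toy candidate operations

for step in range(200):
    # rsample() provides a pathwise-differentiable Dirichlet sample.
    w = Dirichlet(log_conc.exp()).rsample()
    mixed = w[0] * ops[0](x) + w[1] * ops[1](x)
    loss = (mixed - target).pow(2).mean()  # stand-in for the validation loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print(Dirichlet(log_conc.exp()).mean)  # mass should shift toward tanh
```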
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.