Pyramidal Convolution: Rethinking Convolutional Neural Networks for
Visual Recognition
- URL: http://arxiv.org/abs/2006.11538v1
- Date: Sat, 20 Jun 2020 10:19:29 GMT
- Title: Pyramidal Convolution: Rethinking Convolutional Neural Networks for
Visual Recognition
- Authors: Ionut Cosmin Duta, Li Liu, Fan Zhu, Ling Shao
- Abstract summary: This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales.
We present different architectures based on PyConv for four main tasks on visual recognition: image classification, video action classification/recognition, object detection and semantic image segmentation/parsing.
- Score: 98.10703825716142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work introduces pyramidal convolution (PyConv), which is capable of
processing the input at multiple filter scales. PyConv contains a pyramid of
kernels, where each level involves different types of filters with varying size
and depth, which are able to capture different levels of details in the scene.
On top of these improved recognition capabilities, PyConv is also efficient
and, with our formulation, it does not increase the computational cost and
parameters compared to standard convolution. Moreover, it is very flexible and
extensible, providing a large space of potential network architectures for
different applications. PyConv has the potential to impact nearly every
computer vision task and, in this work, we present different architectures
based on PyConv for four main tasks on visual recognition: image
classification, video action classification/recognition, object detection and
semantic image segmentation/parsing. Our approach shows significant
improvements over all these core tasks in comparison with the baselines. For
instance, on image recognition, our 50-layer network outperforms its 152-layer
ResNet baseline on the ImageNet dataset, while having 2.39 times fewer
parameters, 2.52 times lower computational complexity, and more than 3 times
fewer layers. On image
segmentation, our novel framework sets a new state-of-the-art on the
challenging ADE20K benchmark for scene parsing. Code is available at:
https://github.com/iduta/pyconv
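The abstract's claim that PyConv matches the cost of standard convolution rests on pairing larger kernels with more grouped-convolution groups, so each pyramid level stays within the budget. A minimal parameter-accounting sketch, using illustrative level settings (kernel sizes, per-level output channels, group counts) that are assumptions for demonstration rather than the paper's exact configuration:

```python
# Hedged sketch: parameter accounting for pyramidal convolution (PyConv).
# The level configuration below is illustrative, not the paper's exact
# settings; it shows how grouped convolutions let larger kernels fit
# within the parameter budget of a standard 3x3 convolution.

def conv_params(c_in, c_out, kernel, groups=1):
    """Weights in a 2D convolution: (c_in / groups) * c_out * k * k."""
    assert c_in % groups == 0
    return (c_in // groups) * c_out * kernel * kernel

def pyconv_params(c_in, levels):
    """Sum the weights over a pyramid of (kernel, c_out, groups) levels."""
    return sum(conv_params(c_in, c_out, k, g) for k, c_out, g in levels)

# Standard 3x3 convolution, 64 -> 64 channels.
standard = conv_params(64, 64, 3)   # 64 * 64 * 9 = 36864 weights

# A 4-level pyramid: larger kernels use more groups, bounding each
# level's cost; the four levels together still produce 64 channels.
levels = [(3, 16, 1), (5, 16, 4), (7, 16, 8), (9, 16, 16)]
pyramid = pyconv_params(64, levels)  # 27072 weights

print(standard, pyramid)  # the pyramid uses fewer parameters overall
```

Because the pyramid's four levels each emit 16 channels, the output shape matches the standard convolution's, while kernels up to 9x9 widen the receptive field at no extra parameter cost.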
Related papers
- Parameter-Inverted Image Pyramid Networks [49.35689698870247]
We propose a novel network architecture, Parameter-Inverted Image Pyramid Networks (PIIP).
Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid.
PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification.
arXiv Detail & Related papers (2024-06-06T17:59:10Z)
- PanDepth: Joint Panoptic Segmentation and Depth Completion [19.642115764441016]
We propose a multi-task model for panoptic segmentation and depth completion using RGB images and sparse depth maps.
Our model successfully predicts fully dense depth maps and performs semantic segmentation, instance segmentation, and panoptic segmentation for every input frame.
arXiv Detail & Related papers (2022-12-29T05:37:38Z)
- kMaX-DeepLab: k-means Mask Transformer [41.104116145904825]
Most existing transformer-based vision models simply borrow the idea from NLP.
Inspired by the traditional k-means clustering algorithm, we develop a k-means Mask Xformer for segmentation tasks.
Our kMaX-DeepLab achieves new state-of-the-art performance on the COCO val set (58.0% PQ) and the Cityscapes val set (68.4% PQ, 44.0% AP, 83.5% mIoU).
arXiv Detail & Related papers (2022-07-08T17:59:01Z)
- Deep ensembles in bioimage segmentation [74.01883650587321]
In this work, we propose an ensemble of convolutional neural networks (CNNs).
In ensemble methods, many different models are trained and then used for classification; the ensemble aggregates the outputs of the single classifiers.
The proposed ensemble is implemented by combining different backbone networks using the DeepLabV3+ and HarDNet environment.
arXiv Detail & Related papers (2021-12-24T05:54:21Z)
- Leveraging Image Complexity in Macro-Level Neural Network Design for Medical Image Segmentation [3.974175960216864]
We show that image complexity can be used as a guideline in choosing what is best for a given dataset.
For high-complexity datasets, a shallow network running on the original images may yield better segmentation results than a deep network running on downsampled images.
arXiv Detail & Related papers (2021-12-21T09:49:47Z)
- Learning Versatile Neural Architectures by Propagating Network Codes [74.2450894473073]
We propose Network Coding Propagation (NCP), a novel "neural predictor" that estimates an architecture's performance across multiple datasets and tasks.
NCP learns from network codes rather than original data, enabling it to update the architecture efficiently across datasets.
arXiv Detail & Related papers (2021-03-24T15:20:38Z)
- KiU-Net: Overcomplete Convolutional Architectures for Biomedical Image and Volumetric Segmentation [71.79090083883403]
"Traditional" encoder-decoder based approaches perform poorly in detecting smaller structures and are unable to segment boundary regions precisely.
We propose KiU-Net which has two branches: (1) an overcomplete convolutional network Kite-Net which learns to capture fine details and accurate edges of the input, and (2) U-Net which learns high level features.
The proposed method achieves better performance than recent methods, with the additional benefits of fewer parameters and faster convergence.
arXiv Detail & Related papers (2020-10-04T19:23:33Z)
- Improved Residual Networks for Image and Video Recognition [98.10703825716142]
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
arXiv Detail & Related papers (2020-04-10T11:09:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.