Scale-Equalizing Pyramid Convolution for Object Detection
- URL: http://arxiv.org/abs/2005.03101v1
- Date: Wed, 6 May 2020 19:34:56 GMT
- Title: Scale-Equalizing Pyramid Convolution for Object Detection
- Authors: Xinjiang Wang, Shilong Zhang, Zhuoran Yu, Litong Feng, Wayne Zhang
- Abstract summary: The feature pyramid has been an efficient method to extract features at different scales.
Inspired by this, a convolution across the pyramid level is proposed in this study, which is termed pyramid convolution and is a modified 3-D convolution.
Stacked pyramid convolutions directly extract 3-D (scale and spatial) features and outperform other meticulously designed feature fusion modules.
- Score: 22.516829622445062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The feature pyramid has been an efficient method to extract features at different
scales. Development of this method has mainly focused on aggregating contextual
information at different levels while seldom touching the inter-level
correlation in the feature pyramid. Early computer vision methods extracted
scale-invariant features by locating the feature extrema in both the spatial and
scale dimensions. Inspired by this, a convolution across the pyramid level is
proposed in this study, which is termed pyramid convolution and is a modified
3-D convolution. Stacked pyramid convolutions directly extract 3-D (scale and
spatial) features and outperform other meticulously designed feature fusion
modules. Based on the viewpoint of 3-D convolution, an integrated batch
normalization that collects statistics from the whole feature pyramid is
naturally inserted after the pyramid convolution. Furthermore, we also show
that the naive pyramid convolution, together with the design of the RetinaNet head,
is actually best suited to extracting features from a Gaussian pyramid, whose
properties can hardly be satisfied by a feature pyramid. In order to alleviate
this discrepancy, we build a scale-equalizing pyramid convolution (SEPC) that
aligns the shared pyramid convolution kernel only at high-level feature maps.
Being computationally efficient and compatible with the head design of most
single-stage object detectors, the SEPC module brings significant performance
improvement ($>4$ AP increase on the MS-COCO2017 dataset) in state-of-the-art
one-stage object detectors, and a light version of SEPC also yields a $\sim3.5$ AP
gain with only around 7% inference time increase. The pyramid convolution also
functions well as a stand-alone module in two-stage object detectors and is
able to improve the performance by $\sim2$ AP. The source code can be found at
https://github.com/jshilong/SEPC.
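The two ideas in the abstract, a convolution that spans neighbouring pyramid levels and a batch normalization that pools statistics over the whole pyramid, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository above for that). The function names `pyramid_conv_1x1` and `integrated_batch_norm` are hypothetical, the spatial kernel is reduced to 1x1 for brevity, each level is assumed to be exactly half the resolution of the previous one, and the stride-2 subsampling and nearest-neighbour upsampling used to align neighbouring levels are assumptions of this sketch. SEPC's kernel alignment at high-level feature maps is not modelled.

```python
import numpy as np


def pyramid_conv_1x1(pyramid, w_down, w_same, w_up):
    """Simplified cross-level convolution with a scale kernel of size 3.

    pyramid: list of arrays shaped (N, C, H_l, W_l), finest level first,
             each level assumed to be exactly half the size of the previous one.
    w_down, w_same, w_up: (C_out, C_in) channel-mixing matrices (1x1 kernels)
             applied to the finer, current, and coarser level respectively.
    """
    def mix(w, x):
        # A 1x1 convolution is a per-pixel linear map over channels.
        return np.einsum("oc,nchw->nohw", w, x)

    outputs = []
    for level, x in enumerate(pyramid):
        y = mix(w_same, x)
        if level > 0:
            # Finer neighbour: stride-2 subsampling matches this resolution.
            y = y + mix(w_down, pyramid[level - 1][:, :, ::2, ::2])
        if level + 1 < len(pyramid):
            # Coarser neighbour: nearest-neighbour 2x upsampling.
            up = pyramid[level + 1].repeat(2, axis=2).repeat(2, axis=3)
            y = y + mix(w_up, up)
        outputs.append(y)
    return outputs


def integrated_batch_norm(pyramid, eps=1e-5):
    """Normalize all levels with per-channel statistics pooled over every level."""
    channels = pyramid[0].shape[1]
    # Flatten each level to (C, N * H_l * W_l) and pool along the sample axis.
    flat = np.concatenate(
        [x.transpose(1, 0, 2, 3).reshape(channels, -1) for x in pyramid], axis=1
    )
    mean = flat.mean(axis=1).reshape(1, channels, 1, 1)
    var = flat.var(axis=1).reshape(1, channels, 1, 1)
    return [(x - mean) / np.sqrt(var + eps) for x in pyramid]


# Toy three-level pyramid: batch of 2, 4 channels, sizes 16, 8, 4.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((2, 4, size, size)) for size in (16, 8, 4)]
w_down, w_same, w_up = (rng.standard_normal((4, 4)) for _ in range(3))
fused = integrated_batch_norm(pyramid_conv_1x1(pyramid, w_down, w_same, w_up))
```

Each output level keeps the spatial size of its input level, and the normalization shares one mean and variance per channel across all levels rather than computing them level by level.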
Related papers
- MinkUNeXt: Point Cloud-based Large-scale Place Recognition using 3D Sparse Convolutions [1.124958340749622]
MinkUNeXt is an effective and efficient architecture for place-recognition from point clouds entirely based on the new 3D MinkNeXt Block.
A thorough assessment of the proposal has been carried out using the Oxford RobotCar and the In-house datasets.
arXiv Detail & Related papers (2024-03-12T12:25:54Z)
- G3Reg: Pyramid Graph-based Global Registration using Gaussian Ellipsoid Model [21.189016878269104]
This study introduces a novel framework, G3Reg, for fast and robust global registration of LiDAR point clouds.
In contrast to conventional complex keypoints and descriptors, we extract fundamental geometric primitives.
We present a distrust-and-verify scheme based on a Pyramid Graph for Global Registration.
arXiv Detail & Related papers (2023-08-22T17:23:00Z)
- Focal Sparse Convolutional Networks for 3D Object Detection [121.45950754511021]
We introduce two new modules to enhance the capability of Sparse CNNs.
They are focal sparse convolution (Focals Conv) and its multi-modal variant of focal sparse convolution with fusion.
For the first time, we show that spatially learnable sparsity in sparse convolution is essential for sophisticated 3D object detection.
arXiv Detail & Related papers (2022-04-26T17:34:10Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection [89.66162518035144]
We present a flexible and high-performance framework, named Pyramid R-CNN, for two-stage 3D object detection from point clouds.
We propose a novel second-stage module, named pyramid RoI head, to adaptively learn the features from the sparse points of interest.
Our pyramid RoI head is robust to the sparse and imbalanced circumstances, and can be applied upon various 3D backbones to consistently boost the detection performance.
arXiv Detail & Related papers (2021-09-06T14:17:51Z)
- Learning Feature Aggregation for Deep 3D Morphable Models [57.1266963015401]
We propose an attention based module to learn mapping matrices for better feature aggregation across hierarchical levels.
Our experiments show that through the end-to-end training of the mapping matrices, we achieve state-of-the-art results on a variety of 3D shape datasets.
arXiv Detail & Related papers (2021-05-05T16:41:00Z)
- PNEN: Pyramid Non-Local Enhanced Networks [23.17149002568982]
We propose a novel non-local module, Pyramid Non-local Block, to build connections between every pixel and all remaining pixels.
Based on the proposed module, we devise Pyramid Non-local Enhanced Networks for edge-preserving image smoothing.
We integrate it into two existing methods for image denoising and single image super-resolution, achieving consistently improved performance.
arXiv Detail & Related papers (2020-08-22T03:10:48Z)
- Feature Pyramid Transformer [121.50066435635118]
We propose a fully active feature interaction across both space and scales, called Feature Pyramid Transformer (FPT).
FPT transforms any feature pyramid into another feature pyramid of the same size but with richer contexts.
We conduct extensive experiments in both instance-level (i.e., object detection and instance segmentation) and pixel-level segmentation tasks.
arXiv Detail & Related papers (2020-07-18T15:16:32Z)
- Feature Pyramid Grids [140.11116687047058]
We present Feature Pyramid Grids (FPG), a deep multi-pathway feature pyramid.
FPG can improve single-pathway feature pyramid networks by significantly increasing their performance at similar computation cost.
arXiv Detail & Related papers (2020-04-07T17:59:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.