Related papers: Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

URL: http://arxiv.org/abs/2003.13328v1
Date: Mon, 30 Mar 2020 10:40:11 GMT
Title: Strip Pooling: Rethinking Spatial Pooling for Scene Parsing
Authors: Qibin Hou, Li Zhang, Ming-Ming Cheng, Jiashi Feng
Abstract summary: We introduce strip pooling, which uses a long but narrow kernel, i.e., 1xN or Nx1. We compare the performance of the proposed strip pooling with that of conventional spatial pooling techniques. Both proposed designs, a strip pooling module and a building block built around diverse spatial pooling, are lightweight and can serve as efficient plug-and-play modules in existing scene parsing networks.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Spatial pooling has been proven highly effective in capturing long-range contextual information for pixel-wise prediction tasks, such as scene parsing. In this paper, beyond conventional spatial pooling that usually has a regular shape of NxN, we rethink the formulation of spatial pooling by introducing a new pooling strategy, called strip pooling, which considers a long but narrow kernel, i.e., 1xN or Nx1. Based on strip pooling, we further investigate spatial pooling architecture design by 1) introducing a new strip pooling module that enables backbone networks to efficiently model long-range dependencies, 2) presenting a novel building block with diverse spatial pooling as a core, and 3) systematically comparing the performance of the proposed strip pooling and conventional spatial pooling techniques. Both novel pooling-based designs are lightweight and can serve as an efficient plug-and-play module in existing scene parsing networks. Extensive experiments on popular benchmarks (e.g., ADE20K and Cityscapes) demonstrate that our simple approach establishes new state-of-the-art results. Code is made available at https://github.com/Andrew-Qibin/SPNet.
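The abstract does not spell out how the strip pooling module combines its two pooled vectors. The sketch below is a minimal, single-channel illustration under stated assumptions: it averages each row (a 1xW strip) and each column (an Hx1 strip), broadcasts the two vectors back to HxW, sums them, and uses a sigmoid of the sum to reweight the input. The learned 1-D and 1x1 convolutions a trainable module would include are omitted, and the names `strip_pool` and `strip_pooling_gate` are hypothetical; the authors' implementation is in the linked SPNet repository.

```python
import math


def strip_pool(x):
    """Average-pool a 2-D map with 1xW (per-row) and Hx1 (per-column) strips."""
    h, w = len(x), len(x[0])
    # Horizontal strip pooling: one average per row -> H values.
    row_means = [sum(row) / w for row in x]
    # Vertical strip pooling: one average per column -> W values.
    col_means = [sum(x[i][j] for i in range(h)) / h for j in range(w)]
    return row_means, col_means


def strip_pooling_gate(x):
    """Simplified strip-pooling attention on one channel (assumed design).

    Each pixel (i, j) receives context from its whole row i and column j.
    The sum is squashed by a sigmoid and used to reweight the input.
    Learned convolutions are deliberately left out of this sketch.
    """
    row_means, col_means = strip_pool(x)
    h, w = len(x), len(x[0])
    out = []
    for i in range(h):
        out_row = []
        for j in range(w):
            context = row_means[i] + col_means[j]  # expand both strips to HxW and sum
            gate = 1.0 / (1.0 + math.exp(-context))  # sigmoid, in (0, 1)
            out_row.append(x[i][j] * gate)
        out.append(out_row)
    return out
```

For `x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]`, `strip_pool(x)` returns the row means `[2.0, 5.0]` and the column means `[2.5, 3.5, 4.5]`. Every output pixel of the gate therefore draws on its entire row and column while the pooled context holds only H + W values, whereas an NxN window only covers a square neighborhood around the pixel.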

Related papers

MorphPool: Efficient Non-linear Pooling & Unpooling in CNNs
Pooling is essentially an operation from the field of Mathematical Morphology, with max pooling as a limited special case. In addition to pooling operations, encoder-decoder networks used for pixel-level predictions also require unpooling. Extensive experimentation on two tasks and three large-scale datasets shows that morphological pooling and unpooling lead to improved predictive performance at much reduced parameter counts.
Date: 2022-11-25T11:25:20Z
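The statement that max pooling is a special case of a morphological operation can be checked with grayscale dilation. This sketch is not MorphPool's implementation; it is a plain 1-D dilation with a structuring element `se` (a hypothetical parameter name) evaluated on strided windows. With a flat, all-zero structuring element it reduces exactly to max pooling, while a non-flat element changes which values win.

```python
def dilation_pool(x, se, stride):
    """1-D grayscale dilation (max-plus) on strided windows.

    out[i] = max_t (x[i * stride + t] + se[t])
    The reflection of the structuring element is dropped for simplicity.
    """
    k = len(se)
    out = []
    for start in range(0, len(x) - k + 1, stride):
        out.append(max(x[start + t] + se[t] for t in range(k)))
    return out


def max_pool(x, size, stride):
    """Ordinary 1-D max pooling, for comparison."""
    return [max(x[s:s + size]) for s in range(0, len(x) - size + 1, stride)]


signal = [1, 5, 2, 8, 3, 0]
flat = dilation_pool(signal, [0, 0], 2)        # flat element: same as max pooling
shaped = dilation_pool(signal, [0, -10], 2)    # penalizes the second position
```

Here `flat` is `[5, 8, 3]`, identical to `max_pool(signal, 2, 2)`, whereas `shaped` is `[1, 2, 3]` because the structuring element effectively suppresses the second element of every window. Making `se` learnable is what turns this fixed operator into a trainable non-linear pooling layer.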
Pooling Revisited: Your Receptive Field is Suboptimal
The size and shape of the receptive field determine how the network aggregates local information. We propose a simple yet effective Dynamically Optimized Pooling operation, referred to as DynOPool. Our experiments show that the models equipped with the proposed learnable resizing module outperform the baseline networks on multiple datasets in image classification and semantic segmentation.
Date: 2022-05-30T17:03:40Z
Fuzzy Pooling
Convolutional Neural Networks (CNNs) are artificial learning systems typically based on two operations: convolution and pooling. We present a novel pooling operation based on (type-1) fuzzy sets to cope with the local imprecision of the feature maps. Experiments using publicly available datasets show that the proposed approach can enhance the classification performance of a CNN.
Date: 2022-02-12T11:18:32Z
AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling
Pooling layers are essential building blocks of Convolutional Neural Networks (CNNs). We propose an adaptive and exponentially weighted pooling method named adaPool. We demonstrate how adaPool improves the preservation of detail across a range of tasks, including image and video classification and object detection.
Date: 2021-11-01T08:50:37Z
Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling
We introduce RandLA-Net, an efficient and lightweight neural architecture to infer per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches.
Date: 2021-07-06T05:08:34Z
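To illustrate the contrast the summary draws, the sketch below places uniform random sampling next to farthest point sampling. Farthest point sampling is an assumption here, chosen only as one example of a "more complex point selection approach"; the summary does not name the alternatives RandLA-Net is compared against. Random sampling performs no distance computation at all, while the greedy farthest-point loop evaluates O(N·K) distances to pick K of N points. All function names are hypothetical.

```python
import random


def random_sample(points, k, seed=0):
    """Pick k distinct points uniformly at random; no geometry is inspected."""
    rng = random.Random(seed)
    return rng.sample(points, k)


def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_sample(points, k):
    """Greedy farthest point sampling: O(N * K) distance evaluations."""
    selected = [0]  # start from the first point
    # Distance from every point to its nearest already-selected point.
    dist = [sq_dist(p, points[0]) for p in points]
    while len(selected) < k:
        idx = max(range(len(points)), key=lambda i: dist[i])
        selected.append(idx)
        # Refresh nearest-selected distances with the newly chosen point.
        for i, p in enumerate(points):
            dist[i] = min(dist[i], sq_dist(p, points[idx]))
    return [points[i] for i in selected]
```

On the four points `(0, 0, 0)`, `(1, 0, 0)`, `(10, 0, 0)` and `(0, 10, 0)`, `farthest_point_sample(..., 3)` keeps the three spread-out points and skips `(1, 0, 0)`, which shows why it covers space evenly. The cost of that coverage is the full pass over all points for every selected sample, which becomes the bottleneck at the million-point scale the summary mentions.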
Refining activation downsampling with SoftPool
Convolutional Neural Networks (CNNs) use pooling to decrease the size of activation maps. We propose SoftPool: a fast and efficient method for exponentially weighted activation downsampling. We show that SoftPool can retain more information in the reduced activation maps.
Date: 2021-01-02T12:09:49Z
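A minimal sketch of exponentially weighted downsampling, assuming (as in the SoftPool formulation) that each activation a_i in a pooling window is weighted by the softmax of that window's activations, so the output is sum_i (exp(a_i) / sum_j exp(a_j)) * a_i. This 1-D, pure-Python version is illustrative only, and `soft_pool_1d` is a hypothetical name.

```python
import math


def soft_pool_1d(x, size, stride):
    """Softmax-weighted pooling over strided 1-D windows."""
    out = []
    for s in range(0, len(x) - size + 1, stride):
        window = x[s:s + size]
        m = max(window)  # subtracting the max avoids overflow; weights are unchanged
        weights = [math.exp(v - m) for v in window]
        total = sum(weights)
        # Larger activations get exponentially larger weights.
        out.append(sum(w * v for w, v in zip(weights, window)) / total)
    return out
```

A uniform window such as `[2.0, 2.0]` returns its own value, and `[0.0, 1.0]` yields about 0.731, which lies between the window mean (0.5) and the max (1.0). Unlike max pooling, every activation in the window contributes to the output, which is the sense in which the summary says more information is retained.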
ASAP-Net: Attention and Structure Aware Point Cloud Sequence Segmentation
We further improve spatio-temporal point cloud feature learning with a flexible module called ASAP. The ASAP module contains an attentive temporal embedding layer that fuses the relatively informative local features across frames in a recurrent fashion. We show the generalization ability of the proposed ASAP module with different backbone networks for point cloud sequence segmentation.
Date: 2020-08-12T07:37:16Z
SimPool: Towards Topology Based Graph Pooling with Structural Similarity Features
This paper makes two main contributions. The first is a differential module that calculates structural similarity features based on the adjacency matrix. The second integrates these features with a revisited DiffPool pooling layer (arXiv:1806.08804), yielding a pooling layer referred to as SimPool. Experimental results show that, as part of an end-to-end Graph Neural Network architecture, SimPool computes node cluster assignments that better reflect locality.
Date: 2020-06-03T12:51:57Z
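SimPool builds on the DiffPool coarsening step (arXiv:1806.08804). In DiffPool, a GNN produces an assignment matrix S that maps N nodes to C clusters, and the graph is coarsened to the adjacency SᵀAS with pooled features SᵀZ. The sketch below takes S as given rather than learning it and does not model SimPool's structural similarity features; `diffpool_coarsen` and the matrix helpers are hypothetical names.

```python
def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def diffpool_coarsen(adj, feats, assign):
    """Coarsen a graph with a cluster-assignment matrix (DiffPool-style).

    adj:    N x N adjacency matrix A
    feats:  N x F node embeddings Z
    assign: N x C assignment matrix S (rows sum to 1)
    Returns (S^T A S, S^T Z): C x C adjacency and C x F cluster features.
    """
    st = transpose(assign)
    return matmul(matmul(st, adj), assign), matmul(st, feats)


# Path graph 0-1-2-3 with a hard split into clusters {0, 1} and {2, 3}.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
feats = [[1], [2], [3], [4]]
assign = [[1, 0], [1, 0], [0, 1], [0, 1]]
coarse_adj, coarse_feats = diffpool_coarsen(adj, feats, assign)
```

The coarsened adjacency is `[[2, 1], [1, 2]]`: each diagonal entry counts the intra-cluster edge in both directions, and the off-diagonal 1 is the single edge 1-2 joining the clusters, while `coarse_feats` is `[[3], [7]]`. Assignments that respect locality, the property the SimPool summary emphasizes, keep most edges on the diagonal.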
Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes
We propose a real-time, high-performance DCNN-based method for robust semantic segmentation of urban street scenes. The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
Date: 2020-03-11T08:45:53Z

This list is automatically generated from the titles and abstracts of the papers on this site.