FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale
Context Aggregation and Feature Space Super-resolution
- URL: http://arxiv.org/abs/2003.03913v1
- Date: Mon, 9 Mar 2020 03:53:57 GMT
- Title: FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale
Context Aggregation and Feature Space Super-resolution
- Authors: Zhanpeng Zhang and Kaipeng Zhang
- Abstract summary: We introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP)
It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information.
We achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single Nivida Titan X (Maxwell) GPU card.
- Score: 14.226301825772174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time semantic segmentation is desirable in many robotic applications
with limited computation resources. One challenge of semantic segmentation is
to deal with the object scale variations and leverage the context. How to
perform multi-scale context aggregation within limited computation budget is
important. In this paper, firstly, we introduce a novel and efficient module
called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP). It is a
lightweight cascaded structure for Convolutional Neural Networks (CNNs) to
efficiently leverage context information. On the other hand, for runtime
efficiency, state-of-the-art methods will quickly decrease the spatial size of
the inputs or feature maps in the early network stages. The final
high-resolution result is usually obtained by non-parametric up-sampling
operation (e.g. bilinear interpolation). Differently, we rethink this pipeline
and treat it as a super-resolution process. We use optimized super-resolution
operation in the up-sampling step and improve the accuracy, especially in
sub-sampled input image scenario for real-time applications. By fusing the
above two improvements, our methods provide better latency-accuracy trade-off
than the other state-of-the-art methods. In particular, we achieve 68.4% mIoU
at 84 fps on the Cityscapes test set with a single Nivida Titan X (Maxwell) GPU
card. The proposed module can be plugged into any feature extraction CNN and
benefits from the CNN structure development.
Related papers
- Spatially-Adaptive Feature Modulation for Efficient Image
Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
Proposed method is $3times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z) - Efficient Context Integration through Factorized Pyramidal Learning for
Ultra-Lightweight Semantic Segmentation [1.0499611180329804]
We propose a novel Factorized Pyramidal Learning (FPL) module to aggregate rich contextual information in an efficient manner.
We decompose the spatial pyramid into two stages which enables a simple and efficient feature fusion within the module to solve the notorious checkerboard effect.
Based on the FPL module and FIR unit, we propose an ultra-lightweight real-time network, called FPLNet, which achieves state-of-the-art accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-02-23T05:34:51Z) - Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic
Programming [15.458305667190256]
We propose a novel depth compression algorithm which targets general convolution operations.
We achieve $1.41times$ speed-up with $0.11%p accuracy gain in MobileNetV2-1.0 on the ImageNet.
arXiv Detail & Related papers (2023-01-28T13:08:54Z) - DWRSeg: Rethinking Efficient Acquisition of Multi-scale Contextual
Information for Real-time Semantic Segmentation [10.379708894083217]
We propose a highly efficient multi-scale feature extraction method, which decomposes the original single-step method into two steps, Region Residualization-Semantic Residualization.
We achieve an mIoU of 72.7% on the Cityscapes test set at a speed of 319.5 FPS on one NVIDIA GeForce GTX 1080 Ti card, which exceeds the latest methods of a speed of 69.5 FPS and 0.8% mIoU.
arXiv Detail & Related papers (2022-12-02T13:55:41Z) - RTFormer: Efficient Design for Real-Time Semantic Segmentation with
Transformer [63.25665813125223]
We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmenation.
It achieves better trade-off between performance and efficiency than CNN-based models.
Experiments on mainstream benchmarks demonstrate the effectiveness of our proposed RTFormer.
arXiv Detail & Related papers (2022-10-13T16:03:53Z) - Real-Time Scene Text Detection with Differentiable Binarization and
Adaptive Scale Fusion [62.269219152425556]
segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z) - Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution.
We gather homogeneous points that have identical semantic categories and close votes for the geometric centroids.
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
arXiv Detail & Related papers (2021-07-18T09:05:16Z) - Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
We show that results on multiple datasets demonstrate superior performance with better accuracy and speed compared to existing approaches.
arXiv Detail & Related papers (2020-07-07T22:37:16Z) - Real-Time High-Performance Semantic Image Segmentation of Urban Street
Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with the inference speed of 51.0 fps and 39.3 fps.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.