Real-Time High-Performance Semantic Image Segmentation of Urban Street
Scenes
- URL: http://arxiv.org/abs/2003.08736v2
- Date: Fri, 3 Apr 2020 12:27:53 GMT
- Title: Real-Time High-Performance Semantic Image Segmentation of Urban Street
Scenes
- Authors: Genshun Dong, Yan Yan, Chunhua Shen and Hanzi Wang
- Abstract summary: We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps on the Cityscapes and CamVid test datasets, respectively.
- Score: 98.65457534223539
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Convolutional Neural Networks (DCNNs) have recently shown outstanding
performance in semantic image segmentation. However, state-of-the-art
DCNN-based semantic segmentation methods usually suffer from high computational
complexity due to the use of complex network architectures. This greatly limits
their applications in the real-world scenarios that require real-time
processing. In this paper, we propose a real-time high-performance DCNN-based
method for robust semantic segmentation of urban street scenes, which achieves
a good trade-off between accuracy and speed. Specifically, a Lightweight
Baseline Network with Atrous convolution and Attention (LBN-AA) is firstly used
as our baseline network to efficiently obtain dense feature maps. Then, the
Distinctive Atrous Spatial Pyramid Pooling (DASPP), which exploits the
different sizes of pooling operations to encode the rich and distinctive
semantic information, is developed to detect objects at multiple scales.
Meanwhile, a Spatial detail-Preserving Network (SPN) with shallow convolutional
layers is designed to generate high-resolution feature maps preserving the
detailed spatial information. Finally, a simple but practical Feature Fusion
Network (FFN) is used to effectively combine both shallow and deep features
from the semantic branch (DASPP) and the spatial branch (SPN), respectively.
Extensive experimental results show that the proposed method achieves 73.6% and
68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and
39.3 fps on the challenging Cityscapes and CamVid test datasets, respectively,
using only a single NVIDIA TITAN X card. This demonstrates that the proposed
method offers excellent performance at real-time speed for semantic
segmentation of urban street scenes.
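
To make the two-branch design described above concrete, the following is a minimal PyTorch sketch of a DASPP-style multi-scale context module and an FFN-style fusion head. All module names, channel widths, pooling window sizes, dilation rates, and the 19-class output (Cityscapes-like) are illustrative assumptions for this sketch, not the authors' implementation.

```python
# Minimal sketch of the two-branch idea from the abstract: a DASPP-like
# multi-scale context module on the semantic branch and a simple fusion head
# combining it with high-resolution spatial-branch features.
# All names, channel widths, pooling sizes and dilation rates are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DASPPLike(nn.Module):
    """Parallel atrous convolutions, each preceded by average pooling of a
    different window size, concatenated with the input and projected."""

    def __init__(self, in_ch=256, out_ch=256, pool_sizes=(2, 4, 8), rates=(3, 6, 9)):
        super().__init__()
        self.branches = nn.ModuleList()
        for p, r in zip(pool_sizes, rates):
            self.branches.append(nn.Sequential(
                nn.AvgPool2d(kernel_size=p, stride=1, padding=p // 2),
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ))
        self.project = nn.Sequential(
            nn.Conv2d(in_ch + out_ch * len(self.branches), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        feats = [x] + [b(x) for b in self.branches]
        # Stride-1 pooling with padding p//2 can change the spatial size by one
        # pixel; resize every branch back to the input resolution before concat.
        feats = [F.interpolate(f, size=x.shape[2:], mode='bilinear',
                               align_corners=False) for f in feats]
        return self.project(torch.cat(feats, dim=1))


class TwoBranchSegHead(nn.Module):
    """Fuse low-resolution semantic features (DASPP output) with
    high-resolution spatial features (shallow SPN-style branch) and predict
    per-pixel class logits, mirroring the role the abstract assigns to the FFN."""

    def __init__(self, sem_ch=256, spa_ch=64, num_classes=19):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(sem_ch + spa_ch, 128, 3, padding=1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(128, num_classes, 1)

    def forward(self, sem_feat, spa_feat):
        # Upsample the deep semantic features to the spatial-branch resolution.
        sem_up = F.interpolate(sem_feat, size=spa_feat.shape[2:],
                               mode='bilinear', align_corners=False)
        fused = self.fuse(torch.cat([sem_up, spa_feat], dim=1))
        return self.classifier(fused)


if __name__ == "__main__":
    sem = torch.randn(1, 256, 32, 64)    # e.g. 1/16-resolution semantic features
    spa = torch.randn(1, 64, 128, 256)   # e.g. 1/4-resolution spatial features
    logits = TwoBranchSegHead()(DASPPLike()(sem), spa)
    print(logits.shape)                  # torch.Size([1, 19, 128, 256])
```

In this sketch the semantic branch is simply upsampled and concatenated before fusion; the paper's actual LBN-AA backbone, attention, and fusion details are not reproduced here.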
Related papers
- SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow [88.97790684009979]
A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation.
We propose a Flow Alignment Module (FAM) to learn Semantic Flow between feature maps of adjacent levels (a minimal sketch of this idea appears after this list).
We also present a novel Gated Dual Flow Alignment Module to directly align high-resolution feature maps and low-resolution feature maps.
arXiv Detail & Related papers (2022-07-10T08:25:47Z) - Deep Multi-Branch Aggregation Network for Real-Time Semantic
Segmentation in Street Scenes [32.54045305607654]
Many state-of-the-art real-time semantic segmentation methods tend to sacrifice some spatial details or contextual information for fast inference.
We propose a novel Deep Multi-branch Aggregation Network (called DMA-Net) based on the encoder-decoder structure to perform real-time semantic segmentation in street scenes.
Our proposed DMA-Net obtains 77.0% and 73.6% mean Intersection over Union (mIoU) at inference speeds of 46.7 FPS and 119.8 FPS, respectively, using only a single NVIDIA GTX 1080Ti GPU.
arXiv Detail & Related papers (2022-03-08T12:07:32Z) - Rethinking BiSeNet For Real-time Semantic Segmentation [6.622485130017622]
BiSeNet has proven to be a popular two-stream network for real-time segmentation.
We propose a novel structure named Short-Term Dense Concatenate network (STDC) by removing structure redundancy.
arXiv Detail & Related papers (2021-04-27T13:49:47Z) - Real-time Semantic Segmentation with Context Aggregation Network [14.560708848716754]
We propose a dual branch convolutional neural network, with significantly lower computational costs as compared to the state-of-the-art.
We evaluate our method on two semantic segmentation datasets, namely Cityscapes dataset and UAVid dataset.
arXiv Detail & Related papers (2020-11-02T14:16:23Z) - Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior accuracy and speed compared to existing approaches.
arXiv Detail & Related papers (2020-07-07T22:37:16Z) - Real-time Semantic Segmentation via Spatial-detail Guided Context
Propagation [49.70144583431999]
We propose the spatial-detail guided context propagation network (SGCPNet) for achieving real-time semantic segmentation.
It uses the spatial details of shallow layers to guide the propagation of the low-resolution global contexts, in which the lost spatial information can be effectively reconstructed.
It achieves 69.5% mIoU segmentation accuracy, while its speed reaches 178.5 FPS on 768x1536 images on a GeForce GTX 1080 Ti GPU card.
arXiv Detail & Related papers (2020-05-22T07:07:26Z) - FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale
Context Aggregation and Feature Space Super-resolution [14.226301825772174]
We introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP).
It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information.
We achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single NVIDIA Titan X (Maxwell) GPU card.
arXiv Detail & Related papers (2020-03-09T03:53:57Z) - Depthwise Non-local Module for Fast Salient Object Detection Using a
Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z) - Spatial-Spectral Residual Network for Hyperspectral Image
Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and spectral separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
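
As referenced in the SFNet entry above, flow-based alignment replaces plain upsampling with a learned warp between adjacent feature levels. Below is a minimal PyTorch sketch of that idea; the layer layout, channel widths, and offset scaling are assumptions for illustration, not SFNet's released Flow Alignment Module.

```python
# Sketch of flow-based feature alignment in the spirit of a Flow Alignment
# Module: predict a 2-channel offset field ("semantic flow") from both levels
# and warp the coarse features toward the fine ones with grid_sample.
# Layer layout and offset scaling are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FlowAlign(nn.Module):
    def __init__(self, high_ch, low_ch, mid_ch=64):
        super().__init__()
        self.reduce_high = nn.Conv2d(high_ch, mid_ch, 1, bias=False)
        self.reduce_low = nn.Conv2d(low_ch, mid_ch, 1, bias=False)
        self.flow = nn.Conv2d(mid_ch * 2, 2, 3, padding=1, bias=False)

    def forward(self, high_res, low_res):
        n, _, h, w = high_res.shape
        low_up = F.interpolate(self.reduce_low(low_res), size=(h, w),
                               mode='bilinear', align_corners=True)
        flow = self.flow(torch.cat([self.reduce_high(high_res), low_up], dim=1))

        # Base sampling grid in normalized [-1, 1] coordinates (x, y order).
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=flow.device),
                                torch.linspace(-1, 1, w, device=flow.device),
                                indexing='ij')
        grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)

        # Shift the grid by the predicted offsets (rescaled from pixel units to
        # the normalized range) and sample the coarse features there.
        scale = torch.tensor([2.0 / max(w - 1, 1), 2.0 / max(h - 1, 1)],
                             device=flow.device)
        grid = grid + flow.permute(0, 2, 3, 1) * scale
        return F.grid_sample(low_res, grid, mode='bilinear', align_corners=True)


if __name__ == "__main__":
    fam = FlowAlign(high_ch=64, low_ch=256)
    fine = torch.randn(2, 64, 64, 128)     # high-resolution, shallow features
    coarse = torch.randn(2, 256, 32, 64)   # low-resolution, deep features
    print(fam(fine, coarse).shape)         # torch.Size([2, 256, 64, 128])
```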