Rethinking BiSeNet For Real-time Semantic Segmentation
- URL: http://arxiv.org/abs/2104.13188v1
- Date: Tue, 27 Apr 2021 13:49:47 GMT
- Title: Rethinking BiSeNet For Real-time Semantic Segmentation
- Authors: Mingyuan Fan, Shenqi Lai, Junshi Huang, Xiaoming Wei, Zhenhua Chai,
Junfeng Luo, Xiaolin Wei
- Abstract summary: BiSeNet has proven to be a popular two-stream network for real-time segmentation.
We propose a novel structure named Short-Term Dense Concatenate network (STDC), obtained by removing structural redundancy.
- Score: 6.622485130017622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: BiSeNet has proven to be a popular two-stream network for real-time
segmentation. However, its principle of adding an extra path to encode spatial
information is time-consuming, and the backbones borrowed from pretraining
tasks, e.g., image classification, may be inefficient for image segmentation
due to their lack of task-specific design. To handle these problems, we
propose a novel and efficient structure named Short-Term Dense Concatenate
network (STDC network), obtained by removing structural redundancy. Specifically,
we gradually reduce the dimension of the feature maps and aggregate them
for the image representation, which forms the basic module of the STDC network.
In the decoder, we propose a Detail Aggregation module that integrates the learning of
spatial information into the low-level layers in a single-stream manner. Finally, the
low-level features and deep features are fused to predict the final
segmentation results. Extensive experiments on the Cityscapes and CamVid datasets
demonstrate the effectiveness of our method, achieving a promising trade-off
between segmentation accuracy and inference speed. On Cityscapes, we achieve
71.9% mIoU on the test set at 250.4 FPS on an NVIDIA GTX 1080Ti,
which is 45.2% faster than the latest methods, and 76.8% mIoU at 97.0
FPS when inferring on higher-resolution images.
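The abstract describes the basic STDC module as a chain of conv blocks whose feature-map dimensions shrink step by step, with all intermediate outputs aggregated into the block's representation. The PyTorch-style sketch below illustrates that idea; the number of blocks, the 1x1-then-3x3 kernel pattern, and the halving channel ratios are assumptions made for illustration, not the authors' released configuration.

```python
# Minimal sketch of the basic STDC building block: a chain of conv blocks with
# progressively smaller channel widths whose outputs are concatenated into one
# representation. Block count and channel ratios are illustrative assumptions.
import torch
import torch.nn as nn


class ConvBNReLU(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride=stride,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


class STDCModule(nn.Module):
    """Short-Term Dense Concatenate block (illustrative sketch)."""

    def __init__(self, in_ch, out_ch, num_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList()
        ch_in = in_ch
        for i in range(num_blocks):
            if i < num_blocks - 1:
                ch_out = out_ch // (2 ** (i + 1))   # widths 1/2, 1/4, 1/8, ...
            else:
                ch_out = out_ch // (2 ** i)         # repeat last width so widths sum to out_ch
            kernel = 1 if i == 0 else 3             # cheap 1x1 conv first, 3x3 convs afterwards
            self.blocks.append(ConvBNReLU(ch_in, ch_out, kernel_size=kernel))
            ch_in = ch_out

    def forward(self, x):
        outs = []
        for block in self.blocks:
            x = block(x)
            outs.append(x)
        # Aggregate all intermediate feature maps by channel-wise concatenation.
        return torch.cat(outs, dim=1)
```

With the default of four blocks and out_ch channels, the concatenated widths are out_ch/2 + out_ch/4 + out_ch/8 + out_ch/8 = out_ch, so the block behaves like a drop-in replacement for a standard conv layer of that width.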
Related papers
- Cross-CBAM: A Lightweight network for Scene Segmentation [2.064612766965483]
We present the Cross-CBAM network, a novel lightweight network for real-time semantic segmentation.
In experiments on the Cityscapes and CamVid datasets, we achieve 73.4% mIoU at 240.9 FPS and 77.2% mIoU at 88.6 FPS on an NVIDIA GTX 1080Ti.
arXiv Detail & Related papers (2023-06-04T09:03:05Z)
- Deep Multi-Branch Aggregation Network for Real-Time Semantic Segmentation in Street Scenes [32.54045305607654]
Many state-of-the-art real-time semantic segmentation methods tend to sacrifice some spatial details or contextual information for fast inference.
We propose a novel Deep Multi-branch Aggregation Network (called DMA-Net) based on the encoder-decoder structure to perform real-time semantic segmentation in street scenes.
Our proposed DMA-Net obtains 77.0% and 73.6% mean Intersection over Union (mIoU) at inference speeds of 46.7 FPS and 119.8 FPS, respectively, using only a single NVIDIA GTX 1080Ti GPU.
arXiv Detail & Related papers (2022-03-08T12:07:32Z)
- Stage-Aware Feature Alignment Network for Real-Time Semantic Segmentation of Street Scenes [59.81228011432776]
We present a novel Stage-aware Feature Alignment Network (SFANet) for real-time semantic segmentation of street scenes.
By taking into account the unique role of each stage in the decoder, a novel stage-aware Feature Enhancement Block (FEB) is designed to enhance spatial details and contextual information of feature maps from the encoder.
Experimental results show that the proposed SFANet exhibits a good balance between accuracy and speed for real-time semantic segmentation of street scenes.
arXiv Detail & Related papers (2022-03-08T11:46:41Z)
- FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation [23.25851281719734]
We propose a Fast Bilateral Symmetrical Network (FBSNet) for real-time semantic segmentation.
FBSNet employs a symmetrical encoder-decoder structure with two branches: a semantic information branch and a spatial detail branch.
Experimental results on Cityscapes and CamVid show that the proposed FBSNet can strike a good balance between accuracy and efficiency.
arXiv Detail & Related papers (2021-09-02T04:16:39Z)
- Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior accuracy and speed compared to existing approaches.
arXiv Detail & Related papers (2020-07-07T22:37:16Z)
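The fast-attention entry above notes only that the module is a simple modification of self-attention. One standard way to make spatial attention linear in the number of pixels, which we take to be the spirit of that modification, is to replace the softmax with a normalization of queries and keys so the matrix products can be reordered and the full pixel-by-pixel affinity matrix is never built. The sketch below shows that reordering; the L2 normalization and 1/n scaling are assumptions for this sketch, not a claim about the paper's exact formulation.

```python
# Illustrative sketch of linear ("fast") spatial attention: normalizing Q and K
# (instead of applying softmax to Q K^T) lets us compute K^T V first, avoiding
# the n x n affinity matrix. Normalization choice and scaling are assumptions.
import torch
import torch.nn.functional as F


def fast_spatial_attention(q, k, v):
    """q, k: (B, N, C), v: (B, N, Cv); returns (B, N, Cv) in O(N * C * Cv) time."""
    b, n, _ = q.shape
    q = F.normalize(q, dim=-1)            # L2-normalize along the channel axis
    k = F.normalize(k, dim=-1)
    context = k.transpose(1, 2) @ v       # (B, C, Cv): aggregate values first
    return (q @ context) / n              # (B, N, Cv): then project back to positions


# Usage: flatten a (B, C, H, W) feature map to (B, H*W, C) tokens before calling,
# then reshape the result back to (B, Cv, H, W).
```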
- Real-time Semantic Segmentation via Spatial-detail Guided Context Propagation [49.70144583431999]
We propose the spatial-detail guided context propagation network (SGCPNet) for achieving real-time semantic segmentation.
It uses the spatial details of shallow layers to guide the propagation of the low-resolution global contexts, so that the lost spatial information can be effectively reconstructed.
It achieves 69.5% mIoU segmentation accuracy, while its speed reaches 178.5 FPS on 768x1536 images on a GeForce GTX 1080 Ti GPU card.
arXiv Detail & Related papers (2020-05-22T07:07:26Z)
- BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation [118.46210049742993]
We propose an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2).
For a 2,048x1,024 input, we achieve 72.6% Mean IoU on the Cityscapes test set with a speed of 156 FPS on one NVIDIA GeForce 1080 Ti card, which is significantly faster than existing methods, yet we achieve better segmentation accuracy.
arXiv Detail & Related papers (2020-04-05T10:26:38Z)
- Temporally Distributed Networks for Fast Video Semantic Segmentation [64.5330491940425]
TDNet is a temporally distributed network designed for fast and accurate video semantic segmentation.
We observe that features extracted from a certain high-level layer of a deep CNN can be approximated by composing features extracted from several shallower sub-networks.
Experiments on Cityscapes, CamVid, and NYUD-v2 demonstrate that our method achieves state-of-the-art accuracy with significantly faster speed and lower latency.
arXiv Detail & Related papers (2020-04-03T22:43:32Z)
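The TDNet summary above says a deep model's high-level features can be approximated by composing features from several shallower sub-networks. A minimal sketch of how such sub-networks might be distributed over consecutive video frames is given below; the round-robin scheduling and the concatenation-plus-fusion step are assumptions for illustration, not a reproduction of the paper's grouping and fusion modules.

```python
# Illustrative sketch of the "temporally distributed" idea: run only one shallow
# sub-network per video frame and approximate the full high-level feature by
# composing the most recent outputs of all sub-networks. Scheduling and fusion
# choices here are assumptions for this sketch.
from collections import deque
import torch
import torch.nn as nn


class TemporallyDistributedExtractor(nn.Module):
    def __init__(self, sub_networks, fuse):
        super().__init__()
        self.subnets = nn.ModuleList(sub_networks)   # m shallow sub-networks
        self.fuse = fuse                             # e.g. a 1x1 conv over concatenated features
        self.buffer = deque(maxlen=len(sub_networks))
        self.t = 0

    def forward(self, frame):
        # Run only one sub-network on the current frame (cheap per-frame cost).
        idx = self.t % len(self.subnets)
        self.buffer.append(self.subnets[idx](frame))
        self.t += 1
        # Compose the buffered sub-features into an approximation of the full feature.
        feats = list(self.buffer)
        while len(feats) < len(self.subnets):        # warm-up: repeat until the buffer fills
            feats.append(feats[-1])
        return self.fuse(torch.cat(feats, dim=1))
```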
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 FPS and 39.3 FPS, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.