Real-time Semantic Segmentation with Fast Attention
- URL: http://arxiv.org/abs/2007.03815v2
- Date: Thu, 9 Jul 2020 22:44:34 GMT
- Title: Real-time Semantic Segmentation with Fast Attention
- Authors: Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin,
Kate Saenko, Stan Sclaroff
- Abstract summary: We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior accuracy and speed compared to existing approaches.
- Score: 94.88466483540692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In deep CNN based models for semantic segmentation, high accuracy relies on
rich spatial context (large receptive fields) and fine spatial details (high
resolution), both of which incur high computational costs. In this paper, we
propose a novel architecture that addresses both challenges and achieves
state-of-the-art performance for semantic segmentation of high-resolution
images and videos in real-time. The proposed architecture relies on our fast
spatial attention, which is a simple yet efficient modification of the popular
self-attention mechanism and captures the same rich spatial context at a small
fraction of the computational cost, by changing the order of operations.
Moreover, to efficiently process high-resolution input, we apply an additional
spatial reduction to intermediate feature stages of the network with minimal
loss in accuracy thanks to the use of the fast attention module to fuse
features. We validate our method with a series of experiments on multiple
datasets, demonstrating superior accuracy and speed compared to existing
approaches for real-time semantic segmentation. On Cityscapes, our network
achieves 74.4$\%$ mIoU at 72 FPS and 75.5$\%$ mIoU at 58 FPS on a single
Titan X GPU, which is $\sim$50$\%$ faster than the state-of-the-art while
retaining the same accuracy.
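To make the "changing the order of operations" idea concrete, the following is a minimal sketch, not the authors' released implementation: it assumes L2-normalized queries and keys in place of the softmax and an illustrative 1/n scaling, so that the attention products can be regrouped as Q(K^T V) and the (n x n) affinity matrix is never formed.

```python
# Minimal sketch of the "reorder the matrix products" idea behind fast attention.
# NOT the authors' released code; the shapes, the L2 normalization of Q and K
# (replacing the softmax), and the 1/n scaling are illustrative assumptions.
import torch
import torch.nn.functional as F

def standard_self_attention(q, k, v):
    """q, k, v: (batch, n, c), n = H*W spatial positions.
    Builds the full (n x n) affinity matrix: O(n^2 * c) time, O(n^2) memory."""
    affinity = torch.softmax(q @ k.transpose(1, 2), dim=-1)  # (b, n, n)
    return affinity @ v                                       # (b, n, c)

def fast_attention(q, k, v):
    """Use L2-normalized (cosine) affinities instead of softmax, so the product
    can be regrouped as Q @ (K^T @ V): O(n * c^2) time, no (n x n) matrix."""
    q = F.normalize(q, dim=-1)                                # (b, n, c)
    k = F.normalize(k, dim=-1)                                # (b, n, c)
    context = k.transpose(1, 2) @ v                           # (b, c, c)
    return (q @ context) / q.shape[1]                         # (b, n, c)

# For a 128x256 feature map with 64 channels, n = 32768, so the (c x c) context
# matrix above is vastly smaller than the (n x n) affinity of standard attention
# (which is left uncalled here: a 32768 x 32768 float map would need ~4 GB).
q, k, v = (torch.randn(2, 128 * 256, 64) for _ in range(3))
print(fast_attention(q, k, v).shape)  # torch.Size([2, 32768, 64])
```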
Related papers
- RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer [63.25665813125223]
We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation.
It achieves a better trade-off between performance and efficiency than CNN-based models.
Experiments on mainstream benchmarks demonstrate the effectiveness of our proposed RTFormer.
arXiv Detail & Related papers (2022-10-13T16:03:53Z)
- Revisiting Multi-Scale Feature Fusion for Semantic Segmentation [90.32746095413447]
In this paper, we demonstrate that neither high internal resolution nor atrous convolutions are necessary for accurate semantic segmentation.
We develop a simplified segmentation model, named ESeg, which has neither high internal resolution nor expensive atrous convolutions.
Our simple method can achieve better accuracy with faster speed than prior art across multiple datasets.
arXiv Detail & Related papers (2022-03-23T19:14:11Z)
- Real-time Semantic Segmentation with Context Aggregation Network [14.560708848716754]
We propose a dual-branch convolutional neural network with significantly lower computational cost than the state-of-the-art.
We evaluate our method on two semantic segmentation datasets, namely the Cityscapes and UAVid datasets.
arXiv Detail & Related papers (2020-11-02T14:16:23Z)
- Real-time Semantic Segmentation via Spatial-detail Guided Context Propagation [49.70144583431999]
We propose the spatial-detail guided context propagation network (SGCPNet) for achieving real-time semantic segmentation.
It uses the spatial details of shallow layers to guide the propagation of the low-resolution global contexts, so that the lost spatial information can be effectively reconstructed.
It achieves 69.5% mIoU segmentation accuracy, while its speed reaches 178.5 FPS on 768x1536 images on a GeForce GTX 1080 Ti GPU card.
arXiv Detail & Related papers (2020-05-22T07:07:26Z)
- BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation [118.46210049742993]
We propose an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2).
For a 2,048x1,024 input, we achieve 72.6% Mean IoU on the Cityscapes test set with a speed of 156 FPS on one NVIDIA GeForce GTX 1080 Ti card, which is significantly faster than existing methods while achieving better segmentation accuracy.
arXiv Detail & Related papers (2020-04-05T10:26:38Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
- FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution [14.226301825772174]
We introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP), a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information.
We achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single Nvidia Titan X (Maxwell) GPU card.
arXiv Detail & Related papers (2020-03-09T03:53:57Z)