Deep Multi-Branch Aggregation Network for Real-Time Semantic
Segmentation in Street Scenes
- URL: http://arxiv.org/abs/2203.04037v1
- Date: Tue, 8 Mar 2022 12:07:32 GMT
- Title: Deep Multi-Branch Aggregation Network for Real-Time Semantic
Segmentation in Street Scenes
- Authors: Xi Weng, Yan Yan, Genshun Dong, Chang Shu, Biao Wang, Hanzi Wang, Ji
Zhang
- Abstract summary: Many state-of-the-art real-time semantic segmentation methods tend to sacrifice some spatial details or contextual information for fast inference.
We propose a novel Deep Multi-branch Aggregation Network (called DMA-Net) based on the encoder-decoder structure to perform real-time semantic segmentation in street scenes.
- Abstract summary: Our proposed DMA-Net obtains 77.0% and 73.6% mean Intersection over Union (mIoU) at inference speeds of 46.7 FPS and 119.8 FPS, respectively, using only a single NVIDIA GTX 1080Ti GPU.
- Score: 32.54045305607654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time semantic segmentation, which aims to achieve high segmentation
accuracy at real-time inference speed, has received substantial attention over
the past few years. However, many state-of-the-art real-time semantic
segmentation methods tend to sacrifice some spatial details or contextual
information for fast inference, thus leading to degradation in segmentation
quality. In this paper, we propose a novel Deep Multi-branch Aggregation
Network (called DMA-Net) based on the encoder-decoder structure to perform
real-time semantic segmentation in street scenes. Specifically, we first adopt
ResNet-18 as the encoder to efficiently generate various levels of feature maps
from different stages of convolutions. Then, we develop a Multi-branch
Aggregation Network (MAN) as the decoder to effectively aggregate different
levels of feature maps and capture the multi-scale information. In MAN, a
lattice enhanced residual block is designed to enhance feature representations
of the network by taking advantage of the lattice structure. Meanwhile, a
feature transformation block is introduced to explicitly transform the feature
map from the neighboring branch before feature aggregation. Moreover, a global
context block is used to exploit the global contextual information. These key
components are tightly combined and jointly optimized in a unified network.
Extensive experimental results on the challenging Cityscapes and CamVid
datasets demonstrate that our proposed DMA-Net obtains 77.0% and 73.6% mean
Intersection over Union (mIoU) at inference speeds of 46.7 FPS and 119.8 FPS,
respectively, using only a single NVIDIA GTX 1080Ti GPU. This shows that
DMA-Net provides a good tradeoff between segmentation quality and speed for
semantic segmentation in street scenes.
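To make the layout above concrete, here is a minimal PyTorch sketch of a ResNet-18 encoder feeding a multi-branch decoder. This is not the authors' implementation: the lattice enhanced residual block, feature transformation block, and global context block are reduced to simple placeholder modules, and every channel size and layer choice below is an assumption.

```python
# Minimal structural sketch of the DMA-Net idea described in the abstract.
# NOT the authors' implementation: the lattice enhanced residual block and
# feature transformation block are reduced to plain conv blocks, the global
# context block to pooled context, and all channel sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class ConvBNReLU(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)

    def forward(self, x):
        return F.relu(self.bn(self.conv(x)))


class GlobalContextBlock(nn.Module):
    """Stand-in for the global context block: global average pooling + 1x1 conv."""
    def __init__(self, c):
        super().__init__()
        self.fc = nn.Conv2d(c, c, 1)

    def forward(self, x):
        ctx = self.fc(F.adaptive_avg_pool2d(x, 1))
        return x + ctx  # broadcast the pooled context back onto the feature map


class DMANetSketch(nn.Module):
    def __init__(self, num_classes=19):
        super().__init__()
        backbone = resnet18(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.stage1, self.stage2 = backbone.layer1, backbone.layer2  # 64, 128 ch
        self.stage3, self.stage4 = backbone.layer3, backbone.layer4  # 256, 512 ch
        self.gcb = GlobalContextBlock(512)
        # One decoder branch per encoder stage; each branch is a plain conv here.
        self.reduce4 = ConvBNReLU(512, 128)
        self.reduce3 = ConvBNReLU(256, 128)
        self.reduce2 = ConvBNReLU(128, 128)
        self.head = nn.Conv2d(128, num_classes, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        x = self.stem(x)
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        f4 = self.gcb(self.stage4(f3))
        # Multi-branch aggregation: upsample deeper features, sum with shallower ones.
        y = self.reduce4(f4)
        y = F.interpolate(y, size=f3.shape[2:], mode="bilinear", align_corners=False)
        y = y + self.reduce3(f3)
        y = F.interpolate(y, size=f2.shape[2:], mode="bilinear", align_corners=False)
        y = y + self.reduce2(f2)
        logits = self.head(y)
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = DMANetSketch()
    out = model(torch.randn(1, 3, 512, 1024))
    print(out.shape)  # torch.Size([1, 19, 512, 1024])
```

The point the sketch captures is purely structural: each decoder branch receives a different encoder stage, and deeper features are progressively upsampled and aggregated with shallower ones before the final prediction.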
Related papers
- SegNetr: Rethinking the local-global interactions and skip connections
in U-shaped networks [1.121518046252855]
U-shaped networks have dominated the field of medical image segmentation due to their simple and easily tuned structure.
We introduce a novel SegNetr block that can perform local-global interactions dynamically at any stage and with only linear complexity.
We validate the effectiveness of SegNetr on four mainstream medical image segmentation datasets, with 59% and 76% fewer parameters and GFLOPs, respectively, than vanilla U-Net.
arXiv Detail & Related papers (2023-07-06T12:39:06Z)
- Cross-CBAM: A Lightweight network for Scene Segmentation [2.064612766965483]
We present the Cross-CBAM network, a novel lightweight network for real-time semantic segmentation.
In experiments on the Cityscapes and CamVid datasets, we achieve 73.4% mIoU at a speed of 240.9 FPS and 77.2% mIoU at a speed of 88.6 FPS on an NVIDIA GTX 1080Ti.
arXiv Detail & Related papers (2023-06-04T09:03:05Z)
- Stage-Aware Feature Alignment Network for Real-Time Semantic Segmentation of Street Scenes [59.81228011432776]
We present a novel Stage-aware Feature Alignment Network (SFANet) for real-time semantic segmentation of street scenes.
By taking into account the unique role of each stage in the decoder, a novel stage-aware Feature Enhancement Block (FEB) is designed to enhance spatial details and contextual information of feature maps from the encoder.
Experimental results show that the proposed SFANet exhibits a good balance between accuracy and speed for real-time semantic segmentation of street scenes.
arXiv Detail & Related papers (2022-03-08T11:46:41Z)
- Rethinking BiSeNet For Real-time Semantic Segmentation [6.622485130017622]
BiSeNet has proven to be a popular two-stream network for real-time segmentation.
We propose a novel structure, the Short-Term Dense Concatenate network (STDC), obtained by removing structural redundancy.
arXiv Detail & Related papers (2021-04-27T13:49:47Z)
- HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation [95.47168925127089]
We present a novel, real-time, semantic segmentation network in which the encoder both encodes and generates the parameters (weights) of the decoder.
We design a new type of hypernetwork, composed of a nested U-Net for drawing higher level context features.
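The central idea, an encoder that emits the weights of the decoder's convolutions, can be illustrated with a toy dynamic 1x1 convolution; this is a generic hypernetwork sketch with assumed shapes, not the patch-wise, nested U-Net design of the paper.

```python
# Toy illustration of a hypernetwork-style decoder layer: a context head predicts
# the weights of a 1x1 convolution that is then applied to the decoder features.
# Generic sketch only; the paper's patch-wise, nested U-Net design is not reproduced.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicConvHead(nn.Module):
    def __init__(self, feat_ch=64, ctx_ch=128, num_classes=19):
        super().__init__()
        self.feat_ch, self.num_classes = feat_ch, num_classes
        # Hypernetwork: maps a pooled context vector to 1x1-conv weights and biases.
        self.weight_gen = nn.Linear(ctx_ch, num_classes * feat_ch)
        self.bias_gen = nn.Linear(ctx_ch, num_classes)

    def forward(self, feats, context):
        # feats: (B, feat_ch, H, W); context: (B, ctx_ch, h, w) from the encoder.
        ctx = F.adaptive_avg_pool2d(context, 1).flatten(1)          # (B, ctx_ch)
        w = self.weight_gen(ctx).view(-1, self.num_classes, self.feat_ch, 1, 1)
        b = self.bias_gen(ctx)
        outs = []
        for i in range(feats.size(0)):  # per-sample weights -> apply conv per sample
            outs.append(F.conv2d(feats[i:i + 1], w[i], b[i]))
        return torch.cat(outs, dim=0)   # (B, num_classes, H, W)


if __name__ == "__main__":
    head = DynamicConvHead()
    seg = head(torch.randn(2, 64, 64, 128), torch.randn(2, 128, 16, 32))
    print(seg.shape)  # torch.Size([2, 19, 64, 128])
```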
arXiv Detail & Related papers (2020-12-21T18:58:18Z)
- Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior performance, with better accuracy and speed than existing approaches.
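As a rough illustration of how self-attention can be made cheap enough for real-time segmentation, the sketch below L2-normalizes queries and keys and reorders the matrix products so the cost grows linearly with the number of pixels; the paper's actual fast attention module may differ in its details.

```python
# A sketch of linearized spatial attention in the spirit of "fast attention":
# L2-normalize queries/keys and reorder the matrix products so the cost is
# linear in the number of pixels. Details of the paper's module may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FastSpatialAttention(nn.Module):
    def __init__(self, channels, key_ch=32):
        super().__init__()
        self.q = nn.Conv2d(channels, key_ch, 1)
        self.k = nn.Conv2d(channels, key_ch, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w
        q = F.normalize(self.q(x).flatten(2), dim=1)          # (B, key_ch, N)
        k = F.normalize(self.k(x).flatten(2), dim=1)          # (B, key_ch, N)
        v = self.v(x).flatten(2)                               # (B, C, N)
        # K V^T is (key_ch x C): computing it first makes the cost O(N), not O(N^2).
        context = torch.bmm(k, v.transpose(1, 2)) / n          # (B, key_ch, C)
        out = torch.bmm(q.transpose(1, 2), context)            # (B, N, C)
        return x + out.transpose(1, 2).reshape(b, c, h, w)     # residual connection


if __name__ == "__main__":
    attn = FastSpatialAttention(128)
    y = attn(torch.randn(1, 128, 64, 128))
    print(y.shape)  # torch.Size([1, 128, 64, 128])
```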
arXiv Detail & Related papers (2020-07-07T22:37:16Z)
- BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation [118.46210049742993]
We propose an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2).
For a 2,048x1,024 input, we achieve 72.6% Mean IoU on the Cityscapes test set with a speed of 156 FPS on one NVIDIA GeForce GTX 1080 Ti card, which is significantly faster than existing methods, yet with better segmentation accuracy.
arXiv Detail & Related papers (2020-04-05T10:26:38Z)
- Temporally Distributed Networks for Fast Video Semantic Segmentation [64.5330491940425]
TDNet is a temporally distributed network designed for fast and accurate video semantic segmentation.
We observe that features extracted from a certain high-level layer of a deep CNN can be approximated by composing features extracted from several shallower sub-networks.
Experiments on Cityscapes, CamVid, and NYUD-v2 demonstrate that our method achieves state-of-the-art accuracy with significantly faster speed and lower latency.
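The composition idea can be sketched as follows: each of several shallow sub-networks processes one of the last few frames, and their features are merged for the current frame. The concatenation-plus-1x1-convolution merge used here is a placeholder assumption, not the paper's more sophisticated fusion module.

```python
# Sketch of temporally distributed feature extraction: each of m shallow
# sub-networks processes one of the last m frames, and their features are
# composed for the current frame. The simple concat + 1x1-conv merge below is
# a placeholder; the fusion used in the paper is more sophisticated.
import torch
import torch.nn as nn


def shallow_subnet(out_ch=64):
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, out_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    )


class TemporallyDistributedSketch(nn.Module):
    def __init__(self, m=4, feat_ch=64, num_classes=19):
        super().__init__()
        self.subnets = nn.ModuleList([shallow_subnet(feat_ch) for _ in range(m)])
        self.merge = nn.Conv2d(m * feat_ch, num_classes, 1)

    def forward(self, frames):
        # frames: list of the last m video frames, each of shape (B, 3, H, W)
        feats = [net(f) for net, f in zip(self.subnets, frames)]
        return self.merge(torch.cat(feats, dim=1))


if __name__ == "__main__":
    model = TemporallyDistributedSketch()
    frames = [torch.randn(1, 3, 256, 512) for _ in range(4)]
    print(model(frames).shape)  # torch.Size([1, 19, 64, 128])
```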
arXiv Detail & Related papers (2020-04-03T22:43:32Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)