Stage-Aware Feature Alignment Network for Real-Time Semantic
Segmentation of Street Scenes
- URL: http://arxiv.org/abs/2203.04031v1
- Date: Tue, 8 Mar 2022 11:46:41 GMT
- Title: Stage-Aware Feature Alignment Network for Real-Time Semantic
Segmentation of Street Scenes
- Authors: Xi Weng, Yan Yan, Si Chen, Jing-Hao Xue, Hanzi Wang
- Abstract summary: We present a novel Stage-aware Feature Alignment Network (SFANet) for real-time semantic segmentation of street scenes.
By taking into account the unique role of each stage in the decoder, a novel stage-aware Feature Enhancement Block (FEB) is designed to enhance spatial details and contextual information of feature maps from the encoder.
Experimental results show that the proposed SFANet exhibits a good balance between accuracy and speed for real-time semantic segmentation of street scenes.
- Score: 59.81228011432776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the past few years, deep convolutional neural network-based methods have
made great progress in semantic segmentation of street scenes. Some recent
methods align feature maps to alleviate the semantic gap between them and
achieve high segmentation accuracy. However, they usually adopt the feature
alignment modules with the same network configuration in the decoder and thus
ignore the different roles of stages of the decoder during feature aggregation,
leading to a complex decoder structure. Such a manner greatly affects the
inference speed. In this paper, we present a novel Stage-aware Feature
Alignment Network (SFANet) based on the encoder-decoder structure for real-time
semantic segmentation of street scenes. Specifically, a Stage-aware Feature
Alignment module (SFA) is proposed to align and aggregate two adjacent levels
of feature maps effectively. In the SFA, by taking into account the unique role
of each stage in the decoder, a novel stage-aware Feature Enhancement Block
(FEB) is designed to enhance spatial details and contextual information of
feature maps from the encoder. In this way, we are able to address the
misalignment problem with a very simple and efficient multi-branch decoder
structure. Moreover, an auxiliary training strategy is developed to explicitly
alleviate the multi-scale object problem without bringing additional
computational costs during the inference phase. Experimental results show that
the proposed SFANet exhibits a good balance between accuracy and speed for
real-time semantic segmentation of street scenes. In particular, based on
ResNet-18, SFANet respectively obtains 78.1% and 74.7% mean of class-wise
Intersection-over-Union (mIoU) at inference speeds of 37 FPS and 96 FPS on the
challenging Cityscapes and CamVid test datasets by using only a single GTX
1080Ti GPU.
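The accuracy figures above are reported as the mean of class-wise Intersection-over-Union (mIoU). As a rough illustration of how such a number is typically computed, here is a minimal pure-Python sketch. It is not the authors' evaluation code: the function names are hypothetical, and the use of 255 as an "ignore" label is an assumption borrowed from the usual Cityscapes convention.

```python
def confusion_matrix(y_true, y_pred, num_classes, ignore_label=255):
    """Build a num_classes x num_classes matrix; rows = ground truth, cols = prediction.

    ignore_label=255 is an assumption (common Cityscapes convention), not taken
    from the paper.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if t == ignore_label:
            continue  # pixels without a valid ground-truth class are skipped
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average the per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class is another
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both ground truth and prediction
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with flattened per-pixel labels for 3 classes.
y_true = [0, 0, 1, 1, 2, 255]
y_pred = [0, 1, 1, 1, 2, 2]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou = mean_iou(cm)
print(round(miou, 4))
```

In this toy example the per-class IoUs are 1/2, 2/3 and 1, so the mean is 13/18. Benchmark servers accumulate the confusion matrix over the whole test set before averaging, which the two-step structure above mirrors.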
Related papers
- SegNetr: Rethinking the local-global interactions and skip connections
in U-shaped networks [1.121518046252855]
U-shaped networks have dominated the field of medical image segmentation due to their simple and easily tuned structure.
We introduce a novel SegNetr block that can perform local-global interactions dynamically at any stage and with only linear complexity.
We validate the effectiveness of SegNetr on four mainstream medical image segmentation datasets, with 59% and 76% fewer parameters and GFLOPs than vanilla U-Net.
arXiv Detail & Related papers (2023-07-06T12:39:06Z)
- SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow [88.97790684009979]
A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation.
We propose a Flow Alignment Module (FAM) to learn Semantic Flow between feature maps of adjacent levels.
We also present a novel Gated Dual Flow Alignment Module to directly align high-resolution feature maps and low-resolution feature maps.
arXiv Detail & Related papers (2022-07-10T08:25:47Z)
- Learning Implicit Feature Alignment Function for Semantic Segmentation [51.36809814890326]
Implicit Feature Alignment function (IFA) is inspired by the rapidly expanding topic of implicit neural representations.
We show that IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps in arbitrary resolutions.
Our method can be combined with improvements to various architectures, and it achieves a state-of-the-art computation-accuracy trade-off on common benchmarks.
arXiv Detail & Related papers (2022-06-17T09:40:14Z)
- Deep Multi-Branch Aggregation Network for Real-Time Semantic
Segmentation in Street Scenes [32.54045305607654]
Many state-of-the-art real-time semantic segmentation methods tend to sacrifice some spatial details or contextual information for fast inference.
We propose a novel Deep Multi-branch Aggregation Network (called DMA-Net) based on the encoder-decoder structure to perform real-time semantic segmentation in street scenes.
Our proposed DMA-Net respectively obtains 77.0% and 73.6% mean Intersection over Union (mIoU) at the inference speed of 46.7 FPS and 119.8 FPS by only using a single NVIDIA GTX 1080Ti GPU.
arXiv Detail & Related papers (2022-03-08T12:07:32Z)
- Feature Reuse and Fusion for Real-time Semantic segmentation [0.0]
How to increase the speed while maintaining high resolution is a problem that has been discussed and solved.
We hope to design a light-weight network based on previous design experience and reach the level of state-of-the-art real-time semantic segmentation.
arXiv Detail & Related papers (2021-05-27T06:47:02Z)
- Rethinking BiSeNet For Real-time Semantic Segmentation [6.622485130017622]
BiSeNet has been proved to be a popular two-stream network for real-time segmentation.
We propose a novel structure named Short-Term Dense Concatenate network (STDC) by removing structure redundancy.
arXiv Detail & Related papers (2021-04-27T13:49:47Z)
- Dense Interaction Learning for Video-based Person Re-identification [75.03200492219003]
We propose a hybrid framework, Dense Interaction Learning (DenseIL), to tackle video-based person re-ID difficulties.
DenseIL contains a CNN encoder and a Dense Interaction (DI) decoder.
In our experiments, DenseIL consistently and significantly outperforms all the state-of-the-art methods on multiple standard video-based re-ID datasets.
arXiv Detail & Related papers (2021-03-16T12:22:08Z)
- Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate better accuracy and speed than existing approaches.
arXiv Detail & Related papers (2020-07-07T22:37:16Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street
Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.