SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow
- URL: http://arxiv.org/abs/2207.04415v2
- Date: Fri, 4 Aug 2023 09:00:27 GMT
- Title: SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow
- Authors: Xiangtai Li, Jiangning Zhang, Yibo Yang, Guangliang Cheng, Kuiyuan
Yang, Yunhai Tong, Dacheng Tao
- Abstract summary: A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation.
We propose a Flow Alignment Module (FAM) to learn Semantic Flow between feature maps of adjacent levels.
We also present a novel Gated Dual Flow Alignment Module to directly align high-resolution feature maps and low-resolution feature maps.
- Score: 88.97790684009979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on exploring effective methods for fast and
accurate semantic segmentation. A common practice to improve performance is to
attain high-resolution feature maps with strong semantic representation. Two
strategies are widely used, atrous convolutions and feature pyramid fusion, but
both are either computationally intensive or ineffective. Inspired by optical
flow for motion alignment between adjacent video frames, we propose a Flow
Alignment Module (FAM) to learn \textit{Semantic Flow} between feature maps of
adjacent levels and broadcast high-level features to high-resolution features
effectively and efficiently. Furthermore, integrating our FAM into a standard
feature pyramid structure exhibits superior performance over other real-time
methods, even on lightweight backbone networks such as ResNet-18 and DFNet. To
further speed up inference, we also present a novel Gated Dual Flow Alignment
Module that directly aligns high-resolution and low-resolution feature maps; we
term the improved network SFNet-Lite. Extensive experiments on several
challenging datasets show the effectiveness of both SFNet and SFNet-Lite. In
particular, on the Cityscapes test set, the SFNet-Lite series achieves 80.1 mIoU
at 60 FPS with a ResNet-18 backbone and 78.8 mIoU at 120 FPS with an STDC
backbone on an RTX-3090. Moreover, we unify four challenging driving datasets
into one large dataset, named the Unified Driving Segmentation (UDS) dataset,
which contains diverse domain and style information. We benchmark several
representative works on UDS. Both SFNet and SFNet-Lite still achieve the best
speed-accuracy trade-off on UDS, serving as strong baselines in this challenging
setting. The code and models are publicly available at
https://github.com/lxtGH/SFSegNets.
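For a concrete picture of the core idea, the sketch below shows a minimal flow-alignment step in PyTorch: a 2-channel "semantic flow" field is predicted from the concatenation of a high-resolution feature map and the upsampled low-resolution map, and the coarse features are then warped onto the fine grid with that flow before fusion. This is only an illustrative sketch based on the abstract; the class name `FlowAlignmentSketch`, the single-convolution flow head, and the residual fusion are assumptions, not the authors' released implementation (see https://github.com/lxtGH/SFSegNets for the official code).

```python
# Minimal, illustrative sketch of a flow-alignment step (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowAlignmentSketch(nn.Module):
    """Predict a 2-channel semantic flow field and use it to warp the
    low-resolution (high-level) feature map onto the high-resolution grid.
    Assumes both inputs have already been projected to `channels` channels."""

    def __init__(self, channels):
        super().__init__()
        # Flow is predicted from the concatenation of both feature maps.
        self.flow_head = nn.Conv2d(channels * 2, 2, kernel_size=3, padding=1)

    def forward(self, high_res_feat, low_res_feat):
        n, _, h, w = high_res_feat.shape
        # Bring the coarse features to the fine resolution first.
        low_up = F.interpolate(low_res_feat, size=(h, w),
                               mode='bilinear', align_corners=False)
        # Semantic flow: per-pixel 2-D offsets between the two levels.
        flow = self.flow_head(torch.cat([high_res_feat, low_up], dim=1))

        # Build a normalized sampling grid in [-1, 1] (x first, as grid_sample expects).
        ys, xs = torch.meshgrid(
            torch.linspace(-1.0, 1.0, h, device=flow.device),
            torch.linspace(-1.0, 1.0, w, device=flow.device),
            indexing='ij')
        base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
        # Convert pixel offsets (channel 0 = x, channel 1 = y) to normalized coords.
        norm_flow = torch.stack(
            (flow[:, 0] * 2.0 / max(w - 1, 1),
             flow[:, 1] * 2.0 / max(h - 1, 1)), dim=-1)
        grid = base_grid + norm_flow

        # Warp the upsampled coarse features with the learned flow, then fuse.
        aligned = F.grid_sample(low_up, grid, mode='bilinear',
                                padding_mode='border', align_corners=False)
        return aligned + high_res_feat
```

The Gated Dual Flow Alignment Module of SFNet-Lite builds on the same warping idea but, per the abstract, aligns the highest- and lowest-resolution maps directly with a gating mechanism; the abstract does not spell out the gating details, so they are omitted from this sketch.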
Related papers
- Lightweight and Progressively-Scalable Networks for Semantic
Segmentation [100.63114424262234]
Multi-scale learning frameworks have been regarded as a capable class of models to boost semantic segmentation.
In this paper, we thoroughly analyze the design of convolutional blocks and the ways of interactions across multiple scales.
We devise Lightweight and Progressively-Scalable Networks (LPS-Net) that novelly expands the network complexity in a greedy manner.
arXiv Detail & Related papers (2022-07-27T16:00:28Z)
- A Multi-Stage Duplex Fusion ConvNet for Aerial Scene Classification [4.061135251278187]
We develop a ConvNet named the multi-stage duplex fusion network (MSDF-Net).
MSDF-Net consists of multi-stage structures with DFblocks.
Experiments are conducted on three widely-used aerial scene classification benchmarks.
arXiv Detail & Related papers (2022-03-29T09:27:53Z)
- Stage-Aware Feature Alignment Network for Real-Time Semantic Segmentation of Street Scenes [59.81228011432776]
We present a novel Stage-aware Feature Alignment Network (SFANet) for real-time semantic segmentation of street scenes.
By taking into account the unique role of each stage in the decoder, a novel stage-aware Feature Enhancement Block (FEB) is designed to enhance spatial details and contextual information of feature maps from the encoder.
Experimental results show that the proposed SFANet exhibits a good balance between accuracy and speed for real-time semantic segmentation of street scenes.
arXiv Detail & Related papers (2022-03-08T11:46:41Z)
- Dense Dual-Path Network for Real-time Semantic Segmentation [7.8381744043673045]
We introduce a novel Dense Dual-Path Network (DDPNet) for real-time semantic segmentation under resource constraints.
DDPNet achieves 75.3% mIoU at 52.6 FPS for an input of 1024 x 2048 resolution on a single GTX 1080Ti card.
arXiv Detail & Related papers (2020-10-21T06:11:41Z)
- Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking [54.58791377183574]
Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module to extract dense feature maps of original size with multi-scale context information, and 2) a HU-LSTM (Hybrid U-LSTM) to regularize the 3D matching volume into a predicted depth map.
Our method exhibits competitive performance against the state-of-the-art method while dramatically reducing memory consumption, costing only 19.4% of R-MVSNet's memory consumption.
arXiv Detail & Related papers (2020-07-21T14:59:59Z)
- Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection [91.43066633305662]
A central issue in RGB-D salient object detection (SOD) is how to better integrate and utilize cross-modal fusion information.
In this paper, we explore these issues from a new perspective.
We implement a more flexible and efficient form of multi-scale cross-modal feature processing.
arXiv Detail & Related papers (2020-07-13T07:59:55Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
- FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution [14.226301825772174]
We introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP).
It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information.
We achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single Nvidia Titan X (Maxwell) GPU card.
arXiv Detail & Related papers (2020-03-09T03:53:57Z)
- Semantic Flow for Fast and Accurate Scene Parsing [28.444273169423074]
The Flow Alignment Module (FAM) learns Semantic Flow between feature maps of adjacent levels.
Experiments are conducted on several challenging datasets, including Cityscapes, PASCAL Context, ADE20K and CamVid.
Our network is the first to achieve 80.4% mIoU on Cityscapes with a frame rate of 26 FPS.
arXiv Detail & Related papers (2020-02-24T08:53:18Z)