Related papers: Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes

URL: http://arxiv.org/abs/2101.06085v1
Date: Fri, 15 Jan 2021 12:56:18 GMT
Title: Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes
Authors: Yuanduo Hong, Huihui Pan, Weichao Sun, Yisong Jia
Abstract summary: We propose novel deep dual-resolution networks (DDRNets) for real-time semantic segmentation of road scenes. Our method achieves a new state-of-the-art trade-off between accuracy and speed on both the Cityscapes and CamVid datasets.
Score: 0.23090185577016442
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semantic segmentation is a critical technology for autonomous vehicles to understand surrounding scenes. For practical autonomous vehicles, it is undesirable to spend a considerable amount of inference time to achieve high-accuracy segmentation results. Using light-weight architectures (encoder-decoder or two-pathway) or reasoning on low-resolution images, recent methods realize very fast scene parsing that even runs at more than 100 FPS on a single 1080Ti GPU. However, there are still evident gaps in performance between these real-time methods and models based on dilation backbones. To tackle this problem, we propose novel deep dual-resolution networks (DDRNets) for real-time semantic segmentation of road scenes. In addition, we design a new contextual information extractor named Deep Aggregation Pyramid Pooling Module (DAPPM) to enlarge effective receptive fields and fuse multi-scale context. Our method achieves a new state-of-the-art trade-off between accuracy and speed on both the Cityscapes and CamVid datasets. Specifically, on a single 2080Ti GPU, DDRNet-23-slim yields 77.4% mIoU at 109 FPS on the Cityscapes test set and 74.4% mIoU at 230 FPS on the CamVid test set. Without utilizing attention mechanisms, pre-training on a larger semantic segmentation dataset, or inference acceleration, DDRNet-39 attains 80.4% test mIoU at 23 FPS on Cityscapes. With widely used test augmentation, our method is still superior to most state-of-the-art models while requiring much less computation. Code and trained models will be made publicly available.

Related papers

FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution [50.55876151973996]
A versatile video depth estimation model should (1) be accurate across frames, (2) produce high-resolution depth maps, and (3) support real-time streaming. We propose FlashDepth, a method that satisfies all three requirements, performing depth estimation on a 2044x1148 streaming video at 24 FPS.
arXiv Detail & Related papers (2025-04-09T17:59:31Z)
Cross-CBAM: A Lightweight network for Scene Segmentation [2.064612766965483]
We present the Cross-CBAM network, a novel lightweight network for real-time semantic segmentation. In experiments on the Cityscapes and CamVid datasets, we achieve 73.4% mIoU at 240.9 FPS and 77.2% mIoU at 88.6 FPS on an NVIDIA GTX 1080Ti.
arXiv Detail & Related papers (2023-06-04T09:03:05Z)
SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow [88.97790684009979]
A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation. We propose a Flow Alignment Module (FAM) to learn Semantic Flow between feature maps of adjacent levels. We also present a novel Gated Dual Flow Alignment Module to directly align high-resolution feature maps and low-resolution feature maps.
arXiv Detail & Related papers (2022-07-10T08:25:47Z)
Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks. Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins. We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frame.
arXiv Detail & Related papers (2022-06-20T07:20:02Z)
Stage-Aware Feature Alignment Network for Real-Time Semantic Segmentation of Street Scenes [59.81228011432776]
We present a novel Stage-aware Feature Alignment Network (SFANet) for real-time semantic segmentation of street scenes. By taking into account the unique role of each stage in the decoder, a novel stage-aware Feature Enhancement Block (FEB) is designed to enhance spatial details and contextual information of feature maps from the encoder. Experimental results show that the proposed SFANet exhibits a good balance between accuracy and speed for real-time semantic segmentation of street scenes.
arXiv Detail & Related papers (2022-03-08T11:46:41Z)
Spatio-Temporal Scene-Graph Embedding for Autonomous Vehicle Collision Prediction [0.3738410998183615]
We show that sg2vec predicts collisions 8.11% more accurately than the state-of-the-art method on synthesized datasets. We also show that sg2vec is better than the state-of-the-art at transferring knowledge from synthetic datasets to real-world driving datasets.
arXiv Detail & Related papers (2021-11-11T10:01:01Z)
Rethinking BiSeNet For Real-time Semantic Segmentation [6.622485130017622]
BiSeNet has been proved to be a popular two-stream network for real-time segmentation. We propose a novel structure named Short-Term Dense Concatenate network (STDC) by removing structure redundancy.
arXiv Detail & Related papers (2021-04-27T13:49:47Z)
Real-time Semantic Segmentation with Context Aggregation Network [14.560708848716754]
We propose a dual branch convolutional neural network, with significantly lower computational costs as compared to the state-of-the-art. We evaluate our method on two semantic segmentation datasets, namely Cityscapes dataset and UAVid dataset.
arXiv Detail & Related papers (2020-11-02T14:16:23Z)
Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time. The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism. Results on multiple datasets demonstrate better accuracy and speed compared to existing approaches.
arXiv Detail & Related papers (2020-07-07T22:37:16Z)
Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes. The proposed method achieves an accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with inference speeds of 51.0 fps and 39.3 fps.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
Efficient Video Semantic Segmentation with Labels Propagation and Refinement [138.55845680523908]
This paper tackles the problem of real-time semantic segmentation of high definition videos using a hybrid GPU / CPU approach. We propose an Efficient Video Segmentation (EVS) pipeline that combines: (i) On the CPU, a very fast optical flow method, that is used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next. On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
arXiv Detail & Related papers (2019-12-26T11:45:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.