Shift-Memory Network for Temporal Scene Segmentation
- URL: http://arxiv.org/abs/2202.08399v1
- Date: Thu, 17 Feb 2022 01:42:34 GMT
- Title: Shift-Memory Network for Temporal Scene Segmentation
- Authors: Guo Cheng, Jiang Yu Zheng
- Abstract summary: We extend semantic segmentation into the temporal domain to enhance spatial accuracy with motion.
We utilize a shift-mode network over streaming input to ensure zero-latency output.
Experiments achieve accuracy equivalent to shift-mode inference, but at faster speeds and with much smaller memory.
- Score: 2.3986080077861787
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Semantic segmentation has achieved great accuracy in understanding spatial
layout. For real-time tasks based on dynamic scenes, we extend semantic
segmentation into the temporal domain to enhance spatial accuracy with motion. We
utilize a shift-mode network over streaming input to ensure zero-latency
output. Because the input data overlap under a shifting network, this paper identifies
computation that is repeated in fixed periods across network layers. To avoid this
redundancy, we derive a Shift-Memory Network (SMN) from an encoding-decoding
baseline that reuses network values without accuracy loss. The parameters learned
in patch-mode training are extracted for the SMN to perform inference promptly in
compact memory. We segment dynamic scenes from 1D scanning input and 2D video.
In experiments, the SMN achieves accuracy equivalent to the shift-mode network, but
with faster inference and much smaller memory. This will facilitate real-time
semantic segmentation applications on edge devices.
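The redundancy argument above (overlapping data under a shifting network implies computation repeated in fixed periods) can be made concrete with a small sketch. The code below is not the authors' implementation; it assumes a causal 1D temporal convolution and shows how a per-layer memory of the last `kernel_size - 1` input steps lets each new step be processed without recomputing the overlapping window.

```python
import torch
import torch.nn as nn


class ShiftMemoryConv1d(nn.Module):
    """Causal temporal conv that caches the last (k - 1) input steps."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size=k)  # no padding
        self.memory = None                                    # (B, in_ch, k - 1)

    def reset(self):
        self.memory = None

    def forward(self, x_new):
        # x_new: (B, in_ch, 1) -- one new time step (e.g. one 1D scan column)
        if self.memory is None:
            self.memory = x_new.new_zeros(x_new.shape[0], x_new.shape[1], self.k - 1)
        window = torch.cat([self.memory, x_new], dim=2)       # (B, in_ch, k)
        y_new = self.conv(window)                             # (B, out_ch, 1)
        # shift: drop the oldest step, keep the newest for the next call
        self.memory = window[:, :, 1:].detach()
        return y_new


# Usage: the per-step cost is independent of the window length, and the output
# equals re-running the conv over the full shifted window (zero-padded start).
layer = ShiftMemoryConv1d(in_ch=8, out_ch=16, k=3)
for _ in range(5):                          # stream of single time steps
    out = layer(torch.randn(1, 8, 1))       # (1, 16, 1): only the newest column
```

Stacking such layers is the basic way to keep per-step cost and memory compact in a streaming encoder; the paper's SMN extends this kind of reuse across an encoding-decoding network.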
Related papers
- Reparameterizable Dual-Resolution Network for Real-time Semantic Segmentation [15.83905822380148]
RDRNet is a Dual-Resolution Network dedicated to real-time semantic segmentation.
RDRNet employs a two-branch architecture, utilizing multi-path blocks during training and re-parameterizing them into single-path blocks during inference (a fusion of this kind is sketched below).
Experimental results on the Cityscapes, CamVid, and Pascal VOC 2012 datasets demonstrate that RDRNet outperforms existing state-of-the-art models in terms of both performance and speed.
arXiv Detail & Related papers (2024-06-18T10:59:10Z) - Distortion-Aware Network Pruning and Feature Reuse for Real-time Video
- Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frames (see the sketch below).
arXiv Detail & Related papers (2022-06-20T07:20:02Z) - Deep Multi-Branch Aggregation Network for Real-Time Semantic
- Deep Multi-Branch Aggregation Network for Real-Time Semantic Segmentation in Street Scenes [32.54045305607654]
Many state-of-the-art real-time semantic segmentation methods tend to sacrifice some spatial details or contextual information for fast inference.
We propose a novel Deep Multi-branch Aggregation Network (called DMA-Net) based on the encoder-decoder structure to perform real-time semantic segmentation in street scenes.
The proposed DMA-Net obtains 77.0% and 73.6% mean Intersection over Union (mIoU) at inference speeds of 46.7 FPS and 119.8 FPS, respectively, using only a single NVIDIA GTX 1080Ti GPU.
arXiv Detail & Related papers (2022-03-08T12:07:32Z) - Stage-Aware Feature Alignment Network for Real-Time Semantic
- Stage-Aware Feature Alignment Network for Real-Time Semantic Segmentation of Street Scenes [59.81228011432776]
We present a novel Stage-aware Feature Alignment Network (SFANet) for real-time semantic segmentation of street scenes.
By taking into account the unique role of each stage in the decoder, a novel stage-aware Feature Enhancement Block (FEB) is designed to enhance spatial details and contextual information of feature maps from the encoder.
Experimental results show that the proposed SFANet exhibits a good balance between accuracy and speed for real-time semantic segmentation of street scenes.
arXiv Detail & Related papers (2022-03-08T11:46:41Z) - DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present the dynamic slimmable network (DS-Net) and the dynamic slice-able network (DS-Net++), which adjust the filter numbers of CNNs and multiple dimensions in both CNNs and transformers depending on the input (a generic slicing sketch follows below).
arXiv Detail & Related papers (2021-09-21T09:57:21Z) - A De-raining semantic segmentation network for real-time foreground
- A De-raining semantic segmentation network for real-time foreground segmentation [0.0]
This paper proposes a lightweight network for segmentation in rainy environments, named Deraining Semantic Accuracy Network (DRSNet).
By analyzing the characteristics of raindrops, the MultiScaleSE Block is specifically designed to encode the input image.
To combine semantic information between different encoder and decoder layers, the use of Asymmetric Skip connections is proposed.
arXiv Detail & Related papers (2021-04-16T04:09:13Z) - HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation [95.47168925127089]
We present a novel, real-time, semantic segmentation network in which the encoder both encodes and generates the parameters (weights) of the decoder.
We design a new type of hypernetwork, composed of a nested U-Net, for drawing higher-level context features.
arXiv Detail & Related papers (2020-12-21T18:58:18Z) - Temporally Distributed Networks for Fast Video Semantic Segmentation [64.5330491940425]
TDNet is a temporally distributed network designed for fast and accurate video semantic segmentation.
We observe that features extracted from a certain high-level layer of a deep CNN can be approximated by composing features extracted from several shallower sub-networks (see the sketch below).
Experiments on Cityscapes, CamVid, and NYUD-v2 demonstrate that our method achieves state-of-the-art accuracy with significantly faster speed and lower latency.
arXiv Detail & Related papers (2020-04-03T22:43:32Z) - Real-Time High-Performance Semantic Image Segmentation of Urban Street
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.