Real-Time Segmentation Networks should be Latency Aware
- URL: http://arxiv.org/abs/2004.02574v2
- Date: Wed, 20 Apr 2022 12:20:45 GMT
- Title: Real-Time Segmentation Networks should be Latency Aware
- Authors: Evann Courdier and Francois Fleuret
- Abstract summary: We argue that the commonly used performance metric of mean Intersection over Union (mIoU) does not fully capture the information required to estimate the true performance of these networks when they operate in "real-time".
We propose a change of objective in the segmentation task, and its associated metric that encapsulates this missing information in the following way: We propose to predict the future output segmentation map that will match the future input frame at the time when the network finishes the processing.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As scene segmentation systems reach visually accurate results, many recent
papers focus on making these network architectures faster, smaller and more
efficient. In particular, studies often aim at designing "real-time" systems.
Achieving this goal is particularly relevant in the context of real-time video
understanding for autonomous vehicles, and robots. In this paper, we argue that
the commonly used performance metric of mean Intersection over Union (mIoU)
does not fully capture the information required to estimate the true
performance of these networks when they operate in "real-time". We propose a
change of objective in the segmentation task, and its associated metric that
encapsulates this missing information in the following way: We propose to
predict the future output segmentation map that will match the future input
frame at the time when the network finishes the processing. We introduce the
associated latency-aware metric, from which we can determine a ranking. We
perform latency timing experiments of some recent networks on different
hardware and assess the performances of these networks on our proposed task. We
propose improvements to scene segmentation networks to better perform on our
task by using multi-frame input and increasing capacity in the initial
convolutional layers.
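The latency-aware objective described in the abstract can be made concrete with a short evaluation sketch. The code below is an illustrative assumption, not the paper's released code: it times each forward pass and scores the prediction against the ground truth of the frame that is current when processing finishes, rather than the frame that was fed in. The 30 fps stream and the `mean_iou` helper are likewise assumptions.

```python
# Illustrative sketch of latency-aware evaluation (not the paper's code):
# score each prediction against the ground truth of the frame that is live
# when the network finishes, not the frame it was given.
import time
import torch

FRAME_PERIOD = 1.0 / 30.0  # assumed 30 fps input stream


def mean_iou(pred, target, num_classes):
    """Plain mIoU over the classes present in prediction or target."""
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (target == c)).sum().item()
        union = ((pred == c) | (target == c)).sum().item()
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / max(len(ious), 1)


@torch.no_grad()
def latency_aware_miou(model, frames, labels, num_classes, device="cuda"):
    """frames/labels: lists of per-frame tensors from the same video."""
    model.eval()
    scores = []
    for t, frame in enumerate(frames):
        start = time.perf_counter()
        pred = model(frame.unsqueeze(0).to(device)).argmax(dim=1).cpu()[0]
        latency = time.perf_counter() - start
        # Ground truth of the frame that is current once the output is ready.
        t_out = min(t + round(latency / FRAME_PERIOD), len(labels) - 1)
        scores.append(mean_iou(pred, labels[t_out], num_classes))
    return sum(scores) / len(scores)
```

Note that the `.cpu()` call forces a device synchronisation, so the measured latency also includes the copy of the prediction back to the host.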
Related papers
- An Empirical Study of Attention Networks for Semantic Segmentation [11.000308726481236]
Recently, the decoders based on attention achieve state-of-the-art (SOTA) performance on various datasets.
This paper first conducts experiments to analyze their computation complexity and compare their performance.
arXiv Detail & Related papers (2023-09-19T00:07:57Z)
- A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in deep semantic segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
- Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frame (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-06-20T07:20:02Z)
- Borrowing from yourself: Faster future video segmentation with partial channel update [0.0]
We propose to tackle the task of fast future video segmentation prediction through the use of convolutional layers with time-dependent channel masking.
This technique updates only a chosen subset of the feature maps at each time-step, simultaneously reducing computation and latency.
We apply this technique to several fast architectures and experimentally confirm its benefits for the future-prediction subtask (see the channel-masking sketch after this list).
arXiv Detail & Related papers (2022-02-11T16:37:53Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models into MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements (an early-exit sketch appears after this list).
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Spatio-temporal Modeling for Large-scale Vehicular Networks Using Graph Convolutional Networks [110.80088437391379]
A graph-based framework called SMART is proposed to model and keep track of the statistics of vehicle-to-infrastructure (V2I) communication latency across a large geographical area.
We develop a graph reconstruction-based approach using a graph convolutional network integrated with a deep Q-networks algorithm.
Our results show that the proposed method can significantly improve both the accuracy and efficiency of modeling the latency performance of large vehicular networks (a graph-convolution sketch appears after this list).
arXiv Detail & Related papers (2021-03-13T06:56:29Z)
- Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching [67.02962970820505]
We introduce "tracking-by-detection" into Video Object (VOS)
We propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
We achieve new state-of-the-art performance on the DAVIS benchmark without complicated bells and whistles in both speed and accuracy, with a speed of 0.14 second per frame and J&F measure of 75.9% respectively.
arXiv Detail & Related papers (2020-07-11T05:44:16Z)
- Temporally Distributed Networks for Fast Video Semantic Segmentation [64.5330491940425]
TDNet is a temporally distributed network designed for fast and accurate video semantic segmentation.
We observe that features extracted from a certain high-level layer of a deep CNN can be approximated by composing features extracted from several shallower sub-networks.
Experiments on Cityscapes, CamVid, and NYUD-v2 demonstrate that our method achieves state-of-the-art accuracy with significantly faster speed and lower latency (a sub-network composition sketch appears after this list).
arXiv Detail & Related papers (2020-04-03T22:43:32Z)
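For the distortion-aware feature-reuse paper in the list above, the following is a rough sketch of the reuse idea under simplifying assumptions: a plain per-bin absolute-difference test (with invented constants `BIN` and `THRESH`) stands in for the paper's distortion-aware criterion, and the backbone is run on the full frame for brevity, with only the blending step showing where cached features would be reused.

```python
# Hedged sketch: reuse cached backbone features in spatial bins where the
# frame changed little, recompute elsewhere. Constants and the difference
# test are illustrative assumptions, not the paper's criterion.
import torch
import torch.nn.functional as F

BIN = 32       # bin size in input pixels (assumption)
THRESH = 0.05  # mean absolute difference threshold per bin (assumption)


def blend_with_cache(backbone, frame, prev_frame, cached_feat):
    if prev_frame is None or cached_feat is None:
        return backbone(frame)
    # Per-bin change map computed on the input frame.
    diff = (frame - prev_frame).abs().mean(dim=1, keepdim=True)  # (N,1,H,W)
    changed = (F.avg_pool2d(diff, BIN) > THRESH).float()         # (N,1,H/B,W/B)
    # A real system would crop and process only the changed regions; the full
    # pass here keeps the sketch short and shows where reuse would happen.
    new_feat = backbone(frame)
    mask = F.interpolate(changed, size=new_feat.shape[-2:], mode="nearest")
    return mask * new_feat + (1.0 - mask) * cached_feat
```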
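For the partial-channel-update paper, this is a minimal sketch of time-dependent channel masking, assuming a simple round-robin schedule; the paper's actual mask schedule and the skipping of the unselected channels' computation are not reproduced, and the full convolution is kept so the sketch stays short.

```python
# Hedged sketch: at each time step, only one round-robin group of output
# channels is refreshed; the others keep their values from the previous step.
import torch
import torch.nn as nn


class PartiallyUpdatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, groups=4):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.groups = groups  # number of round-robin channel groups
        self.step = 0
        self.cache = None     # output feature map from the previous step

    def forward(self, x):
        # Full conv for brevity; a real implementation would compute only the
        # selected channels to actually save time.
        out = self.conv(x)
        if self.cache is None or self.cache.shape != out.shape:
            self.cache = out.detach()
        keep = torch.zeros(out.shape[1], dtype=torch.bool, device=out.device)
        keep[self.step % self.groups::self.groups] = True  # channels updated now
        out = torch.where(keep.view(1, -1, 1, 1), out, self.cache)
        self.cache = out.detach()
        self.step += 1
        return out
```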
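For the multi-exit paper, the sketch below shows early-exit inference with segmentation heads attached after successive backbone stages. The confidence-threshold exit policy and the mean top-1 probability measure are assumptions standing in for the paper's learned exit policy.

```python
# Hedged sketch: run backbone stages in order, emit a prediction at the first
# exit whose mean top-1 probability clears a threshold (easy samples exit early).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiExitSegmenter(nn.Module):
    def __init__(self, stages, heads, conf_threshold=0.9):
        super().__init__()
        self.stages = nn.ModuleList(stages)  # backbone stages, run in order
        self.heads = nn.ModuleList(heads)    # one segmentation head per stage
        self.conf_threshold = conf_threshold

    @torch.no_grad()
    def forward(self, x):
        h, w = x.shape[-2:]
        feat = x
        for stage, head in zip(self.stages, self.heads):
            feat = stage(feat)
            logits = F.interpolate(head(feat), size=(h, w), mode="bilinear",
                                   align_corners=False)
            conf = logits.softmax(dim=1).amax(dim=1).mean()  # mean top-1 prob
            if conf >= self.conf_threshold:                  # easy sample: exit
                return logits
        return logits                                        # final exit
```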
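For the SMART vehicular-network paper, the snippet below is only a generic two-layer graph-convolution regressor for per-node V2I latency, using the standard normalised-adjacency formulation; the paper's graph reconstruction and deep Q-network components are omitted and all names here are illustrative.

```python
# Hedged sketch: a generic GCN regressing per-node V2I latency on a
# road-segment graph; the summarised framework's reconstruction and DQN
# components are not reproduced.
import torch
import torch.nn as nn


class LatencyGCN(nn.Module):
    def __init__(self, in_dim, hidden):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden)
        self.w2 = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) adjacency with self-loops.
        deg = adj.sum(dim=1).clamp(min=1e-6)
        d = deg.pow(-0.5)
        a_hat = d.unsqueeze(1) * adj * d.unsqueeze(0)  # D^-1/2 (A+I) D^-1/2
        h = torch.relu(self.w1(a_hat @ x))
        return self.w2(a_hat @ h).squeeze(-1)          # predicted latency per node
```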
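For the temporal-aggregation and dynamic-template-matching paper, the snippet below sketches only a generic time-evolving template match: per-object template features are compared to current pixel features by cosine similarity and updated with an exponential moving average. Both choices are assumptions, not the paper's mechanism.

```python
# Hedged sketch: cosine-similarity matching against per-object templates that
# evolve over time via an exponential moving average (both are assumptions).
import torch
import torch.nn.functional as F


def match_and_update(templates, feat, momentum=0.9):
    """templates: (K, C), one feature per object; feat: (C, H, W) frame features.
    Returns a per-pixel object assignment map and the updated templates."""
    C, H, W = feat.shape
    pix = F.normalize(feat.flatten(1).t(), dim=1)   # (H*W, C) pixel features
    tmpl = F.normalize(templates, dim=1)            # (K, C) unit templates
    sim = pix @ tmpl.t()                            # (H*W, K) cosine similarities
    assign = sim.argmax(dim=1).view(H, W)           # hard per-pixel assignment
    # Time-evolving update: blend each template with the mean feature of the
    # pixels currently assigned to it.
    new_templates = templates.clone()
    for k in range(templates.shape[0]):
        mask = assign.view(-1) == k
        if mask.any():
            mean_feat = feat.flatten(1).t()[mask].mean(dim=0)
            new_templates[k] = momentum * templates[k] + (1 - momentum) * mean_feat
    return assign, new_templates
```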
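For TDNet, the sketch below illustrates the temporally distributed idea under assumptions: one shallow sub-network runs per frame in round-robin order, and the strong feature is approximated by fusing the cached outputs of the last few frames. Plain concatenation into a user-supplied fusion module stands in for the paper's attention-based aggregation.

```python
# Hedged sketch: distribute feature extraction over time by running one shallow
# sub-network per frame and fusing the cached outputs of the last m frames.
from collections import deque

import torch
import torch.nn as nn


class TemporallyDistributedFeatures(nn.Module):
    def __init__(self, subnets, fuse):
        super().__init__()
        self.subnets = nn.ModuleList(subnets)  # m shallow sub-networks
        self.fuse = fuse                       # fusion module over m features
        self.buffer = deque(maxlen=len(subnets))
        self.t = 0

    @torch.no_grad()
    def forward(self, frame):
        # Only one shallow sub-network runs per frame.
        feat = self.subnets[self.t % len(self.subnets)](frame)
        self.buffer.append(feat)
        self.t += 1
        # Pad with the newest feature until the buffer is full (warm-up frames).
        feats = list(self.buffer)
        while len(feats) < len(self.subnets):
            feats.append(feats[-1])
        return self.fuse(torch.cat(feats, dim=1))
```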