Borrowing from yourself: Faster future video segmentation with partial
channel update
- URL: http://arxiv.org/abs/2202.05748v1
- Date: Fri, 11 Feb 2022 16:37:53 GMT
- Title: Borrowing from yourself: Faster future video segmentation with partial
channel update
- Authors: Evann Courdier and François Fleuret
- Abstract summary: We propose to tackle the task of fast future video segmentation prediction through the use of convolutional layers with time-dependent channel masking.
This technique updates only a chosen subset of the feature maps at each time-step, simultaneously reducing computation and latency.
We apply this technique to several fast architectures and experimentally confirm its benefits for the future prediction subtask.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic segmentation is a well-addressed topic in the computer vision
literature, but the design of fast and accurate video processing networks
remains challenging. Moreover, to run on embedded hardware, computer vision
models often have to compromise on accuracy to reach the required speed, so a
latency/accuracy trade-off is usually at the heart of these real-time systems'
design. In the specific case of video, models can additionally reuse
computations made for previous frames to mitigate the accuracy loss while
remaining real-time.
In this work, we propose to tackle the task of fast future video segmentation
prediction through the use of convolutional layers with time-dependent channel
masking. This technique updates only a chosen subset of the feature maps at
each time-step, simultaneously reducing computation and latency, while
allowing the network to leverage previously computed features. We apply this
technique to several fast architectures and experimentally confirm its benefits
for the future prediction subtask.
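A minimal PyTorch sketch of the idea, assuming a round-robin schedule over output channels (the schedule, the update_fraction parameter, and all names are illustrative, not the paper's implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialUpdateConv(nn.Module):
    """Convolution that recomputes only a rotating subset of its output
    channels per frame, reusing the cached channels from the previous
    frame. Illustrative sketch; the round-robin schedule and the
    update_fraction parameter are assumptions."""

    def __init__(self, in_ch, out_ch, kernel_size=3, update_fraction=0.5):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
        self.out_ch = out_ch
        self.n_update = max(1, int(out_ch * update_fraction))
        self.t = 0
        self.cache = None  # output feature maps from the previous time-step

    def forward(self, x):
        if self.cache is None:
            # First frame: compute all channels and fill the cache.
            self.cache = self.conv(x)
            return self.cache
        # Round-robin choice of which output channels to refresh this step.
        start = (self.t * self.n_update) % self.out_ch
        idx = torch.arange(start, start + self.n_update) % self.out_ch
        # Recompute only the selected channels; a real implementation would
        # launch a sliced kernel on-device to actually save FLOPs and latency.
        w = self.conv.weight[idx]
        b = self.conv.bias[idx] if self.conv.bias is not None else None
        updated = F.conv2d(x, w, b, padding=self.conv.padding)
        out = self.cache.clone()
        out[:, idx] = updated
        self.cache = out.detach()
        self.t += 1
        return out
```

Stacking such layers lets each new frame reuse most of the previous frame's activations, which is where the latency savings come from.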
Related papers
- SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity [15.872209884833977]
We propose a memory-efficient scheduling method to eliminate memory overhead and an online adjustment mechanism to minimize accuracy degradation.
SparseTem achieves speedups of 1.79x for EfficientDet and 4.72x for CRNN, with minimal accuracy drop and no additional memory overhead.
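A plausible building block for this kind of temporal-continuity sparsity, sketched in PyTorch (block size, threshold, and the function name are assumptions, not SparseTem's actual mechanism):

```python
import torch
import torch.nn.functional as F

def changed_block_mask(prev_frame, cur_frame, block=16, thresh=0.04):
    """Flag spatial blocks whose pixels changed between consecutive frames;
    downstream layers would recompute convolutions only over flagged blocks
    and reuse cached activations elsewhere. Block size and threshold are
    illustrative assumptions."""
    diff = (cur_frame - prev_frame).abs().mean(dim=1, keepdim=True)  # (B,1,H,W)
    pooled = F.avg_pool2d(diff, kernel_size=block)  # per-block mean change
    return pooled > thresh                          # (B,1,H//block,W//block)
```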
arXiv Detail & Related papers (2024-10-28T07:13:25Z)
- Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition [117.98023585449808]
We propose a spatiotemporal attention-based autoencoder (STAE) architecture to evaluate the importance of frames and of pixels within each frame.
We develop a lightweight decoder that leverages a combined 3D-2D CNN to reconstruct the missing information.
Experimental results show that ViT_STAE can compress the HMDB51 video dataset by 104x with only a 5% accuracy loss.
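A loose PyTorch sketch of attention-driven frame selection in this spirit (the module, its scoring head, and all hyperparameters are assumptions rather than the STAE architecture itself):

```python
import torch
import torch.nn as nn

class FrameImportanceScorer(nn.Module):
    """Scores each frame of a clip with self-attention and keeps the
    top-k most important; an illustrative sketch of attention-based
    semantic compression, not the paper's exact model."""

    def __init__(self, feat_dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, frame_feats, keep=4):
        # frame_feats: (B, T, feat_dim) per-frame embeddings
        ctx, _ = self.attn(frame_feats, frame_feats, frame_feats)
        scores = self.score(ctx).squeeze(-1)       # (B, T) importance per frame
        keep_idx = scores.topk(keep, dim=1).indices  # frames to transmit
        return keep_idx, scores
```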
arXiv Detail & Related papers (2023-05-22T07:47:27Z)
- ReBotNet: Fast Real-time Video Enhancement [59.08038313427057]
Most restoration networks are slow, suffer from high computational bottlenecks, and can't be used for real-time video enhancement.
In this work, we design an efficient and fast framework to perform real-time enhancement for practical use-cases like live video calls and video streams.
To evaluate our method, we introduce two new datasets that emulate real-world video-call and streaming scenarios, and show extensive results on multiple datasets where ReBotNet outperforms existing approaches with lower computation, reduced memory requirements, and faster inference time.
arXiv Detail & Related papers (2023-03-23T17:58:05Z)
- Task-Oriented Communication for Edge Video Analytics [11.03999024164301]
This paper proposes a task-oriented communication framework for edge video analytics.
Multiple devices collect visual sensory data and transmit the informative features to an edge server for processing.
We show that the proposed framework effectively encodes task-relevant information of video data and achieves a better rate-performance tradeoff than existing methods.
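A minimal sketch of a task-oriented training objective of this kind (the L1 rate proxy, the weight beta, and the function name are illustrative assumptions, not the paper's formulation):

```python
import torch.nn.functional as F

def task_rate_loss(logits, labels, tx_features, beta=0.01):
    """Task-oriented objective sketch: inference loss plus a crude rate
    proxy. An L1 penalty on the transmitted features stands in for the
    true rate term, encouraging sparse, compressible features."""
    task = F.cross_entropy(logits, labels)  # performance on the server-side task
    rate = tx_features.abs().mean()         # proxy for transmission cost
    return task + beta * rate
```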
arXiv Detail & Related papers (2022-11-25T12:09:12Z)
- Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frames.
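A small PyTorch sketch of the reuse-plus-partial-recompute step, under assumed shapes (the blending rule and names are illustrative, not the paper's code):

```python
import torch
import torch.nn.functional as F

def fuse_partial(reused_feat, recomputed_feat, bin_mask):
    """Blend features reused from the previous frame with freshly
    recomputed ones according to a per-bin update mask.
    bin_mask: (B,1,h,w) boolean at spatial-bin resolution."""
    # Upsample the bin mask to the feature resolution, then mix.
    m = F.interpolate(bin_mask.float(), size=reused_feat.shape[-2:], mode="nearest")
    return m * recomputed_feat + (1.0 - m) * reused_feat
```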
arXiv Detail & Related papers (2022-06-20T07:20:02Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models into MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
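A minimal sketch of a confidence-based exit policy for such a network (the thresholding rule is an assumption; the paper co-optimises the policy rather than fixing it):

```python
def early_exit(exit_logits, conf_thresh=0.9):
    """Return the first head whose mean per-pixel confidence clears a
    threshold; an illustrative policy, not the paper's learned one.
    exit_logits: list of (B, num_classes, H, W) tensors, shallow to deep."""
    for i, logits in enumerate(exit_logits):
        conf = logits.softmax(dim=1).max(dim=1).values.mean().item()
        if conf >= conf_thresh:
            return i, logits  # confident enough: stop here
    return len(exit_logits) - 1, exit_logits[-1]  # fall through to deepest head
```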
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Real-Time Segmentation Networks should be Latency Aware [0.0]
We argue that the commonly used performance metric of mean Intersection over Union (mIoU) does not fully capture the information required to estimate the true performance of these networks when they operate in real-time.
We propose a change of objective in the segmentation task, together with an associated metric that encapsulates this missing information: predict the future output segmentation map that will match the future input frame at the time the network finishes its processing.
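A sketch of how such a latency-aware evaluation could be wired up (the helper and its signature are illustrative assumptions):

```python
def latency_aware_miou(preds, gts, latency_frames, miou_fn):
    """Score each prediction against the ground truth of the frame that is
    current once processing finishes, i.e. shift targets by the network's
    latency expressed in frames. miou_fn is any standard mIoU routine."""
    shifted_gts = gts[latency_frames:]        # targets the world has moved to
    aligned_preds = preds[: len(shifted_gts)]  # drop trailing unmatched preds
    return miou_fn(aligned_preds, shifted_gts)
```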
arXiv Detail & Related papers (2020-04-06T11:41:31Z)
- Temporally Distributed Networks for Fast Video Semantic Segmentation [64.5330491940425]
TDNet is a temporally distributed network designed for fast and accurate video semantic segmentation.
We observe that features extracted from a certain high-level layer of a deep CNN can be approximated by composing features extracted from several shallower sub-networks.
Experiments on Cityscapes, CamVid, and NYUD-v2 demonstrate that our method achieves state-of-the-art accuracy with significantly faster speed and lower latency.
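A rough PyTorch sketch of the temporal-distribution idea (the concat-plus-1x1-conv fusion is an assumption; TDNet itself uses a learned attention-based fusion):

```python
import torch
import torch.nn as nn

class TemporallyDistributed(nn.Module):
    """Each frame runs only one shallow sub-network; features from the
    last N frames are fused to approximate a deep single-frame extractor.
    Illustrative sketch, not the paper's architecture."""

    def __init__(self, subnets, sub_ch):
        super().__init__()
        self.subnets = nn.ModuleList(subnets)  # N shallow feature extractors
        self.fuse = nn.Conv2d(len(subnets) * sub_ch, sub_ch, kernel_size=1)
        self.buffer = []

    def forward(self, frame, t):
        feat = self.subnets[t % len(self.subnets)](frame)  # one subnet per frame
        self.buffer = (self.buffer + [feat])[-len(self.subnets):]
        if len(self.buffer) < len(self.subnets):
            return None  # warm-up: not enough past frames buffered yet
        return self.fuse(torch.cat(self.buffer, dim=1))
```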
arXiv Detail & Related papers (2020-04-03T22:43:32Z)
- Efficient Video Semantic Segmentation with Labels Propagation and Refinement [138.55845680523908]
This paper tackles the problem of real-time semantic segmentation of high-definition videos using a hybrid GPU/CPU approach.
We propose an Efficient Video Segmentation (EVS) pipeline that combines: (i) on the CPU, a very fast optical flow method that exploits the temporal aspect of the video and propagates semantic information from one frame to the next.
On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
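A PyTorch sketch of the flow-based label-propagation step (shapes and the backward-warping convention are assumptions, not the paper's implementation):

```python
import torch
import torch.nn.functional as F

def propagate_labels(prev_probs, flow):
    """Warp the previous frame's class probabilities to the current frame
    using optical flow, the propagation step of a CPU-flow /
    GPU-segmentation split.
    prev_probs: (B, num_classes, H, W); flow: (B, 2, H, W) in pixels."""
    B, _, H, W = prev_probs.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    coords = torch.stack((xs, ys)).unsqueeze(0) + flow  # absolute sample coords
    grid = torch.stack(                                  # normalise to [-1, 1]
        (2 * coords[:, 0] / (W - 1) - 1, 2 * coords[:, 1] / (H - 1) - 1), dim=-1
    )
    return F.grid_sample(prev_probs, grid, align_corners=True)
```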
arXiv Detail & Related papers (2019-12-26T11:45:15Z)