BlockCopy: High-Resolution Video Processing with Block-Sparse Feature
Propagation and Online Policies
- URL: http://arxiv.org/abs/2108.09376v1
- Date: Fri, 20 Aug 2021 21:16:01 GMT
- Title: BlockCopy: High-Resolution Video Processing with Block-Sparse Feature
Propagation and Online Policies
- Authors: Thomas Verelst, Tinne Tuytelaars
- Abstract summary: BlockCopy is a scheme that accelerates pretrained frame-based CNNs to process video more efficiently.
A lightweight policy network determines important regions in an image, and operations are applied on selected regions only.
Features of non-selected regions are simply copied from the preceding frame, reducing the number of computations and latency.
- Score: 57.62315799929681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we propose BlockCopy, a scheme that accelerates pretrained
frame-based CNNs to process video more efficiently, compared to standard
frame-by-frame processing. To this end, a lightweight policy network determines
important regions in an image, and operations are applied on selected regions
only, using custom block-sparse convolutions. Features of non-selected regions
are simply copied from the preceding frame, reducing the number of computations
and latency. The execution policy is trained using reinforcement learning in an
online fashion without requiring ground truth annotations. Our universal
framework is demonstrated on dense prediction tasks such as pedestrian
detection, instance segmentation and semantic segmentation, using both state of
the art (Center and Scale Predictor, MGAN, SwiftNet) and standard baseline
networks (Mask-RCNN, DeepLabV3+). BlockCopy achieves significant FLOPS savings
and inference speedup with minimal impact on accuracy.
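The copy-or-recompute idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `block_copy_step` and `executed_fraction` are hypothetical, the block mask is set by hand rather than produced by a policy network trained with reinforcement learning, and a pointwise operation stands in for the block-sparse convolutions, which in practice would also need context from neighbouring blocks.

```python
from typing import Callable, List

Grid = List[List[float]]


def block_copy_step(
    frame: Grid,
    prev_features: Grid,
    mask: List[List[bool]],
    block: int,
    op: Callable[[float], float],
) -> Grid:
    """Recompute `op` only inside blocks flagged by `mask`; copy the rest."""
    # Start from the previous frame's features (the "copy" part).
    out = [row[:] for row in prev_features]
    for by, mask_row in enumerate(mask):
        for bx, selected in enumerate(mask_row):
            if not selected:
                continue  # non-selected block: keep the copied features
            # Selected block: apply the operation to the current frame.
            for y in range(by * block, (by + 1) * block):
                for x in range(bx * block, (bx + 1) * block):
                    out[y][x] = op(frame[y][x])
    return out


def executed_fraction(mask: List[List[bool]]) -> float:
    """Fraction of blocks that were actually recomputed."""
    flat = [selected for row in mask for selected in row]
    return sum(flat) / len(flat)


# Toy usage: a 4x4 map split into 2x2 blocks; only the top-left block is updated.
prev = [[0.0] * 4 for _ in range(4)]
frame = [[1.0] * 4 for _ in range(4)]
mask = [[True, False], [False, False]]  # assumed policy output, hand-set here
features = block_copy_step(frame, prev, mask, block=2, op=lambda v: 2.0 * v)
```

In this toy setting the cost of the operation scales with `executed_fraction(mask)`, which is the kind of saving the abstract attributes to skipping non-selected regions.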
Related papers
- Local Compressed Video Stream Learning for Generic Event Boundary
Detection [25.37983456118522]
Event boundary detection aims to localize the generic, taxonomy-free event boundaries that segment videos into chunks.
Existing methods typically require video frames to be decoded before feeding into the network.
We propose a novel event boundary detection method that is fully end-to-end leveraging rich information in the compressed domain.
arXiv Detail & Related papers (2023-09-27T06:49:40Z)
- Boosting Video Object Segmentation via Space-time Correspondence
Learning [48.8275459383339]
Current solutions for video object segmentation (VOS) typically follow a matching-based regime.
We devise a correspondence-aware training framework, which boosts matching-based VOS solutions by explicitly encouraging robust correspondence matching.
Our algorithm provides solid performance gains on four widely used benchmarks.
arXiv Detail & Related papers (2023-04-13T01:34:44Z)
- Distortion-Aware Network Pruning and Feature Reuse for Real-time Video
Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frame.
arXiv Detail & Related papers (2022-06-20T07:20:02Z)
- End-to-End Compressed Video Representation Learning for Generic Event
Boundary Detection [31.31508043234419]
We propose a new end-to-end compressed video representation learning for event boundary detection.
We first use the ConvNets to extract features of the I-frames in the GOPs.
After that, a light-weight spatial-channel compressed encoder is designed to compute the feature representations of the P-frames.
A temporal contrastive module is proposed to determine the event boundaries of video sequences.
arXiv Detail & Related papers (2022-03-29T08:27:48Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Multi-Task Network Pruning and Embedded Optimization for Real-time
Deployment in ADAS [0.0]
Camera-based Deep Learning algorithms are increasingly needed for perception in Automated Driving systems.
Constraints from the automotive industry challenge the deployment of CNNs by imposing embedded systems with limited computational resources.
We propose an approach to embed a multi-task CNN network under such conditions on a commercial prototype platform.
arXiv Detail & Related papers (2021-01-19T19:29:38Z)
- Spatiotemporal Graph Neural Network based Mask Reconstruction for Video
Object Segmentation [70.97625552643493]
This paper addresses the task of segmenting class-agnostic objects in a semi-supervised setting.
We propose a novel graph neural network (STG-Net) which captures the local contexts by utilizing all proposals.
arXiv Detail & Related papers (2020-12-10T07:57:44Z)
- Towards Efficient Scene Understanding via Squeeze Reasoning [71.1139549949694]
We propose a novel framework called Squeeze Reasoning.
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector.
We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks.
arXiv Detail & Related papers (2020-11-06T12:17:01Z)