Related papers: BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies

URL: http://arxiv.org/abs/2108.09376v1
Date: Fri, 20 Aug 2021 21:16:01 GMT
Title: BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies
Authors: Thomas Verelst, Tinne Tuytelaars
Abstract summary: BlockCopy is a scheme that accelerates pretrained frame-based CNNs to process video more efficiently. A lightweight policy network determines important regions in an image, and operations are applied on selected regions only. Features of non-selected regions are simply copied from the preceding frame, reducing the number of computations and latency.
Score: 57.62315799929681
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper we propose BlockCopy, a scheme that accelerates pretrained frame-based CNNs to process video more efficiently, compared to standard frame-by-frame processing. To this end, a lightweight policy network determines important regions in an image, and operations are applied on selected regions only, using custom block-sparse convolutions. Features of non-selected regions are simply copied from the preceding frame, reducing the number of computations and latency. The execution policy is trained using reinforcement learning in an online fashion without requiring ground truth annotations. Our universal framework is demonstrated on dense prediction tasks such as pedestrian detection, instance segmentation and semantic segmentation, using both state-of-the-art (Center and Scale Predictor, MGAN, SwiftNet) and standard baseline networks (Mask-RCNN, DeepLabV3+). BlockCopy achieves significant FLOPS savings and inference speedup with minimal impact on accuracy.
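
The copy-or-compute idea in the abstract can be illustrated with a minimal sketch in plain Python. Everything here is an assumption for illustration and not the authors' implementation: the names `select_blocks` and `block_copy_step` are hypothetical, the policy is a hand-written frame-difference threshold standing in for the paper's reinforcement-learned policy network, and `op` is a pointwise function standing in for the block-sparse convolutions.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]


def block_mean_abs_diff(a: Grid, b: Grid, r0: int, c0: int, block: int) -> float:
    """Mean absolute difference between two grids inside one block."""
    total = 0.0
    for r in range(r0, r0 + block):
        for c in range(c0, c0 + block):
            total += abs(a[r][c] - b[r][c])
    return total / (block * block)


def select_blocks(prev_frame: Grid, frame: Grid, block: int,
                  threshold: float) -> List[List[bool]]:
    """Stand-in policy: flag a block for execution when it changed enough.

    Assumption: the paper trains a policy network with reinforcement
    learning; this rule only imitates its output (one flag per block).
    """
    rows, cols = len(frame) // block, len(frame[0]) // block
    return [
        [block_mean_abs_diff(prev_frame, frame, i * block, j * block, block) > threshold
         for j in range(cols)]
        for i in range(rows)
    ]


def block_copy_step(prev_features: Grid, frame: Grid, mask: List[List[bool]],
                    block: int, op: Callable[[float], float]) -> Tuple[Grid, int]:
    """Recompute features in selected blocks; copy the rest from the previous frame."""
    features = [row[:] for row in prev_features]  # start from copied features
    executed = 0
    for i, mask_row in enumerate(mask):
        for j, execute in enumerate(mask_row):
            if not execute:
                continue  # non-selected block keeps the copied features
            executed += 1
            for r in range(i * block, (i + 1) * block):
                for c in range(j * block, (j + 1) * block):
                    features[r][c] = op(frame[r][c])
    return features, executed


if __name__ == "__main__":
    double = lambda x: 2.0 * x  # hypothetical stand-in for the CNN
    frame0 = [[0.0] * 4 for _ in range(4)]
    feats0 = [[double(v) for v in row] for row in frame0]
    frame1 = [row[:] for row in frame0]
    frame1[0][0] = 1.0  # only the top-left block changes
    mask = select_blocks(frame0, frame1, block=2, threshold=0.1)
    feats1, executed = block_copy_step(feats0, frame1, mask, 2, double)
    print(mask, executed)
```

In this toy run only one of the four blocks is recomputed, which is where the savings would come from. A real convolution has a receptive field that crosses block borders, so a faithful implementation would also need to handle neighbouring context, which this sketch ignores.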

Related papers

Fast SAM2 with Text-Driven Token Pruning [52.8350457627401]
Segment Anything Model 2 (SAM2), a vision computation model, has significantly advanced prompt-driven video object segmentation. SAM2 pipelines propagate all visual tokens produced by the image encoder through downstream temporal reasoning modules, regardless of their relevance to the target object. We introduce a text-guided token pruning framework that improves inference efficiency by selectively reducing token density prior to temporal propagation.
arXiv Detail & Related papers (2025-12-24T18:59:05Z)
Local Compressed Video Stream Learning for Generic Event Boundary Detection [25.37983456118522]
Event boundary detection aims to localize the generic, taxonomy-free event boundaries that segment videos into chunks. Existing methods typically require video frames to be decoded before feeding them into the network. We propose a novel event boundary detection method that is fully end-to-end, leveraging rich information in the compressed domain.
arXiv Detail & Related papers (2023-09-27T06:49:40Z)
Boosting Video Object Segmentation via Space-time Correspondence Learning [48.8275459383339]
Current solutions for video object segmentation (VOS) typically follow a matching-based regime. We devise a correspondence-aware training framework, which boosts matching-based VOS solutions by explicitly encouraging robust correspondence matching. Our algorithm provides solid performance gains on four widely used benchmarks.
arXiv Detail & Related papers (2023-04-13T01:34:44Z)
Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks. Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins. We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frame.
arXiv Detail & Related papers (2022-06-20T07:20:02Z)
End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection [31.31508043234419]
We propose a new end-to-end compressed video representation learning method for event boundary detection. We first use ConvNets to extract features of the I-frames in the GOPs. After that, a light-weight spatial-channel compressed encoder is designed to compute the feature representations of the P-frames. A temporal contrastive module is proposed to determine the event boundaries of video sequences.
arXiv Detail & Related papers (2022-03-29T08:27:48Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Multi-Task Network Pruning and Embedded Optimization for Real-time Deployment in ADAS [0.0]
Camera-based Deep Learning algorithms are increasingly needed for perception in Automated Driving systems. Constraints from the automotive industry challenge the deployment of CNNs by imposing embedded systems with limited computational resources. We propose an approach to embed a multi-task CNN network under such conditions on a commercial prototype platform.
arXiv Detail & Related papers (2021-01-19T19:29:38Z)
Spatiotemporal Graph Neural Network based Mask Reconstruction for Video Object Segmentation [70.97625552643493]
This paper addresses the task of segmenting class-agnostic objects in a semi-supervised setting. We propose a novel graph neural network (TG-Net) which captures the local contexts by utilizing all proposals.
arXiv Detail & Related papers (2020-12-10T07:57:44Z)
Towards Efficient Scene Understanding via Squeeze Reasoning [71.1139549949694]
We propose a novel framework called Squeeze Reasoning. Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector. We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks.
arXiv Detail & Related papers (2020-11-06T12:17:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.