ResQ: Residual Quantization for Video Perception
- URL: http://arxiv.org/abs/2308.09511v1
- Date: Fri, 18 Aug 2023 12:41:10 GMT
- Title: ResQ: Residual Quantization for Video Perception
- Authors: Davide Abati, Haitam Ben Yahia, Markus Nagel, Amirhossein Habibian
- Abstract summary: We propose a novel quantization scheme for video networks coined Residual Quantization.
We extend our model to dynamically adjust the bit-width in proportion to the amount of change in the video.
- Score: 18.491197847596283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper accelerates video perception, such as semantic segmentation and
human pose estimation, by leveraging cross-frame redundancies. Unlike the
existing approaches, which avoid redundant computations by warping the past
features using optical-flow or by performing sparse convolutions on frame
differences, we approach the problem from a new perspective: low-bit
quantization. We observe that residuals, as the difference in network
activations between two neighboring frames, exhibit properties that make them
highly quantizable. Based on this observation, we propose a novel quantization
scheme for video networks coined Residual Quantization (ResQ). ResQ extends the
standard frame-by-frame quantization scheme by incorporating temporal
dependencies, leading to a better accuracy vs. bit-width trade-off.
Furthermore, we extend our model to dynamically adjust the bit-width in
proportion to the amount of change in the video. We demonstrate the
superiority of our model over standard quantization and existing
efficient video perception models, using various architectures on semantic
segmentation and human pose estimation benchmarks.
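As a rough illustration of the underlying idea, consider a single linear layer: since W(x_t) = W(x_ref) + W(x_t - x_ref), only the frame-to-frame residual, which has a much smaller dynamic range, needs to pass through a low-bit path. The sketch below is our own simplified illustration under that assumption, not the paper's implementation; the helper names (quantize, ResidualQuantLinear, residual_bits) are hypothetical.

```python
import torch
import torch.nn.functional as F


def quantize(x, num_bits):
    """Simulated uniform symmetric quantization to `num_bits` bits."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale


class ResidualQuantLinear(torch.nn.Module):
    """Toy linear layer that quantizes the activation residual between frames.

    For a linear map W, W(x_t) = W(x_ref) + W(x_t - x_ref): only the
    small-range residual has to go through the low-bit path.
    """

    def __init__(self, in_features, out_features, residual_bits=4):
        super().__init__()
        self.linear = torch.nn.Linear(in_features, out_features)
        self.residual_bits = residual_bits
        self.x_ref = None  # cached reference-frame input
        self.y_ref = None  # cached reference-frame output

    def forward(self, x, is_keyframe=False):
        if is_keyframe or self.x_ref is None:
            # Reference (key) frames run at full precision and are cached.
            self.x_ref, self.y_ref = x, self.linear(x)
            return self.y_ref
        # Later frames: quantize only the frame-to-frame residual.
        residual = quantize(x - self.x_ref, self.residual_bits)
        # The bias is already part of y_ref, so apply the weight only.
        return self.y_ref + F.linear(residual, self.linear.weight)
```

In use, frame 0 would be processed as the key frame at full precision and cached, while later frames take the 4-bit residual path. A dynamic variant, along the lines of the bit-width adjustment described in the abstract, could pick residual_bits per frame from the magnitude of x - x_ref.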
Related papers
- QVD: Post-training Quantization for Video Diffusion Models [33.13078954859106]
Post-training quantization (PTQ) is an effective technique to reduce memory footprint and improve computational efficiency.
We introduce the first PTQ strategy tailored for video diffusion models, dubbed QVD.
We achieve near-lossless performance under W8A8 quantization, outperforming current methods by 205.12 in FVD.
arXiv Detail & Related papers (2024-07-16T10:47:27Z) - Vertical Layering of Quantized Neural Networks for Heterogeneous
Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically achieve any precision network for on-demand service while only needing to train and maintain one model.
arXiv Detail & Related papers (2022-12-10T15:57:38Z) - Distortion-Aware Network Pruning and Feature Reuse for Real-time Video
Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frames.
arXiv Detail & Related papers (2022-06-20T07:20:02Z) - PeQuENet: Perceptual Quality Enhancement of Compressed Video with
Adaptation- and Attention-based Network [27.375830262287163]
We propose a generative adversarial network (GAN) framework to enhance the perceptual quality of compressed videos.
Our framework includes attention and adaptation to different quantization parameters (QPs) in a single model.
Experimental results demonstrate the superior performance of the proposed PeQuENet compared with the state-of-the-art compressed video quality enhancement algorithms.
arXiv Detail & Related papers (2022-06-16T02:49:28Z) - Representation Recycling for Streaming Video Analysis [19.068248496174903]
StreamDEQ aims to infer frame-wise representations on videos with minimal per-frame computation.
We show that StreamDEQ is able to recover near-optimal representations in a few frames' time and maintain an up-to-date representation throughout the video duration.
arXiv Detail & Related papers (2022-04-28T13:35:14Z) - Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video frame interpolation framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations.
To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video frame interpolation.
In addition, we develop a multi-scale frame scheme to fully realize the potential of Transformers.
arXiv Detail & Related papers (2021-11-27T05:35:10Z) - Insights from Generative Modeling for Neural Video Compression [31.59496634465347]
We present newly proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling.
We propose several architectures that yield state-of-the-art video compression performance on high-resolution video.
We provide further evidence that the generative modeling viewpoint can advance the neural video coding field.
arXiv Detail & Related papers (2021-07-28T02:19:39Z) - A Deep-Unfolded Reference-Based RPCA Network For Video
Foreground-Background Separation [86.35434065681925]
This paper proposes a new deep-unfolding-based network design for the problem of Robust Principal Component Analysis (RPCA).
Unlike existing designs, our approach focuses on modeling the temporal correlation between the sparse representations of consecutive video frames.
Experimentation using the moving MNIST dataset shows that the proposed network outperforms a recently proposed state-of-the-art RPCA network in the task of video foreground-background separation.
arXiv Detail & Related papers (2020-10-02T11:40:09Z) - Capturing Video Frame Rate Variations via Entropic Differencing [63.749184706461826]
We propose a novel statistical entropic differencing method based on a Generalized Gaussian Distribution model.
Our proposed model correlates very well with subjective scores in the recently proposed LIVE-YT-HFR database.
arXiv Detail & Related papers (2020-06-19T22:16:52Z) - Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion at inference time.
We employ compact models for real-time execution, and design new knowledge distillation methods to narrow the performance gap between compact and large models.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)