Exploring Long- and Short-Range Temporal Information for Learned Video
Compression
- URL: http://arxiv.org/abs/2208.03754v3
- Date: Tue, 2 Jan 2024 12:27:46 GMT
- Title: Exploring Long- and Short-Range Temporal Information for Learned Video
Compression
- Authors: Huairui Wang and Zhenzhong Chen
- Abstract summary: We focus on exploiting the unique characteristics of video content and exploring temporal information to enhance compression performance.
For long-range temporal information, we propose a temporal prior that is updated continuously within the group of pictures (GOP) during inference.
This prior therefore accumulates valuable temporal information from all decoded frames within the current GOP.
In detail, we design a hierarchical structure to achieve multi-scale compensation.
- Score: 54.91301930491466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned video compression methods have attracted growing interest in the video coding community since they have matched or even exceeded the rate-distortion (RD) performance of traditional video codecs. However, many current learning-based methods are dedicated to short-range temporal information, which limits their performance. In this paper, we focus on exploiting the unique characteristics of video content and further exploring temporal information to enhance compression performance. Specifically, for long-range temporal information, we propose a temporal prior that is updated continuously within the group of pictures (GOP) during inference, so that it accumulates valuable temporal information from all decoded frames in the current GOP. For short-range temporal information, we propose a progressive guided motion compensation scheme that achieves robust and effective compensation. In detail, we design a hierarchical structure for multi-scale compensation: optical flow guidance is used to generate pixel offsets between feature maps at each scale, and the compensation result at each scale guides the compensation at the following scale. Extensive experimental results demonstrate that our method achieves better RD performance than state-of-the-art video compression approaches. The code is publicly available at https://github.com/Huairui/LSTVC.
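The abstract's description of the progressive guided motion compensation (flow-guided warping at multiple feature scales, with each scale's result guiding the next finer scale) can be illustrated with a minimal PyTorch sketch. This is not the released LSTVC code; the module layout, channel counts, and the simple warp-and-fuse blocks below are illustrative assumptions based only on the abstract.

```python
# Hypothetical sketch of coarse-to-fine, flow-guided motion compensation.
# NOT the authors' LSTVC implementation; shapes and blocks are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp(feat, flow):
    """Backward-warp a feature map with a per-pixel flow field (B, 2, H, W)."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow      # (B, 2, H, W)
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(feat, torch.stack((gx, gy), dim=-1), align_corners=True)


class ProgressiveCompensation(nn.Module):
    """Hierarchical compensation: coarse-scale results guide finer scales."""

    def __init__(self, channels=64, num_scales=3):
        super().__init__()
        # One small fusion block per scale: warped reference features plus the
        # upsampled coarser-scale compensation -> refined compensation.
        self.fuse = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels * 2, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            for _ in range(num_scales)
        )

    def forward(self, ref_feats, flow):
        # ref_feats: reference feature maps, coarsest first; flow: (B, 2, H, W)
        # full-resolution optical flow used as pixel-offset guidance.
        comp = None
        for s, feat in enumerate(ref_feats):
            h, w = feat.shape[-2:]
            # Rescale the flow field (and its magnitude) to this scale,
            # assuming a uniform downsampling factor per level.
            scale = w / flow.shape[-1]
            flow_s = F.interpolate(flow, size=(h, w), mode="bilinear",
                                   align_corners=False) * scale
            warped = warp(feat, flow_s)
            if comp is None:
                comp = warped
            else:
                comp = F.interpolate(comp, size=(h, w), mode="bilinear",
                                     align_corners=False)
            comp = self.fuse[s](torch.cat([warped, comp], dim=1))
        return comp  # finest-scale compensated features
```

A coarsest-to-finest loop of this kind lets the coarse-scale compensation stabilize the finer scales, which is the progressive guidance the abstract describes; the linked repository should be consulted for the actual design.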
Related papers
- LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding [65.46303012350207]
LongVU is an adaptive compression mechanism that reduces the number of video tokens while preserving visual details of long videos.
We leverage DINOv2 features to remove redundant frames that exhibit high similarity.
We perform spatial token reduction across frames based on their temporal dependencies.
arXiv Detail & Related papers (2024-10-22T21:21:37Z)
- Predictive Coding For Animation-Based Video Compression [13.161311799049978]
We propose a predictive coding scheme which uses image animation as a predictor, and codes the residual with respect to the actual target frame.
Our experiments indicate a significant gain, in excess of 70% compared to the HEVC video standard and over 30% compared to VVC.
arXiv Detail & Related papers (2023-07-09T14:40:54Z)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate the target moment that semantically corresponds to a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z)
- NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling [37.51397331485574]
Implicit Neural Representations (INRs) have recently been shown to be a powerful tool for high-quality video compression.
However, these methods rely on fixed architectures that do not scale to longer videos or higher resolutions.
We propose NIRVANA, which treats videos as groups of frames and fits separate networks to each group performing patch-wise prediction.
arXiv Detail & Related papers (2022-12-30T08:17:02Z)
- Microdosing: Knowledge Distillation for GAN based Compression [18.140328230701233]
We show how to leverage knowledge distillation to obtain equally capable image decoders at a fraction of the original number of parameters.
This allows us to reduce the model size by a factor of 20 and to achieve 50% reduction in decoding time.
arXiv Detail & Related papers (2022-01-07T14:27:16Z)
- Perceptual Learned Video Compression with Recurrent Conditional GAN [158.0726042755]
We propose a Perceptual Learned Video Compression (PLVC) approach with a recurrent conditional generative adversarial network.
PLVC learns to compress video towards good perceptual quality at low bit-rate.
The user study further validates the outstanding perceptual performance of PLVC in comparison with the latest learned video compression approaches.
arXiv Detail & Related papers (2021-09-07T13:36:57Z)
- Self-Conditioned Probabilistic Learning of Video Rescaling [70.10092286301997]
We propose a self-conditioned probabilistic framework for video rescaling to learn the paired downscaling and upscaling procedures simultaneously.
We decrease the entropy of the information lost during downscaling by maximizing its probability conditioned on strong spatial-temporal prior information.
We extend the framework to a lossy video compression system, in which a gradient estimator for non-differentiable industrial lossy codecs is proposed.
arXiv Detail & Related papers (2021-07-24T15:57:15Z)
- Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
arXiv Detail & Related papers (2020-03-25T09:04:24Z)