Advancing Learned Video Compression with In-loop Frame Prediction
- URL: http://arxiv.org/abs/2211.07004v2
- Date: Tue, 15 Nov 2022 23:09:16 GMT
- Title: Advancing Learned Video Compression with In-loop Frame Prediction
- Authors: Ren Yang, Radu Timofte, Luc Van Gool
- Abstract summary: In this paper, we propose an Advanced Learned Video Compression (ALVC) approach with the in-loop frame prediction module.
The predicted frame can serve as a better reference than the previously compressed frame, and therefore it benefits the compression performance.
The experiments show the state-of-the-art performance of our ALVC approach in learned video compression.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent years have witnessed an increasing interest in end-to-end learned
video compression. Most previous works explore temporal redundancy by detecting
and compressing a motion map to warp the reference frame towards the target
frame. Yet, such methods fail to adequately exploit the historical priors in
the sequential reference frames. In this paper, we propose an Advanced Learned
Video Compression (ALVC) approach with the in-loop frame prediction module,
which is able to effectively predict the target frame from the previously
compressed frames, without consuming any bit-rate. The predicted frame
can serve as a better reference than the previously compressed frame, and
therefore it benefits the compression performance. The proposed in-loop
prediction module is part of the end-to-end video compression framework and is
jointly optimized with the whole pipeline. We propose recurrent and
bi-directional in-loop prediction modules for compressing P-frames and
B-frames, respectively. The experiments show the state-of-the-art performance
of our ALVC approach in learned video compression. We also outperform the
default hierarchical B mode of x265 in terms of PSNR and beat the slowest mode
of the SSIM-tuned x265 on MS-SSIM. The project page:
https://github.com/RenYang-home/ALVC.
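To make the mechanism concrete: warping a reference frame with a decoded motion map is the core inter-frame operation that the in-loop predictor improves upon. Below is a minimal PyTorch sketch of backward warping; the function name and tensor layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def warp(reference: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp a decoded reference frame towards the target frame.

    reference: (N, C, H, W) previously decoded frame
    flow:      (N, 2, H, W) motion map in pixels, channel 0 = dx, channel 1 = dy
    """
    _, _, h, w = reference.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(reference.device)  # (2, H, W)
    coords = base.unsqueeze(0) + flow                                 # (N, 2, H, W)
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                              # (N, H, W, 2)
    return F.grid_sample(reference, grid, align_corners=True)

# In-loop prediction, schematically: a learned module predicts the reference
# from frames both encoder and decoder already have, so the improved reference
# costs zero extra bits.
# reference = predictor(decoded_frames)   # `predictor` is hypothetical here
```

Since the predictor consumes only frames the decoder already holds, its output needs no bits in the stream, which is the "without consuming any bit-rate" property claimed above.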
Related papers
- UCVC: A Unified Contextual Video Compression Framework with Joint P-frame and B-frame Coding
This paper presents a learned video compression method in response to the video compression track of the 6th Challenge on Learned Image Compression (CLIC).
We propose a unified contextual video compression framework (UCVC) for joint P-frame and B-frame coding.
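The abstract gives no architectural details, but "joint P-frame and B-frame coding" suggests one model serving both frame types; a hypothetical dispatch sketch, with all names assumed:

```python
def code_frame(frame, past_ref, future_ref, codec):
    """Route a frame through one unified codec: B-frame if both references exist."""
    if future_ref is not None:
        return codec(frame, contexts=(past_ref, future_ref))  # B-frame path
    return codec(frame, contexts=(past_ref,))                 # P-frame path
```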
arXiv Detail & Related papers (2024-02-02T10:25:39Z)
- IBVC: Interpolation-driven B-frame Video Compression
B-frame video compression adopts bi-directional motion estimation and motion compensation (MEMC) for middle-frame reconstruction.
Previous learned approaches often directly extend neural P-frame codecs to B-frames, relying on bi-directional optical-flow estimation.
We propose a simple yet effective structure called Interpolation-B-frame Video Compression (IBVC) to address these issues.
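A rough illustration of bi-directional motion compensation for a middle frame: warp the past and future references towards it, then blend them with a learned mask. This is a generic MEMC sketch under assumed shapes, not IBVC's actual design.

```python
import torch
import torch.nn as nn

class BiDirectionalBlend(nn.Module):
    """Blend forward- and backward-warped references for a B-frame."""

    def __init__(self, channels: int = 3):
        super().__init__()
        # Tiny placeholder network predicting a per-pixel blending mask.
        self.mask_net = nn.Sequential(
            nn.Conv2d(2 * channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, warped_past: torch.Tensor, warped_future: torch.Tensor):
        mask = self.mask_net(torch.cat((warped_past, warped_future), dim=1))
        return mask * warped_past + (1.0 - mask) * warped_future
```

Here `warped_past` and `warped_future` would come from warping the two references with their respective estimated flows.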
arXiv Detail & Related papers (2023-09-25T02:45:51Z)
- Shortcut-V2V: Compression Framework for Video-to-Video Translation based on Temporal Redundancy Reduction
Shortcut-V2V is a general-purpose compression framework for video-to-video translation.
We show that Shortcut-V2V achieves performance comparable to the original video-to-video translation model.
arXiv Detail & Related papers (2023-08-15T19:50:38Z)
- Predictive Coding For Animation-Based Video Compression
We propose a predictive coding scheme which uses image animation as a predictor, and codes the residual with respect to the actual target frame.
Our experiments indicate a significant gain, in excess of 70% compared to the HEVC video standard and over 30% compared to VVC.
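Coding "the residual with respect to the actual target frame" is classic predictive coding; a self-contained toy sketch, with plain uniform quantization standing in for the learned residual coder:

```python
import torch

class ToyResidualCodec:
    """Placeholder residual coder: uniform quantization, no entropy coding."""

    def __init__(self, step: float = 1.0 / 255.0):
        self.step = step

    def encode(self, residual: torch.Tensor) -> torch.Tensor:
        return torch.round(residual / self.step)   # symbols to be entropy-coded

    def decode(self, symbols: torch.Tensor) -> torch.Tensor:
        return symbols * self.step

def code_frame(target: torch.Tensor, prediction: torch.Tensor) -> torch.Tensor:
    codec = ToyResidualCodec()
    symbols = codec.encode(target - prediction)    # only the residual is coded
    return prediction + codec.decode(symbols)      # decoder-side reconstruction
```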
arXiv Detail & Related papers (2023-07-09T14:40:54Z)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos
Given an untrimmed video, temporal sentence grounding aims to locate the target moment that semantically matches a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z)
- Compressing Video Calls using Synthetic Talking Heads
We propose an end-to-end system for talking head video compression.
Our algorithm transmits pivot frames intermittently while the rest of the talking head video is generated by animating them.
We use a state-of-the-art face reenactment network to detect key points in the non-pivot frames and transmit them to the receiver.
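The transmission pattern described above (full pivot frames occasionally, cheap keypoints otherwise) can be sketched as a simple sender loop; `keypoint_detector` and the packet format are assumptions for illustration:

```python
def sender(frames, keypoint_detector, pivot_interval: int = 30):
    """Yield full pivot frames intermittently and cheap keypoints in between."""
    for i, frame in enumerate(frames):
        if i % pivot_interval == 0:
            yield ("pivot", frame)                  # full frame: high bit cost
        else:
            yield ("kp", keypoint_detector(frame))  # a few floats: low bit cost

# The receiver animates the most recent pivot frame with the incoming
# keypoints using a face reenactment network.
```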
arXiv Detail & Related papers (2022-10-07T16:52:40Z)
- Deep Contextual Video Compression
We propose a deep contextual video compression framework to enable a paradigm shift from predictive coding to conditional coding.
Our method can significantly outperform the previous state-of-the-art (SOTA) deep video compression methods.
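The paradigm shift can be summarized as: residual coding subtracts the temporal context before the codec, while conditional coding feeds the context into the codec. A schematic contrast with placeholder convolutional modules (layers, shapes, and even-sized inputs are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalCoder(nn.Module):
    """Schematic conditional coder: the context is an input to encoder and
    decoder rather than being subtracted out as in residual coding."""

    def __init__(self, channels: int = 3, latent: int = 64):
        super().__init__()
        self.encoder = nn.Conv2d(2 * channels, latent, 5, stride=2, padding=2)
        self.decoder = nn.ConvTranspose2d(latent + channels, channels, 5,
                                          stride=2, padding=2, output_padding=1)

    def forward(self, frame: torch.Tensor, context: torch.Tensor):
        y = self.encoder(torch.cat((frame, context), dim=1))  # condition encoder
        ctx = F.avg_pool2d(context, 2)                        # match latent size
        return self.decoder(torch.cat((y, ctx), dim=1))       # condition decoder
```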
arXiv Detail & Related papers (2021-09-30T12:14:24Z)
- Conditional Entropy Coding for Efficient Video Compression
We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.
We first show that a simple architecture modeling the entropy between the image latent codes is competitive with other neural video compression methods and standard video codecs.
We then propose a novel internal learning extension on top of this architecture that brings an additional 10% savings without trading off decoding speed.
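Modeling only the conditional entropy between frames amounts to predicting a probability distribution for the current latent from the previous one and paying its negative log-likelihood in bits. A schematic with a factorized Gaussian (the network and the continuous-density shortcut are simplifying assumptions; real coders integrate the density over quantization bins):

```python
import math
import torch
import torch.nn as nn

class ConditionalEntropyModel(nn.Module):
    """Predict p(y_t | y_{t-1}) over latent codes as a factorized Gaussian."""

    def __init__(self, latent: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent, latent, 3, padding=1), nn.ReLU(),
            nn.Conv2d(latent, 2 * latent, 3, padding=1),
        )

    def rate_bits(self, y_t: torch.Tensor, y_prev: torch.Tensor) -> torch.Tensor:
        mean, log_scale = self.net(y_prev).chunk(2, dim=1)
        dist = torch.distributions.Normal(mean, log_scale.exp())
        # Estimated rate: negative log2-likelihood of the current latent.
        return -dist.log_prob(y_t).sum() / math.log(2.0)
```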
arXiv Detail & Related papers (2020-08-20T20:01:59Z)
- Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model
This paper proposes a Recurrent Learned Video Compression (RLVC) approach with the Recurrent Auto-Encoder (RAE) and Recurrent Probability Model (RPM).
The RAE employs recurrent cells in both the encoder and decoder to exploit the temporal correlation among video frames.
Our approach achieves the state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM.
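"Recurrent cells in both the encoder and decoder" means each frame is coded with a hidden state carried over from earlier frames. A minimal hand-rolled convolutional GRU cell illustrates the ingredient; the cell design and sizes are assumptions:

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Minimal convolutional GRU cell carrying temporal state across frames."""

    def __init__(self, channels: int):
        super().__init__()
        self.gates = nn.Conv2d(2 * channels, 2 * channels, 3, padding=1)
        self.cand = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # Update (z) and reset (r) gates from the input and previous state.
        z, r = torch.sigmoid(self.gates(torch.cat((x, h), dim=1))).chunk(2, dim=1)
        h_new = torch.tanh(self.cand(torch.cat((x, r * h), dim=1)))
        return (1 - z) * h + z * h_new
```

Placing one such cell in the encoder and one in the decoder gives both sides memory of the frames already coded.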
arXiv Detail & Related papers (2020-06-24T08:46:33Z)
- M-LVC: Multiple Frames Prediction for Learned Video Compression
We propose an end-to-end learned video compression scheme for low-latency scenarios.
In our scheme, the motion vector (MV) field is calculated between the current frame and the previous one.
Experimental results show that the proposed method outperforms the existing learned video compression methods for low-latency mode.
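Low-latency mode constrains each frame to reference only already-decoded past frames (IPPP order); schematically, with a hypothetical codec interface:

```python
def low_latency_loop(frames, codec):
    """IPPP order: each P-frame references only the previous reconstruction."""
    recon = codec.encode_intra(frames[0])             # I-frame
    yield recon
    for frame in frames[1:]:
        recon = codec.encode_inter(frame, ref=recon)  # MV field + residual vs ref
        yield recon
```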
arXiv Detail & Related papers (2020-04-21T20:42:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.