Learning for Video Compression with Recurrent Auto-Encoder and Recurrent
Probability Model
- URL: http://arxiv.org/abs/2006.13560v4
- Date: Sun, 6 Dec 2020 10:07:34 GMT
- Title: Learning for Video Compression with Recurrent Auto-Encoder and Recurrent
Probability Model
- Authors: Ren Yang, Fabian Mentzer, Luc Van Gool and Radu Timofte
- Abstract summary: This paper proposes a Recurrent Learned Video Compression (RLVC) approach with the Recurrent Auto-Encoder (RAE) and Recurrent Probability Model (RPM).
The RAE employs recurrent cells in both the encoder and decoder to exploit the temporal correlation among video frames.
Our approach achieves state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM.
- Score: 164.7489982837475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The past few years have witnessed increasing interest in applying deep
learning to video compression. However, the existing approaches compress a
video frame with only a small number of reference frames, which limits their
ability to fully exploit the temporal correlation among video frames. To
overcome this shortcoming, this paper proposes a Recurrent Learned Video
Compression (RLVC) approach with the Recurrent Auto-Encoder (RAE) and Recurrent
Probability Model (RPM). Specifically, the RAE employs recurrent cells in both
the encoder and decoder. As such, the temporal information in a large range of
frames can be used for generating latent representations and reconstructing
compressed outputs. Furthermore, the proposed RPM network recurrently estimates
the Probability Mass Function (PMF) of the latent representation, conditioned
on the distribution of previous latent representations. Due to the correlation
among consecutive frames, the conditional cross entropy can be lower than the
independent cross entropy, thus reducing the bit-rate. The experiments show
that our approach achieves state-of-the-art learned video compression
performance in terms of both PSNR and MS-SSIM. Moreover, our approach
outperforms the default Low-Delay P (LDP) setting of x265 on PSNR, and also has
better performance on MS-SSIM than the SSIM-tuned x265 and the slowest setting
of x265. The codes are available at https://github.com/RenYang-home/RLVC.git.
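As a concrete illustration of the RAE idea, below is a minimal PyTorch sketch in which recurrent cells in both the encoder and decoder carry hidden state across frames. The ConvLSTM cell, layer sizes, and straight-through rounding are assumptions made for this sketch, not the architecture described in the paper.

```python
# Toy recurrent auto-encoder: encoder and decoder hidden states persist across frames,
# so each latent and reconstruction can draw on temporal context, not just one reference.
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell (an assumed stand-in, not RLVC's exact recurrent cell)."""

    def __init__(self, channels):
        super().__init__()
        # One convolution produces the input/forget/output/candidate gates at once.
        self.gates = nn.Conv2d(2 * channels, 4 * channels, kernel_size=3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)


class RecurrentAutoEncoder(nn.Module):
    """Toy RAE: recurrent cells in both the encoder and decoder."""

    def __init__(self, ch=64):
        super().__init__()
        self.enc_in = nn.Conv2d(3, ch, 5, stride=2, padding=2)                  # H   -> H/2
        self.enc_rnn = ConvLSTMCell(ch)
        self.enc_out = nn.Conv2d(ch, ch, 5, stride=2, padding=2)                # H/2 -> H/4 (latent y_t)
        self.dec_in = nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2,
                                         output_padding=1)                      # H/4 -> H/2
        self.dec_rnn = ConvLSTMCell(ch)
        self.dec_out = nn.ConvTranspose2d(ch, 3, 5, stride=2, padding=2,
                                          output_padding=1)                     # H/2 -> H

    @staticmethod
    def _zero_state(ref):
        z = torch.zeros_like(ref)
        return (z, z.clone())

    def forward(self, frames):
        """frames: list of (B, 3, H, W) tensors for consecutive time steps."""
        enc_state = dec_state = None
        recons, latents = [], []
        for x in frames:
            e = torch.relu(self.enc_in(x))
            if enc_state is None:                        # initialize states from the first frame
                enc_state = self._zero_state(e)
                dec_state = self._zero_state(e)
            h, enc_state = self.enc_rnn(e, enc_state)
            y = self.enc_out(h)                          # latent representation of frame t
            y_hat = y + (torch.round(y) - y).detach()    # straight-through rounding
            d, dec_state = self.dec_rnn(self.dec_in(y_hat), dec_state)
            recons.append(torch.sigmoid(self.dec_out(d)))
            latents.append(y_hat)
        return recons, latents


frames = [torch.rand(1, 3, 64, 64) for _ in range(3)]
recons, latents = RecurrentAutoEncoder()(frames)
print(recons[0].shape, latents[0].shape)   # (1, 3, 64, 64), (1, 64, 16, 16)
```

Because enc_state and dec_state are carried across the loop, the latent for frame t and its reconstruction can use information from all previously seen frames rather than only a single reference frame.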
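Likewise, the bit-saving argument behind the RPM (conditional cross entropy being lower than independent cross entropy) can be sketched with a toy probability model. The Gaussian PMF, the tiny conditioning network, and the hand-set distribution parameters below are assumptions for illustration; the RPM in the paper is a learned recurrent network trained end-to-end.

```python
# Toy rate estimation: bits needed to encode an integer-quantized latent under a PMF
# that is either independent of, or conditioned on, the previous latent.
import torch
import torch.nn as nn


class TinyRPM(nn.Module):
    """Predicts per-element (mean, scale) of y_t's PMF from y_{t-1} and a recurrent hidden state."""

    def __init__(self, ch=64):
        super().__init__()
        self.rnn = nn.Conv2d(2 * ch, ch, 3, padding=1)    # stand-in for a recurrent cell
        self.head = nn.Conv2d(ch, 2 * ch, 3, padding=1)   # -> (mean, log_scale)

    def forward(self, y_prev, hidden):
        hidden = torch.tanh(self.rnn(torch.cat([y_prev, hidden], dim=1)))
        mean, log_scale = self.head(hidden).chunk(2, dim=1)
        return mean, log_scale.clamp(-8, 8).exp(), hidden


def rate_bits(y_hat, mean, scale):
    """Bits for an integer-quantized latent: PMF from CDF differences of a Gaussian."""
    dist = torch.distributions.Normal(mean, scale)
    pmf = dist.cdf(y_hat + 0.5) - dist.cdf(y_hat - 0.5)
    return -torch.log2(pmf.clamp_min(1e-9)).sum()


torch.manual_seed(0)
y_prev = torch.round(torch.randn(1, 64, 16, 16) * 4.0)
y_cur = torch.round(y_prev + 0.5 * torch.randn_like(y_prev))   # consecutive latents are correlated

# Interface of the conditional model: a trained RPM would supply (mean, scale) here.
rpm = TinyRPM()
hidden = torch.zeros(1, 64, 16, 16)
mean, scale, hidden = rpm(y_prev, hidden)

# Hand-set parameters so the effect is visible without training: the conditional PMF
# centers on y_{t-1} with a small scale; the independent PMF uses the marginal scale.
independent = rate_bits(y_cur, torch.zeros_like(y_cur), torch.full_like(y_cur, 4.0))
conditional = rate_bits(y_cur, y_prev, torch.full_like(y_cur, 0.7))
print(f"independent: {independent.item():.0f} bits, conditional: {conditional.item():.0f} bits")
```

In a real codec the same PMF would drive an arithmetic coder; here the rate is only estimated as the negative log2 probability mass assigned to each quantized value, which is enough to show why conditioning on the previous latent lowers the bit-rate.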
Related papers
- High-Efficiency Neural Video Compression via Hierarchical Predictive Learning [27.41398149573729]
The enhanced Deep Hierarchical Video Compression codec, DHVC 2.0, delivers superior compression performance with impressive complexity efficiency.
Uses hierarchical predictive coding to transform each video frame into multiscale representations.
Supports transmission-friendly progressive decoding, making it particularly advantageous for networked video applications in the presence of packet loss.
arXiv Detail & Related papers (2024-10-03T15:40:58Z) - NVRC: Neural Video Representation Compression [13.131842990481038]
We propose a novel INR-based video compression framework, Neural Video Representation Compression (NVRC).
NVRC, for the first time, is able to optimize an INR-based video codec in a fully end-to-end manner.
Our experiments show that NVRC outperforms many conventional and learning-based benchmark codecs.
arXiv Detail & Related papers (2024-09-11T16:57:12Z) - Extreme Video Compression with Pre-trained Diffusion Models [11.898317376595697]
We present a novel approach to extreme video compression leveraging the predictive power of diffusion-based generative models at the decoder.
The entire video is sequentially encoded to achieve a visually pleasing reconstruction, considering perceptual quality metrics.
Results showcase the potential of exploiting the temporal relations in video data using generative models.
arXiv Detail & Related papers (2024-02-14T04:23:05Z) - IBVC: Interpolation-driven B-frame Video Compression [68.18440522300536]
B-frame video compression adopts bi-directional motion estimation and motion compensation (MEMC) for middle-frame reconstruction.
Previous learned approaches often directly extend neural P-frame codecs to B-frame coding, relying on bi-directional optical-flow estimation.
We propose a simple yet effective structure called Interpolation-B-frame Video Compression (IBVC) to address these issues.
arXiv Detail & Related papers (2023-09-25T02:45:51Z) - Advancing Learned Video Compression with In-loop Frame Prediction [177.67218448278143]
In this paper, we propose an Advanced Learned Video Compression (ALVC) approach with the in-loop frame prediction module.
The predicted frame can serve as a better reference than the previously compressed frame, and therefore it benefits the compression performance.
The experiments show the state-of-the-art performance of our ALVC approach in learned video compression.
arXiv Detail & Related papers (2022-11-13T19:53:14Z) - A Codec Information Assisted Framework for Efficient Compressed Video
Super-Resolution [15.690562510147766]
Video Super-Resolution (VSR) using a recurrent neural network architecture is a promising solution due to its efficient modeling of long-range temporal dependencies.
We propose a Codec Information Assisted Framework (CIAF) to boost and accelerate recurrent VSR models for compressed videos.
arXiv Detail & Related papers (2022-10-15T08:48:29Z) - Temporal Context Mining for Learned Video Compression [25.348411353589878]
We address end-to-end learned video compression with a special focus on better learning and utilizing temporal contexts.
For temporal context mining, we propose to store not only the previously reconstructed frames, but also the propagated features into the generalized decoded picture buffer.
Our scheme discards the parallelization-unfriendly auto-regressive entropy model to pursue a more practical decoding time.
arXiv Detail & Related papers (2021-11-27T08:55:16Z) - Perceptual Learned Video Compression with Recurrent Conditional GAN [158.0726042755]
We propose a Perceptual Learned Video Compression (PLVC) approach with recurrent conditional generative adversarial network.
PLVC learns to compress video towards good perceptual quality at low bit-rate.
The user study further validates the outstanding perceptual performance of PLVC in comparison with the latest learned video compression approaches.
arXiv Detail & Related papers (2021-09-07T13:36:57Z) - Conditional Entropy Coding for Efficient Video Compression [82.35389813794372]
We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.
We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs.
We then propose a novel internal learning extension on top of this architecture that brings an additional 10% savings without trading off decoding speed.
arXiv Detail & Related papers (2020-08-20T20:01:59Z) - M-LVC: Multiple Frames Prediction for Learned Video Compression [111.50760486258993]
We propose an end-to-end learned video compression scheme for low-latency scenarios.
In our scheme, the motion vector (MV) field is calculated between the current frame and the previous one.
Experimental results show that the proposed method outperforms the existing learned video compression methods for low-latency mode.
arXiv Detail & Related papers (2020-04-21T20:42:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.