Temporal Context Mining for Learned Video Compression
- URL: http://arxiv.org/abs/2111.13850v1
- Date: Sat, 27 Nov 2021 08:55:16 GMT
- Title: Temporal Context Mining for Learned Video Compression
- Authors: Xihua Sheng, Jiahao Li, Bin Li, Li Li, Dong Liu, Yan Lu
- Abstract summary: We address end-to-end learned video compression with a special focus on better learning and utilizing temporal contexts.
For temporal context mining, we propose to store not only the previously reconstructed frames, but also the propagated features into the generalized decoded picture buffer.
Our scheme discards the parallelization-unfriendly auto-regressive entropy model to pursue a more practical decoding time.
- Score: 25.348411353589878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address end-to-end learned video compression with a special focus on
better learning and utilizing temporal contexts. For temporal context mining,
we propose to store not only the previously reconstructed frames, but also the
propagated features into the generalized decoded picture buffer. From the
stored propagated features, we propose to learn multi-scale temporal contexts,
and re-fill the learned temporal contexts into the modules of our compression
scheme, including the contextual encoder-decoder, the frame generator, and the
temporal context encoder. Our scheme discards the parallelization-unfriendly
auto-regressive entropy model to pursue a more practical decoding time. We
compare our scheme with x264 and x265 (representing industrial software for
H.264 and H.265, respectively) as well as the official reference software for
H.264, H.265, and H.266 (JM, HM, and VTM, respectively). When the intra period
is 32 and oriented to PSNR, our scheme outperforms H.265 (HM) by a 14.4% bit
rate saving; when oriented to MS-SSIM, our scheme outperforms H.266 (VTM) by a
21.1% bit rate saving.
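The buffering-and-mining idea above can be sketched in plain Python. Everything here (the class and function names, the use of 2x average pooling as the downsampler, the frame/feature representation) is an illustrative assumption, not the paper's actual network or API:

```python
from collections import deque


class GeneralizedDPB:
    """Hypothetical sketch of a generalized decoded picture buffer that
    stores, for each step, the reconstructed frame together with the
    feature tensor propagated from the previous step."""

    def __init__(self, capacity=1):
        self.buffer = deque(maxlen=capacity)

    def push(self, reconstructed_frame, propagated_feature):
        # Store the (frame, feature) pair; old entries fall out of the deque.
        self.buffer.append((reconstructed_frame, propagated_feature))

    def latest(self):
        return self.buffer[-1]


def multi_scale_contexts(feature, num_scales=3):
    """Toy stand-in for temporal context mining: derive successively
    coarser contexts by 2x average-pooling a 2-D feature map
    (represented as a list of lists of floats)."""
    contexts = [feature]
    for _ in range(num_scales - 1):
        f = contexts[-1]
        h, w = len(f), len(f[0])
        pooled = [[(f[2 * i][2 * j] + f[2 * i][2 * j + 1]
                    + f[2 * i + 1][2 * j] + f[2 * i + 1][2 * j + 1]) / 4.0
                   for j in range(w // 2)]
                  for i in range(h // 2)]
        contexts.append(pooled)
    return contexts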
Related papers
- High-Efficiency Neural Video Compression via Hierarchical Predictive Learning [27.41398149573729]
Enhanced Deep Hierarchical Video Compression (DHVC 2.0) delivers superior compression performance and impressive complexity efficiency.
Uses hierarchical predictive coding to transform each video frame into multiscale representations.
Supports transmission-friendly progressive decoding, making it particularly advantageous for networked video applications in the presence of packet loss.
arXiv Detail & Related papers (2024-10-03T15:40:58Z)
- Accelerating Learned Video Compression via Low-Resolution Representation Learning [18.399027308582596]
We introduce an efficiency-optimized framework for learned video compression that focuses on low-resolution representation learning.
Our method achieves performance levels on par with the low-delay P configuration of the H.266 reference software VTM.
arXiv Detail & Related papers (2024-07-23T12:02:57Z)
- Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster [61.83949316226113]
FastCoT is a model-agnostic framework based on parallel decoding.
We show that FastCoT saves inference time by nearly 20% with only a negligible performance drop compared to the regular approach.
arXiv Detail & Related papers (2023-11-14T15:56:18Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, improving PSNR from 34.07 dB to 34.57 dB.
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- Exploring Long- and Short-Range Temporal Information for Learned Video Compression [54.91301930491466]
We focus on exploiting the unique characteristics of video content and exploring temporal information to enhance compression performance.
For long-range temporal information exploitation, we propose a temporal prior that is updated continuously within the group of pictures (GOP) during inference.
The temporal prior therefore accumulates valuable temporal information from all decoded frames within the current GOP.
In detail, we design a hierarchical structure to achieve multi-scale compensation.
arXiv Detail & Related papers (2022-08-07T15:57:18Z)
- Learning to Compress Videos without Computing Motion [39.46212197928986]
We propose a new deep learning video compression architecture that does not require motion estimation.
Our framework exploits the regularities inherent to video motion, which we capture by using displaced frame differences as video representations.
Our experiments show that our compression model, which we call the MOtionless VIdeo Codec (MOVI-Codec), learns how to efficiently compress videos without computing motion.
arXiv Detail & Related papers (2020-09-29T15:49:25Z)
- Conditional Entropy Coding for Efficient Video Compression [82.35389813794372]
We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.
We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs.
We then propose a novel internal learning extension on top of this architecture that brings an additional 10% savings without trading off decoding speed.
arXiv Detail & Related papers (2020-08-20T20:01:59Z)
- Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model [164.7489982837475]
This paper proposes a Recurrent Learned Video Compression (RLVC) approach with a Recurrent Auto-Encoder (RAE) and a Recurrent Probability Model (RPM).
The RAE employs recurrent cells in both the encoder and decoder to exploit the temporal correlation among video frames.
Our approach achieves the state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM.
arXiv Detail & Related papers (2020-06-24T08:46:33Z)
- M-LVC: Multiple Frames Prediction for Learned Video Compression [111.50760486258993]
We propose an end-to-end learned video compression scheme for low-latency scenarios.
In our scheme, the motion vector (MV) field is calculated between the current frame and the previous one.
Experimental results show that the proposed method outperforms the existing learned video compression methods for low-latency mode.
arXiv Detail & Related papers (2020-04-21T20:42:02Z)
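As a loose illustration of the motion-vector field mentioned in the M-LVC entry above, here is a classical full-search block matcher in plain Python. M-LVC actually estimates and predicts MVs with neural networks, so this is only a conceptual stand-in; all names and parameters are illustrative:

```python
def block_motion_vector(prev, cur, by, bx, bsize=2, search=1):
    """Toy full-search motion estimation for one block: find the
    displacement (dy, dx) into the previous frame that minimizes the
    sum of absolute differences (SAD) against the current block.
    Frames are lists of lists of numbers."""
    h, w = len(prev), len(prev[0])
    best_sad, best_mv = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the previous frame.
            if not (0 <= by + dy and by + dy + bsize <= h
                    and 0 <= bx + dx and bx + dx + bsize <= w):
                continue
            sad = sum(abs(cur[by + i][bx + j] - prev[by + dy + i][bx + dx + j])
                      for i in range(bsize) for j in range(bsize))
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv
```

For example, a bright 2x2 patch that moves down-right by one pixel between frames yields the motion vector (-1, -1) for the block at its new position, i.e. the block is found one pixel up-left in the previous frame.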
This list is automatically generated from the titles and abstracts of the papers in this site.