Effortless Cross-Platform Video Codec: A Codebook-Based Method
- URL: http://arxiv.org/abs/2310.10292v1
- Date: Mon, 16 Oct 2023 11:28:22 GMT
- Title: Effortless Cross-Platform Video Codec: A Codebook-Based Method
- Authors: Kuan Tian and Yonghang Guan and Jinxi Xiang and Jun Zhang and Xiao Han and Wei Yang
- Abstract summary: In certain circumstances, advanced neural video codecs can surpass the most complex traditional codecs in their rate-distortion (RD) performance.
One of the main reasons for the high performance of existing neural video codecs is the use of the entropy model.
In this paper, we propose a cross-platform video compression framework based on codebooks.
- Score: 17.141950680993617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Under certain circumstances, advanced neural video codecs can surpass the
most complex traditional codecs in their rate-distortion (RD) performance. One
of the main reasons for the high performance of existing neural video codecs is
the use of the entropy model, which can provide more accurate probability
distribution estimations for compressing the latents. This also implies the
rigorous requirement that entropy models running on different platforms should
use consistent distribution estimations. However, in cross-platform scenarios,
entropy models running on different platforms usually yield inconsistent
probability distribution estimations due to floating point computation errors
that are platform-dependent, which can cause the decoding side to fail in
correctly decoding the compressed bitstream sent by the encoding side. In this
paper, we propose a cross-platform video compression framework based on
codebooks, which avoids autoregressive entropy modeling and achieves video
compression by transmitting the index sequence of the codebooks. Moreover,
instead of using optical flow for context alignment, we propose to use the
conditional cross-attention module to obtain the context between frames. Due to
the absence of autoregressive modeling and optical flow alignment, we can
design an extremely minimalist framework that greatly benefits computational
efficiency. Importantly, our framework no longer contains any distribution
estimation modules for entropy modeling, and thus computations across platforms
no longer need to be consistent. Experimental results show that our method can
outperform the traditional H.265 (medium) even without any entropy constraints,
while achieving the cross-platform property intrinsically.
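The core idea of the proposed framework, transmitting integer codebook indices instead of entropy-coded latents, can be sketched as follows. This is a hypothetical minimal illustration in NumPy, not the paper's actual model: the codebook size, latent dimension, and nearest-neighbor assignment rule are assumptions made for the toy example.

```python
import numpy as np

# Sketch of codebook-based transmission: instead of entropy-coding latents
# with a float probability model (which can diverge across platforms), the
# encoder sends integer codebook indices, which are bit-exact everywhere.

rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 8))   # 256 entries, 8-dim latents (toy sizes)
latents = rng.standard_normal((100, 8))    # latents produced by some encoder network

# Encoder: nearest-neighbor quantization -> integer index sequence.
dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
indices = dists.argmin(axis=1)             # this index sequence is all that is sent

# Decoder: a table lookup reproduces the quantized latents exactly,
# regardless of the decoding platform's floating-point behavior.
reconstructed = codebook[indices]
```

Because the bitstream reduces to a sequence of integers and decoding is a table lookup, encoder and decoder agree bit-for-bit even when their floating-point arithmetic differs.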
Related papers
- High-Efficiency Neural Video Compression via Hierarchical Predictive Learning [27.41398149573729]
Enhanced Deep Hierarchical Video Compression (DHVC 2.0) delivers superior compression performance and impressive complexity efficiency.
Uses hierarchical predictive coding to transform each video frame into multiscale representations.
Supports transmission-friendly progressive decoding, making it particularly advantageous for networked video applications in the presence of packet loss.
arXiv Detail & Related papers (2024-10-03T15:40:58Z)
- Towards Real-Time Neural Video Codec for Cross-Platform Application Using Calibration Information [17.141950680993617]
Cross-platform computational errors resulting from floating point operations can lead to inaccurate decoding of the bitstream.
The high computational complexity of the encoding and decoding process poses a challenge in achieving real-time performance.
A real-time cross-platform neural video codec can efficiently decode a 720P video bitstream from other encoding platforms on a consumer-grade GPU.
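The failure mode this line of work addresses can be reproduced with a toy range-coder CDF: a tiny platform-dependent drift in the estimated probabilities shifts a quantized interval boundary, which is enough to desynchronize the decoder. The probabilities and drift values below are invented for illustration and are not from either paper.

```python
import numpy as np

def cdf_bins(p, precision=16):
    # Quantize cumulative probabilities to integer frequency boundaries,
    # as a range/arithmetic coder does internally.
    c = np.cumsum(p)
    return np.round(c * (1 << precision)).astype(np.int64)

p_enc = np.array([0.1, 0.2, 0.7])          # encoder's probability estimates
p_dec = np.array([0.1, 0.20002, 0.69998])  # decoder's, with tiny float drift

enc_bins = cdf_bins(p_enc)
dec_bins = cdf_bins(p_dec)
# The quantized CDFs no longer match: an interval boundary has shifted,
# so a symbol falling near it decodes incorrectly and the stream desyncs.
```

Codebook indices (above) or calibration information are two ways to sidestep this sensitivity.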
arXiv Detail & Related papers (2023-09-20T13:01:15Z)
- Neighbor Correspondence Matching for Flow-based Video Frame Synthesis [90.14161060260012]
We introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis.
NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
The coarse-scale module is designed to leverage neighbor correspondences to capture large motion, while the fine-scale module is more efficient and speeds up the estimation process.
arXiv Detail & Related papers (2022-07-14T09:17:00Z)
- Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression [25.96187914295921]
This paper proposes a powerful entropy model which efficiently captures both spatial and temporal dependencies.
Our entropy model achieves an 18.2% bitrate saving on the UVG dataset compared with H.266 (VTM) at the highest compression ratio.
arXiv Detail & Related papers (2022-07-13T00:03:54Z)
- Entroformer: A Transformer-based Entropy Model for Learned Image Compression [17.51693464943102]
We propose a novel transformer-based entropy model, termed Entroformer, to capture long-range dependencies in probability distribution estimation.
The experiments show that the Entroformer achieves state-of-the-art performance on image compression while being time-efficient.
arXiv Detail & Related papers (2022-02-11T08:03:31Z)
- Conditional Entropy Coding for Efficient Video Compression [82.35389813794372]
We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.
We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs.
We then propose a novel internal learning extension on top of this architecture that brings an additional 10% savings without trading off decoding speed.
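Why modeling the conditional entropy between frames pays off can be seen in a toy example: the empirical entropy of a frame given its predecessor (here, via the residual) is far lower than its unconditional entropy, so fewer bits suffice. The signal model below is invented for illustration and is unrelated to the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
prev = rng.integers(0, 256, size=10_000)                    # "previous frame" pixels
curr = (prev + rng.integers(-2, 3, size=prev.size)) % 256   # strongly correlated frame

def entropy_bits(samples):
    # Empirical Shannon entropy in bits per symbol.
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

h_curr = entropy_bits(curr)                  # close to 8 bits/symbol unconditioned
h_resid = entropy_bits((curr - prev) % 256)  # residual over 5 values: ~log2(5) bits
```

The gap between `h_curr` and `h_resid` is exactly the budget a conditional entropy model can exploit.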
arXiv Detail & Related papers (2020-08-20T20:01:59Z)
- Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model [164.7489982837475]
This paper proposes a Recurrent Learned Video Compression (RLVC) approach with a Recurrent Auto-Encoder (RAE) and a Recurrent Probability Model (RPM).
The RAE employs recurrent cells in both the encoder and decoder to exploit the temporal correlation among video frames.
Our approach achieves the state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM.
arXiv Detail & Related papers (2020-06-24T08:46:33Z)
- Denoising Diffusion Probabilistic Models [91.94962645056896]
We present high quality image synthesis results using diffusion probabilistic models.
Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics.
arXiv Detail & Related papers (2020-06-19T17:24:44Z)
- Learning Context-Based Non-local Entropy Modeling for Image Compression [140.64888994506313]
In this paper, we propose a non-local operation for context modeling by employing the global similarity within the context.
The entropy model is further adopted as the rate loss in a joint rate-distortion optimization.
Considering that the width of the transforms is essential in training low distortion models, we finally produce a U-Net block in the transforms to increase the width with manageable memory consumption and time complexity.
arXiv Detail & Related papers (2020-05-10T13:28:18Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.