Spatiotemporal Entropy Model is All You Need for Learned Video
Compression
- URL: http://arxiv.org/abs/2104.06083v1
- Date: Tue, 13 Apr 2021 10:38:32 GMT
- Title: Spatiotemporal Entropy Model is All You Need for Learned Video
Compression
- Authors: Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, Dongyang Li, Yichen
Qian, Hao Li
- Abstract summary: We propose a framework to compress raw-pixel frames (rather than residual images).
An entropy model is used to estimate the spatiotemporal redundancy in a latent space rather than at the pixel level.
Experiments show that the proposed method outperforms state-of-the-art (SOTA) methods under the MS-SSIM metric.
- Score: 9.227865598115024
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The framework of dominant learned video compression methods is
usually composed of motion prediction modules as well as motion vector and
residual image compression modules, and it suffers from a complex structure
and an error propagation problem. Approaches have been proposed to reduce the
complexity by replacing motion prediction modules with implicit flow networks.
An error-propagation-aware training strategy has also been proposed to
alleviate incremental reconstruction errors from previously decoded frames.
Although these methods have brought some improvement, little attention has
been paid to the framework itself. Inspired by the success of learned image
compression, which simplifies the framework to a single deep neural network,
it is natural to expect better performance in video compression via a simple
yet appropriate framework. Therefore, we propose a framework that directly
compresses raw-pixel frames (rather than residual images) and requires no
extra motion prediction module. Instead, an entropy model is used to estimate
the spatiotemporal redundancy in a latent space rather than at the pixel
level, which significantly reduces the complexity of the framework.
Specifically, the whole framework is a compression module consisting of a
unified auto-encoder, which produces identically distributed latents for all
frames, and a spatiotemporal entropy estimation model that minimizes the
entropy of these latents. Experiments show that the proposed method
outperforms state-of-the-art (SOTA) methods under the multiscale structural
similarity (MS-SSIM) metric and achieves competitive results under PSNR.
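To make the framework concrete, here is a minimal PyTorch sketch of the idea the abstract describes: a single auto-encoder shared across frames, plus an entropy model that predicts each latent's distribution from the previous frame's latent and charges a rate penalty accordingly. This is not the authors' implementation; the module names, layer sizes, Gaussian parameterization, and the rate-distortion weight `lam` are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameAutoEncoder(nn.Module):
    """One transform shared by every frame, so the latents of all
    frames are (approximately) identically distributed."""
    def __init__(self, ch=128):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, ch, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2,
                               output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 5, stride=2, padding=2,
                               output_padding=1),
        )

class SpatiotemporalEntropyModel(nn.Module):
    """Predicts mean/scale of the current latent from the previous
    frame's latent (temporal context). A real model would also use
    spatial context, e.g. a hyperprior or autoregressive branch."""
    def __init__(self, ch=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 2 * ch, 3, padding=1),
        )

    def forward(self, prev_latent):
        mean, scale = self.net(prev_latent).chunk(2, dim=1)
        return mean, F.softplus(scale) + 1e-6

def rate_bits(y_hat, mean, scale):
    """Bits for uniformly quantized latents under a Gaussian:
    probability mass on [y - 0.5, y + 0.5], then -log2."""
    gauss = torch.distributions.Normal(mean, scale)
    p = gauss.cdf(y_hat + 0.5) - gauss.cdf(y_hat - 0.5)
    return -torch.log2(p.clamp_min(1e-9)).sum()

def compress_step(ae, em, frame, prev_latent, lam=0.01):
    """One rate-distortion training step for a single frame."""
    y = ae.enc(frame)
    y_hat = y + (torch.round(y) - y).detach()  # straight-through rounding
    mean, scale = em(prev_latent)
    rate = rate_bits(y_hat, mean, scale)       # entropy of the latent
    dist = F.mse_loss(ae.dec(y_hat), frame, reduction="sum")
    return rate + lam * dist, y_hat

# Toy usage: four 64x64 frames, zero temporal context for frame 0.
ae, em = FrameAutoEncoder(), SpatiotemporalEntropyModel()
frames = torch.rand(4, 3, 64, 64)
prev = torch.zeros(1, 128, 16, 16)
for f in frames:
    loss, prev = compress_step(ae, em, f.unsqueeze(0), prev)
```

The design choice mirrored here is that the only learned components are the transform and the entropy model: temporal redundancy is handled entirely by conditioning the latent's distribution on the previous latent, not by warping pixels with an explicit motion module.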
Related papers
- MoDeGPT: Modular Decomposition for Large Language Model Compression [59.361006801465344]
This paper introduces Modular Decomposition (MoDeGPT), a novel structured compression framework.
MoDeGPT partitions the Transformer block into modules composed of matrix pairs and reduces the hidden dimensions.
Our experiments show MoDeGPT, without backward propagation, matches or surpasses previous structured compression methods.
arXiv Detail & Related papers (2024-08-19T01:30:14Z) - IBVC: Interpolation-driven B-frame Video Compression [68.18440522300536]
B-frame video compression adopts bi-directional motion estimation and motion compensation (MEMC) coding for middle-frame reconstruction.
Previous learned approaches often directly extend neural P-frame codecs to B-frame coding, relying on bi-directional optical-flow estimation.
We propose a simple yet effective structure called Interpolation-B-frame Video Compression (IBVC) to address these issues.
arXiv Detail & Related papers (2023-09-25T02:45:51Z) - Spatial-Temporal Transformer based Video Compression Framework [44.723459144708286]
We propose a novel Spatial-Temporal Transformer based Video Compression (STT-VC) framework.
It contains a Relaxed Deformable Transformer (RDT) with Uformer based offsets estimation for motion estimation and compensation, a Multi-Granularity Prediction (MGP) module based on multi-reference frames for prediction refinement, and a Spatial Feature Distribution prior based Transformer (SFD-T) for efficient temporal-spatial joint residual compression.
Experimental results demonstrate that our method achieves the best result with 13.5% BD-Rate saving over VTM.
arXiv Detail & Related papers (2023-09-21T09:23:13Z) - Learning Dynamic Point Cloud Compression via Hierarchical Inter-frame
Block Matching [35.80653765524654]
3D dynamic point cloud (DPC) compression relies on mining its temporal context.
This paper proposes a learning-based DPC compression framework built on a hierarchical block-matching-based inter-prediction module.
arXiv Detail & Related papers (2023-05-09T11:44:13Z) - Entroformer: A Transformer-based Entropy Model for Learned Image
Compression [17.51693464943102]
We propose a novel transformer-based entropy model, termed Entroformer, to capture long-range dependencies in probability distribution estimation.
The experiments show that the Entroformer achieves state-of-the-art performance on image compression while being time-efficient; a minimal sketch of this style of entropy model appears after this list.
arXiv Detail & Related papers (2022-02-11T08:03:31Z) - Causal Contextual Prediction for Learned Image Compression [36.08393281509613]
We propose the concept of separate entropy coding to leverage a serial decoding process for causal contextual entropy prediction in the latent space.
A causal context model is proposed that separates the latents across channels and makes use of cross-channel relationships to generate highly informative contexts.
We also propose a causal global prediction model, which is able to find global reference points for accurate predictions of unknown points.
arXiv Detail & Related papers (2020-11-19T08:15:10Z) - Conditional Entropy Coding for Efficient Video Compression [82.35389813794372]
We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.
We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs.
We then propose a novel internal learning extension on top of this architecture that brings an additional 10% savings without trading off decoding speed.
arXiv Detail & Related papers (2020-08-20T20:01:59Z) - Learning Context-Based Non-local Entropy Modeling for Image Compression [140.64888994506313]
In this paper, we propose a non-local operation for context modeling by employing the global similarity within the context.
The entropy model is further adopted as the rate loss in a joint rate-distortion optimization.
Considering that the width of the transforms is essential in training low-distortion models, we finally introduce a U-Net block into the transforms to increase the width with manageable memory consumption and time complexity.
arXiv Detail & Related papers (2020-05-10T13:28:18Z) - Blurry Video Frame Interpolation [57.77512131536132]
We propose a blurry video frame interpolation method to reduce motion blur and up-convert the frame rate simultaneously.
Specifically, we develop a pyramid module to cyclically synthesize clear intermediate frames.
Our method performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-02-27T17:00:26Z) - Video Face Super-Resolution with Motion-Adaptive Feedback Cell [90.73821618795512]
Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNNs).
In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way.
arXiv Detail & Related papers (2020-02-15T13:14:10Z)
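Two entries above, Entroformer and Conditional Entropy Coding, share the surveyed paper's premise that stronger entropy models, rather than explicit motion tools, drive coding gains. As a companion to the sketch after the abstract, below is a hypothetical, minimal transformer-based entropy model over latent tokens; the class name, sizes, and Gaussian parameterization are assumptions, and a deployable codec would additionally need causal masking (or cross-frame conditioning) so the decoder can reproduce the predictions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformerEntropyModel(nn.Module):
    """Attends over the latent's spatial tokens to predict per-token
    Gaussian parameters (illustrative; not the Entroformer design).
    A decodable codec would add causal masking over the token order."""
    def __init__(self, ch=128, heads=4, layers=2):
        super().__init__()
        block = nn.TransformerEncoderLayer(d_model=ch, nhead=heads,
                                           batch_first=True)
        self.body = nn.TransformerEncoder(block, num_layers=layers)
        self.head = nn.Linear(ch, 2 * ch)

    def forward(self, latent):                       # (B, C, H, W)
        b, c, h, w = latent.shape
        tokens = latent.flatten(2).transpose(1, 2)   # (B, H*W, C)
        mean, scale = self.head(self.body(tokens)).chunk(2, dim=-1)
        unflat = lambda t: t.transpose(1, 2).reshape(b, c, h, w)
        return unflat(mean), F.softplus(unflat(scale)) + 1e-6

# Toy usage: plugs into the same Gaussian rate estimate as the
# earlier sketch.
em = TransformerEntropyModel()
latent = torch.round(torch.randn(1, 128, 16, 16))
mean, scale = em(latent)
```

In the Conditional Entropy Coding framing, the same predictor would instead take the previous frame's latent as input, so the rate it assigns measures the cross-frame conditional entropy.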
This list is automatically generated from the titles and abstracts of the papers on this site.