Deep Video Coding with Dual-Path Generative Adversarial Network
- URL: http://arxiv.org/abs/2111.14474v1
- Date: Mon, 29 Nov 2021 11:39:28 GMT
- Title: Deep Video Coding with Dual-Path Generative Adversarial Network
- Authors: Tiesong Zhao, Weize Feng, Hongji Zeng, Yuzhen Niu, Jiaying Liu
- Abstract summary: This paper proposes an efficient codec, namely the dual-path generative adversarial network-based video codec (DGVC).
Compared with x265 LDP very fast mode, DGVC reduces the average bit-per-pixel (bpp) by 39.39%/54.92% at the same PSNR/MS-SSIM.
- Score: 39.19042551896408
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep-learning-based video coding has attracted substantial attention for
its great potential to squeeze out the spatial-temporal redundancies of video
sequences. This paper proposes an efficient codec, namely the dual-path generative
adversarial network-based video codec (DGVC). First, we propose a dual-path
enhancement with generative adversarial network (DPEG) to reconstruct the
compressed video details. The DPEG consists of an $\alpha$-path of auto-encoder
and convolutional long short-term memory (ConvLSTM), which facilitates the
structure feature reconstruction with a large receptive field and multi-frame
references, and a $\beta$-path of residual attention blocks, which facilitates
the reconstruction of local texture features. Both paths are fused and
co-trained by a generative-adversarial process. Second, we reuse the DPEG
network in both motion compensation and quality enhancement modules, which are
further combined with motion estimation and entropy coding modules in our DGVC
framework. Third, we employ a joint training of deep video compression and
enhancement to further improve the rate-distortion (RD) performance. Compared
with x265 LDP very fast mode, our DGVC reduces the average bit-per-pixel (bpp)
by 39.39%/54.92% at the same PSNR/MS-SSIM, which outperforms the state-of-the-art
deep video codecs by a considerable margin.
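The abstract reports its gains in bits per pixel (bpp) at matched PSNR/MS-SSIM. The following is a minimal sketch of how bpp, PSNR, and a relative bpp saving are commonly computed; it is not the authors' evaluation code. The function names and the numbers in the usage section are hypothetical, and the paper's averaged figures are presumably measured across several rate points, which this sketch does not reproduce.

```python
import math


def bits_per_pixel(total_bits, num_frames, height, width):
    """Average number of coded bits per pixel over a whole sequence."""
    return total_bits / (num_frames * height * width)


def psnr(reference, decoded, peak=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(reference) != len(decoded):
        raise ValueError("reference and decoded must have the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: distortion is zero
    return 10.0 * math.log10(peak ** 2 / mse)


def bpp_saving_percent(anchor_bpp, test_bpp):
    """Relative bpp saving of a test codec over an anchor at matched quality."""
    return 100.0 * (anchor_bpp - test_bpp) / anchor_bpp


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; they are not from the paper.
    anchor = bits_per_pixel(total_bits=1_000_000, num_frames=10, height=100, width=100)
    test = bits_per_pixel(total_bits=600_000, num_frames=10, height=100, width=100)
    print(f"anchor bpp = {anchor:.3f}, test bpp = {test:.3f}")
    print(f"saving = {bpp_saving_percent(anchor, test):.2f}%")
```

A saving computed this way is only meaningful when both codecs are compared at the same quality level, which is why the abstract states the reduction "at the same PSNR/MS-SSIM".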
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z)
- Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression [42.92442233544842]
Video compression relies heavily on temporal redundancy.
NVC frameworks are generally more complex, with many large components that are not easy to update quickly during encoding.
We introduce a parameter-efficient delta-tuning strategy, which is achieved by integrating several light-weight adapters into each coding component of the encoding process.
arXiv Detail & Related papers (2024-05-07T12:42:23Z)
- Boosting Neural Representations for Videos with a Conditional Decoder [28.073607937396552]
Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing.
This paper introduces a universal boosting framework for current implicit video representation approaches.
arXiv Detail & Related papers (2024-02-28T08:32:19Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation [14.088444622391501]
Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content.
Existing INR-based methods have failed to deliver rate-quality performance comparable with the state of the art in video compression.
We propose HiNeRV, an INR that combines lightweight layers with hierarchical positional encodings.
arXiv Detail & Related papers (2023-06-16T12:59:52Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds its encoding quality, $34.07 \rightarrow 34.57$ (measured with the PSNR metric).
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- CVEGAN: A Perceptually-inspired GAN for Compressed Video Enhancement [15.431248645312309]
We propose a new Generative Adversarial Network for Compressed Video quality Enhancement (CVEGAN).
The CVEGAN generator benefits from the use of a novel Mul2Res block (with multiple levels of residual learning branches), an enhanced residual non-local block (ERNB) and an enhanced convolutional block attention module (ECBAM).
The training strategy has also been re-designed specifically for video compression applications, to employ a relativistic sphere GAN (ReSphereGAN) training methodology together with new perceptual loss functions.
arXiv Detail & Related papers (2020-11-18T10:24:38Z)
- Learning to Compress Videos without Computing Motion [39.46212197928986]
We propose a new deep learning video compression architecture that does not require motion estimation.
Our framework exploits the regularities inherent to video motion, which we capture by using displaced frame differences as video representations.
Our experiments show that our compression model, which we call the MOtionless VIdeo Codec (MOVI-Codec), learns how to efficiently compress videos without computing motion.
arXiv Detail & Related papers (2020-09-29T15:49:25Z)
- Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement [164.7489982837475]
We propose a Hierarchical Learned Video Compression (HLVC) method with three hierarchical quality layers and a recurrent enhancement network.
In our HLVC approach, the hierarchical quality benefits the coding efficiency, since the high quality information facilitates the compression and enhancement of low quality frames at encoder and decoder sides.
arXiv Detail & Related papers (2020-03-04T09:31:37Z)