Related papers: Deep Video Coding with Dual-Path Generative Adversarial Network

URL: http://arxiv.org/abs/2111.14474v1
Date: Mon, 29 Nov 2021 11:39:28 GMT
Title: Deep Video Coding with Dual-Path Generative Adversarial Network
Authors: Tiesong Zhao, Weize Feng, Hongji Zeng, Yuzhen Niu, Jiaying Liu
Abstract summary: This paper proposes an efficient codec, namely the dual-path generative adversarial network-based video codec (DGVC). Compared with x265 LDP very fast mode, DGVC reduces the average bit-per-pixel (bpp) by 39.39%/54.92% at the same PSNR/MS-SSIM.
Score: 39.19042551896408
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep-learning-based video coding has attracted substantial attention for its great potential to squeeze out the spatial-temporal redundancies of video sequences. This paper proposes an efficient codec, namely the dual-path generative adversarial network-based video codec (DGVC). First, we propose a dual-path enhancement with generative adversarial network (DPEG) to reconstruct the compressed video details. The DPEG consists of an $\alpha$-path of auto-encoder and convolutional long short-term memory (ConvLSTM), which facilitates the structure feature reconstruction with a large receptive field and multi-frame references, and a $\beta$-path of residual attention blocks, which facilitates the reconstruction of local texture features. Both paths are fused and co-trained by a generative-adversarial process. Second, we reuse the DPEG network in both motion compensation and quality enhancement modules, which are further combined with motion estimation and entropy coding modules in our DGVC framework. Third, we employ a joint training of deep video compression and enhancement to further improve the rate-distortion (RD) performance. Compared with x265 LDP very fast mode, our DGVC reduces the average bit-per-pixel (bpp) by 39.39%/54.92% at the same PSNR/MS-SSIM, which outperforms the state-of-the-art deep video codecs by a considerable margin.
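Note: the bpp figures in the abstract are relative bitrate savings at matched quality. The following is a minimal sketch of how bits-per-pixel and a relative saving can be computed at a single operating point. The function names and the sample numbers are hypothetical and do not come from the paper, and the paper's own averaging procedure across sequences and quality points is not described here.

```python
def bits_per_pixel(total_bits: int, width: int, height: int, num_frames: int) -> float:
    """Average number of coded bits spent per pixel over a whole sequence."""
    return total_bits / (width * height * num_frames)


def bpp_saving_percent(bpp_anchor: float, bpp_test: float) -> float:
    """Relative bpp reduction of a test codec versus an anchor codec.

    Both values are assumed to be measured at the same quality level
    (for example the same PSNR or the same MS-SSIM).
    """
    return 100.0 * (bpp_anchor - bpp_test) / bpp_anchor


if __name__ == "__main__":
    # Hypothetical numbers for illustration only: the anchor needs 0.10 bpp
    # and the test codec needs 0.06 bpp to reach the same quality.
    print(round(bpp_saving_percent(0.10, 0.06), 2))
```

Under this reading, a 39.39% reduction means that the codec needs roughly 0.61 times the anchor's bits per pixel to reach the same PSNR.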

Related papers

Plug-and-Play Versatile Compressed Video Enhancement [57.62582951699999]
Video compression effectively reduces file sizes, making real-time cloud computing possible. However, it comes at the cost of visual quality and challenges the robustness of downstream vision models. We present a versatile-aware enhancement framework that adaptively enhances videos under different compression settings.
arXiv Detail & Related papers (2025-04-21T18:39:31Z)
GIViC: Generative Implicit Video Compression [11.908506692749743]
Generative Implicit Video Compression (GIViC) is inspired by the characteristics that INRs share with large language and diffusion models in exploiting long-term dependencies. A novel Hierarchical Gated Linear Attention-based transformer (HGLA) is also integrated into the framework, which dual-factorizes global dependency modeling. As far as we are aware, GIViC is the first INR-based video codec that outperforms the VTM coding configuration.
arXiv Detail & Related papers (2025-03-25T12:39:45Z)
Motion Free B-frame Coding for Neural Video Compression [0.0]
In this paper, we propose a novel approach that handles the drawbacks of the two typical above-mentioned architectures. The advantages of the motion-free approach are twofold: it improves the coding efficiency of the network and significantly reduces computational complexity. Experimental results show the proposed framework outperforms the SOTA deep neural video compression networks on the HEVC-class B dataset.
arXiv Detail & Related papers (2024-11-26T07:03:11Z)
When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding. During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes. Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos. Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs. A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z)
Boosting Neural Representations for Videos with a Conditional Decoder [28.073607937396552]
Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing. This paper introduces a universal boosting framework for current implicit video representation approaches.
arXiv Detail & Related papers (2024-02-28T08:32:19Z)
VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels. We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation [14.088444622391501]
Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content. Existing INR-based methods have failed to deliver rate-quality performance comparable with the state of the art in video compression. We propose HiNeRV, an INR that combines lightweight layers with hierarchical positional encodings.
arXiv Detail & Related papers (2023-06-16T12:59:52Z)
Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes. We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality as $34.07 \rightarrow 34.57$ (measured with the PSNR metric).
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
CVEGAN: A Perceptually-inspired GAN for Compressed Video Enhancement [15.431248645312309]
We propose a new Generative Adversarial Network for Compressed Video quality Enhancement (CVEGAN). The CVEGAN generator benefits from the use of a novel Mul2Res block (with multiple levels of residual learning branches), an enhanced residual non-local block (ERNB) and an enhanced convolutional block attention module (ECBAM). The training strategy has also been re-designed specifically for video compression applications, to employ a relativistic sphere GAN (ReSphereGAN) training methodology together with new perceptual loss functions.
arXiv Detail & Related papers (2020-11-18T10:24:38Z)
Learning to Compress Videos without Computing Motion [39.46212197928986]
We propose a new deep learning video compression architecture that does not require motion estimation. Our framework exploits the regularities inherent to video motion, which we capture by using displaced frame differences as video representations. Our experiments show that our compression model, which we call the MOtionless VIdeo Codec (MOVI-Codec), learns how to efficiently compress videos without computing motion.
arXiv Detail & Related papers (2020-09-29T15:49:25Z)
Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement [164.7489982837475]
We propose a Hierarchical Learned Video Compression (HLVC) method with three hierarchical quality layers and a recurrent enhancement network. In our HLVC approach, the hierarchical quality benefits the coding efficiency, since the high quality information facilitates the compression and enhancement of low quality frames at encoder and decoder sides.
arXiv Detail & Related papers (2020-03-04T09:31:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.