CVEGAN: A Perceptually-inspired GAN for Compressed Video Enhancement
- URL: http://arxiv.org/abs/2011.09190v2
- Date: Thu, 26 Nov 2020 20:17:53 GMT
- Title: CVEGAN: A Perceptually-inspired GAN for Compressed Video Enhancement
- Authors: Di Ma, Fan Zhang and David R. Bull
- Abstract summary: We propose a new Generative Adversarial Network for Compressed Video quality Enhancement (CVEGAN).
The CVEGAN generator benefits from the use of a novel Mul2Res block (with multiple levels of residual learning branches), an enhanced residual non-local block (ERNB) and an enhanced convolutional block attention module (ECBAM).
The training strategy has also been re-designed specifically for video compression applications, to employ a relativistic sphere GAN (ReSphereGAN) training methodology together with new perceptual loss functions.
- Score: 15.431248645312309
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose a new Generative Adversarial Network for Compressed Video quality
Enhancement (CVEGAN). The CVEGAN generator benefits from the use of a novel
Mul2Res block (with multiple levels of residual learning branches), an enhanced
residual non-local block (ERNB) and an enhanced convolutional block attention
module (ECBAM). The ERNB has also been employed in the discriminator to improve
the representational capability. The training strategy has also been
re-designed specifically for video compression applications, to employ a
relativistic sphere GAN (ReSphereGAN) training methodology together with new
perceptual loss functions. The proposed network has been fully evaluated in the
context of two typical video compression enhancement tools: post-processing
(PP) and spatial resolution adaptation (SRA). CVEGAN has been fully integrated
into the MPEG HEVC video coding test model (HM16.20) and experimental results
demonstrate significant coding gains (up to 28% for PP and 38% for SRA compared
to the anchor) over existing state-of-the-art architectures for both coding
tools across multiple datasets.
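The abstract describes the Mul2Res block only at a high level (multiple levels of residual learning branches). As a toy illustration of that multi-level residual idea, not the actual CVEGAN implementation, the following sketch uses simple elementwise scaling as a stand-in for the real convolutional layers; all function names, depths, and gains below are hypothetical:

```python
# Toy sketch of a multi-level residual-learning block in the spirit of
# Mul2Res: several branches of different depths each add their refinement
# back to the input, and the fused result is combined with a final skip.
# Elementwise scaling stands in for convolution purely for illustration.

def toy_filter(x, gain):
    # Stand-in for a conv layer: scale each sample by a small gain.
    return [v * gain for v in x]

def residual_branch(x, depth, gain=0.1):
    # One residual-learning branch: `depth` stacked toy filters,
    # with the branch input added back (residual connection).
    out = x
    for _ in range(depth):
        out = toy_filter(out, gain)
    return [xi + oi for xi, oi in zip(x, out)]

def mul2res_block(x, depths=(1, 2, 3)):
    # Multiple residual branches at different depths; their outputs are
    # averaged, then combined with the block input via a final skip.
    branches = [residual_branch(x, d) for d in depths]
    fused = [sum(vals) / len(branches) for vals in zip(*branches)]
    return [xi + fi for xi, fi in zip(x, fused)]

enhanced = mul2res_block([1.0, -0.5, 0.25])
```

The point of the multi-branch structure is that shallow branches preserve low-level detail while deeper branches capture higher-level corrections; the skip connections keep each branch learning only a residual on top of its input.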
Related papers
- Plug-and-Play Versatile Compressed Video Enhancement [57.62582951699999]
Video compression effectively reduces file sizes, making real-time cloud computing possible.
However, it comes at the cost of visual quality, challenging the robustness of downstream vision models.
We present a versatile quality enhancement framework that adaptively enhances videos under different compression settings.
arXiv Detail & Related papers (2025-04-21T18:39:31Z)
- H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models [76.1519545010611]
Autoencoder (AE) is the key to the success of latent diffusion models for image and video generation.
In this work, we examine the architecture design choices and optimize the computation distribution to obtain efficient and high-compression video AEs.
Our AE achieves an ultra-high compression ratio and real-time decoding speed on mobile while outperforming prior art in terms of reconstruction metrics.
arXiv Detail & Related papers (2025-04-14T17:59:06Z)
- GIViC: Generative Implicit Video Compression [11.908506692749743]
Generative Implicit Video Compression (GIViC) is inspired by the characteristics that INRs share with large language and diffusion models in exploiting long-term dependencies.
A novel Hierarchical Gated Linear Attention-based transformer (HGLA) is also integrated into the framework, which dual-factorizes global dependency modeling.
As far as we are aware, GIViC is the first INR-based video codec that outperforms the VTM coding configuration.
arXiv Detail & Related papers (2025-03-25T12:39:45Z)
- REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder [52.698595889988766]
We present a novel perspective on learning video embedders for generative modeling.
Rather than requiring an exact reproduction of an input video, an effective embedder should focus on visually plausible reconstructions.
We propose replacing the conventional encoder-decoder video embedder with an encoder-generator framework.
arXiv Detail & Related papers (2025-03-11T17:51:07Z)
- Rethinking Video Tokenization: A Conditioned Diffusion-based Approach [58.164354605550194]
A new tokenizer, the Conditioned Diffusion-based Tokenizer (CDT), replaces the GAN-based decoder with a conditional diffusion model.
We trained from scratch using only a basic MSE diffusion loss for reconstruction, along with a KL term and an LPIPS perceptual loss.
Even a scaled-down version of CDT (3× inference speedup) still performs comparably with top baselines.
arXiv Detail & Related papers (2025-03-05T17:59:19Z)
- Bi-Directional Deep Contextual Video Compression [17.195099321371526]
We introduce a bi-directional deep contextual video compression scheme tailored for B-frames, termed DCVC-B.
First, we develop a bi-directional motion difference context propagation method for effective motion difference coding.
Second, we propose a bi-directional contextual compression model and a corresponding bi-directional temporal entropy model.
Third, we propose a hierarchical quality structure-based training strategy, leading to an effective bit allocation across large groups of pictures.
arXiv Detail & Related papers (2024-08-16T08:45:25Z)
- Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z)
- Channel-wise Feature Decorrelation for Enhanced Learned Image Compression [16.638869231028437]
The emerging Learned Compression (LC) replaces the traditional modules with Deep Neural Networks (DNN), which are trained end-to-end for rate-distortion performance.
This paper proposes to improve compression by fully exploiting the existing DNN capacity.
Three strategies are proposed and evaluated, which optimize (1) the transformation network, (2) the context model, and (3) both networks.
arXiv Detail & Related papers (2024-03-16T14:30:25Z)
- Learned Video Compression via Heterogeneous Deformable Compensation Network [78.72508633457392]
We propose a learned video compression framework with a heterogeneous deformable compensation strategy (HDCVC) to tackle the problem of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves superior performance over recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z)
- Structured Sparsity Learning for Efficient Video Super-Resolution [99.1632164448236]
We develop a structured pruning scheme called Structured Sparsity Learning (SSL) according to the properties of video super-resolution (VSR) models.
In SSL, we design pruning schemes for several key components in VSR models, including residual blocks, recurrent networks, and upsampling networks.
arXiv Detail & Related papers (2022-06-15T17:36:04Z)
- Deep Video Coding with Dual-Path Generative Adversarial Network [39.19042551896408]
This paper proposes an efficient codec, namely a dual-path generative adversarial network-based video codec (DGVC).
Our DGVC reduces the average bits per pixel (bpp) by 39.39%/54.92% at the same PSNR/MS-SSIM.
arXiv Detail & Related papers (2021-11-29T11:39:28Z)
- BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment [90.81396836308085]
We show that by empowering a recurrent framework with enhanced propagation and alignment, one can exploit video information more effectively.
Our model, BasicVSR++, surpasses BasicVSR by 0.82 dB in PSNR with a similar number of parameters.
BasicVSR++ generalizes well to other video restoration tasks such as compressed video enhancement.
arXiv Detail & Related papers (2021-04-27T17:58:31Z)
- Learning the Loss Functions in a Discriminative Space for Video Restoration [48.104095018697556]
We propose a new framework for building effective loss functions by learning a discriminative space specific to a video restoration task.
Our framework is similar to GANs in that we iteratively train two networks - a generator and a loss network.
Experiments on video super-resolution and deblurring show that our method generates visually more pleasing videos.
arXiv Detail & Related papers (2020-03-20T06:58:27Z)
- Generalized Octave Convolutions for Learned Multi-Frequency Image Compression [20.504561050200365]
We propose the first learned multi-frequency image compression and entropy coding approach.
It is based on the recently developed octave convolutions to factorize the latents into high and low frequency (resolution) components.
We show that the proposed generalized octave convolution can improve the performance of other auto-encoder-based computer vision tasks.
arXiv Detail & Related papers (2020-02-24T01:35:29Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern.
By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.