BVI-DVC: A Training Database for Deep Video Compression
- URL: http://arxiv.org/abs/2003.13552v2
- Date: Thu, 8 Oct 2020 10:24:30 GMT
- Title: BVI-DVC: A Training Database for Deep Video Compression
- Authors: Di Ma, Fan Zhang, and David R. Bull
- Abstract summary: BVI-DVC is presented for training CNN-based video compression systems.
It contains 800 sequences at various spatial resolutions from 270p to 2160p.
It has been evaluated on ten existing network architectures for four different coding tools.
- Score: 13.730093064777078
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep learning methods are increasingly being applied in the optimisation of
video compression algorithms and can achieve significantly enhanced coding
gains, compared to conventional approaches. Such approaches often employ
Convolutional Neural Networks (CNNs) which are trained on databases with
relatively limited content coverage. In this paper, a new extensive and
representative video database, BVI-DVC, is presented for training CNN-based
video compression systems, with specific emphasis on machine learning tools
that enhance conventional coding architectures, including spatial resolution
and bit depth up-sampling, post-processing and in-loop filtering. BVI-DVC
contains 800 sequences at various spatial resolutions from 270p to 2160p and
has been evaluated on ten existing network architectures for four different
coding tools. Experimental results show that this database produces significant
improvements in terms of coding gains over three existing (commonly used)
image/video training databases under the same training and evaluation
configurations. The overall additional coding improvements by using the
proposed database for all tested coding modules and CNN architectures are up to
10.3% based on the assessment of PSNR and 8.1% based on VMAF.
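Coding gains of this kind are conventionally reported as Bjøntegaard-delta (BD) rate: the average bitrate difference between two rate-distortion curves at equal quality (PSNR or VMAF). As a hedged illustration only (the rate/PSNR points below are hypothetical and not taken from the paper), a minimal numpy sketch of the standard cubic-fit BD-rate calculation:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard-delta rate: average bitrate difference (%) at equal quality."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    # Fit cubic polynomials of log-rate as a function of quality.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Integrate both fits over the overlapping quality interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100.0

# Hypothetical RD points (kbps, dB): the test codec uses 10% less rate at each quality.
anchor = ([1000, 2000, 4000, 8000], [34.0, 36.5, 39.0, 41.5])
test   = ([ 900, 1800, 3600, 7200], [34.0, 36.5, 39.0, 41.5])
print(round(bd_rate(*anchor, *test), 1))  # → -10.0
```

A negative BD-rate means the test configuration needs less bitrate for the same quality; the reported gains of up to 10.3% (PSNR) and 8.1% (VMAF) are improvements of this kind.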
Related papers
- Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
We propose the Compression-Realized Deep Structural Network (CRDS), introducing three inductive biases aligned with the three primary processes in the classic compression domain.
arXiv Detail & Related papers (2024-05-10T09:18:17Z)
- HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation [14.088444622391501]
Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content.
Existing INR-based methods have failed to deliver rate-quality performance comparable with the state of the art in video compression.
We propose HiNeRV, an INR that combines lightweight layers with hierarchical positional encodings.
arXiv Detail & Related papers (2023-06-16T12:59:52Z)
- Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
- Graph Neural Networks for Channel Decoding [71.15576353630667]
We showcase competitive decoding performance for various coding schemes, such as low-density parity-check (LDPC) and BCH codes.
The idea is to let a neural network (NN) learn a generalized message passing algorithm over a given graph.
We benchmark our proposed decoder against state-of-the-art in conventional channel decoding as well as against recent deep learning-based results.
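The GNN decoder learns its message updates; the classical baseline it generalizes can be sketched as hard-decision bit-flipping on the Tanner graph of a small code. A minimal sketch, with the (7,4) Hamming code standing in for the LDPC/BCH codes discussed in the paper (all names and values here are our own illustration):

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code: rows are check nodes,
# columns are variable nodes of the Tanner graph a GNN decoder message-passes over.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def bit_flip_decode(y, H, max_iters=10):
    """Hard-decision bit-flipping: flip the bit in the most unsatisfied checks."""
    y = y.copy()
    for _ in range(max_iters):
        syndrome = H @ y % 2
        if not syndrome.any():
            break  # all parity checks satisfied
        # Per bit, count how many failing checks it participates in.
        votes = syndrome @ H
        y[np.argmax(votes)] ^= 1
    return y

codeword = np.array([1, 0, 1, 1, 0, 1, 0])  # valid: H @ codeword % 2 == 0
received = codeword.copy()
received[3] ^= 1  # single-bit channel error
print((bit_flip_decode(received, H) == codeword).all())  # → True
```

A learned decoder replaces the fixed flip rule with trainable node and edge update functions over the same graph.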
arXiv Detail & Related papers (2022-07-29T15:29:18Z)
- Deep Learning-Based Intra Mode Derivation for Versatile Video Coding [65.96100964146062]
An intelligent intra mode derivation method is proposed in this paper, termed Deep Learning based Intra Mode Derivation (DLIMD).
The architecture of DLIMD is developed to adapt to different quantization parameter settings and variable coding blocks including non-square ones.
The proposed method can achieve 2.28%, 1.74%, and 2.18% bit rate reduction on average for Y, U, and V components on the platform of Versatile Video Coding (VVC) test model.
arXiv Detail & Related papers (2022-04-08T13:23:59Z)
- Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval [55.088635195893325]
We propose the first quantized representation learning method for cross-view video retrieval, namely Hybrid Contrastive Quantization (HCQ).
HCQ learns both coarse-grained and fine-grained quantizations with transformers, which provide complementary understandings for texts and videos.
Experiments on three Web video benchmark datasets demonstrate that HCQ achieves competitive performance with state-of-the-art non-compressed retrieval methods.
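HCQ's transformer-based coarse- and fine-grained quantizers are beyond a short snippet, but the compact quantized representations it learns build on the generic product-quantization idea, which can be sketched in numpy (a minimal illustration of the general technique, not the HCQ method; all names are hypothetical):

```python
import numpy as np

def pq_encode(x, codebooks):
    """Assign each sub-vector to its nearest codeword; return integer codes."""
    m, k, d_sub = codebooks.shape
    x_split = x.reshape(len(x), m, d_sub)
    # Squared distances from every sub-vector to every codeword: (n, m, k).
    d = ((x_split[:, :, None, :] - codebooks[None]) ** 2).sum(-1)
    return d.argmin(-1)

def pq_decode(codes, codebooks):
    """Reconstruct vectors by concatenating the selected codewords."""
    m = codebooks.shape[0]
    return np.concatenate([codebooks[j, codes[:, j]] for j in range(m)], axis=1)

# Toy setup: 2 sub-spaces, 4 codewords each, sub-vector dim 2 (full dim 4).
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(2, 4, 2))
x = pq_decode(rng.integers(0, 4, size=(5, 2)), codebooks)  # points on codewords
codes = pq_encode(x, codebooks)
assert np.allclose(pq_decode(codes, codebooks), x)  # exact round-trip here
```

Each vector is stored as m small integer codes instead of floats, which is what makes large-scale cross-view retrieval indexes memory-efficient.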
arXiv Detail & Related papers (2022-02-07T18:04:10Z)
- Multitask Learning for VVC Quality Enhancement and Super-Resolution [11.446576112498596]
We propose a learning-based solution as a post-processing step to enhance the decoded VVC video quality.
Our method relies on multitask learning to perform both quality enhancement and super-resolution using a single shared network optimized for multiple levels.
arXiv Detail & Related papers (2021-04-16T19:05:26Z)
- Multi-Density Attention Network for Loop Filtering in Video Compression [9.322800480045336]
We propose an online scaling-based multi-density attention network for loop filtering in video compression.
Experimental results show that 10.18% bit-rate reduction at the same video quality can be achieved over the latest Versatile Video Coding (VVC) standard.
arXiv Detail & Related papers (2021-04-08T05:46:38Z)
- CVEGAN: A Perceptually-inspired GAN for Compressed Video Enhancement [15.431248645312309]
We propose a new Generative Adversarial Network for Compressed Video quality Enhancement (CVEGAN).
The CVEGAN generator benefits from the use of a novel Mul2Res block (with multiple levels of residual learning branches), an enhanced residual non-local block (ERNB) and an enhanced convolutional block attention module (ECBAM).
The training strategy has also been re-designed specifically for video compression applications, to employ a relativistic sphere GAN (ReSphereGAN) training methodology together with new perceptual loss functions.
arXiv Detail & Related papers (2020-11-18T10:24:38Z)
- Multiresolution Convolutional Autoencoders [5.0169726108025445]
We propose a multi-resolution convolutional autoencoder architecture that integrates and leverages three successful mathematical architectures.
Basic learning techniques are applied to ensure information learned from previous training steps can be rapidly transferred to the larger network.
The performance gains are illustrated through a sequence of numerical experiments on synthetic examples and real-world spatial data.
arXiv Detail & Related papers (2020-04-10T08:31:59Z)
- Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
arXiv Detail & Related papers (2020-03-25T09:04:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.