Learning Cross-Scale Prediction for Efficient Neural Video Compression
- URL: http://arxiv.org/abs/2112.13309v1
- Date: Sun, 26 Dec 2021 03:12:17 GMT
- Title: Learning Cross-Scale Prediction for Efficient Neural Video Compression
- Authors: Zongyu Guo, Runsen Feng, Zhizheng Zhang, Xin Jin, Zhibo Chen
- Abstract summary: We present the first neural video codec that can compete with the latest coding standard H.266/VVC in terms of sRGB PSNR on the UVG dataset for the low-latency mode.
We propose a novel cross-scale prediction module that achieves more effective motion compensation.
- Score: 30.051859347293856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present the first neural video codec that can compete with
the latest coding standard H.266/VVC in terms of sRGB PSNR on the UVG dataset for
the low-latency mode. Existing neural hybrid video coding approaches rely on
optical flow or Gaussian-scale flow for prediction, which cannot support
fine-grained adaptation to diverse motion content. Towards more
content-adaptive prediction, we propose a novel cross-scale prediction module
that achieves more effective motion compensation. Specifically, on the one
hand, we produce a reference feature pyramid as prediction sources, then
transmit cross-scale flows that leverage the feature scale to control the
precision of prediction. On the other hand, we introduce the mechanism of
weighted prediction into the scenario of prediction with a single reference
frame, where cross-scale weight maps are transmitted to synthesize a fine
prediction result. In addition to the cross-scale prediction module, we further
propose a multi-stage quantization strategy, which improves the rate-distortion
performance with no extra computational penalty during inference. We show the
encouraging performance of our efficient neural video codec (ENVC) on several
common benchmark datasets and analyze in detail the effectiveness of every
important component.
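To make the cross-scale prediction idea concrete, the PyTorch sketch below illustrates one way the pieces described in the abstract could fit together. It is an illustrative approximation, not the authors' implementation: the average-pooled pyramid, the warping operator, and the per-scale flow and weight inputs are assumptions standing in for ENVC's learned and transmitted components.

```python
import torch
import torch.nn.functional as F

def build_feature_pyramid(ref_feat, num_scales):
    """Reference feature pyramid used as prediction sources.
    Plain average pooling stands in for ENVC's learned feature extractor."""
    pyramid = [ref_feat]
    for _ in range(num_scales - 1):
        pyramid.append(F.avg_pool2d(pyramid[-1], kernel_size=2))
    return pyramid

def backward_warp(feat, flow):
    """Warp a feature map with a dense flow field via grid_sample.
    flow has shape (B, 2, H, W) with (x, y) displacements in pixels."""
    _, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0)      # (1, H, W, 2)
    grid = base + flow.permute(0, 2, 3, 1)                 # (B, H, W, 2)
    gx = 2.0 * grid[..., 0] / max(w - 1, 1) - 1.0          # normalize to [-1, 1]
    gy = 2.0 * grid[..., 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(feat, torch.stack((gx, gy), dim=-1),
                         mode="bilinear", align_corners=True)

def cross_scale_prediction(ref_feat, flows, weight_logits):
    """Fuse warped predictions from every pyramid scale.

    flows         : list of decoded flow fields, one per scale, each at that
                    scale's resolution (transmitted in the bitstream).
    weight_logits : decoded cross-scale weight maps, shape (B, S, H, W).
    Assumes spatial sizes are divisible by 2**(S-1).
    """
    pyramid = build_feature_pyramid(ref_feat, num_scales=len(flows))
    preds = []
    for level, (feat, flow) in enumerate(zip(pyramid, flows)):
        warped = backward_warp(feat, flow)
        if level > 0:   # bring coarser scales back to full resolution
            warped = F.interpolate(warped, scale_factor=2 ** level,
                                   mode="bilinear", align_corners=False)
        preds.append(warped)
    weights = torch.softmax(weight_logits, dim=1)           # (B, S, H, W)
    return sum(w.unsqueeze(1) * p
               for w, p in zip(weights.unbind(dim=1), preds))
```

The intuition the sketch captures: coarse pyramid levels give smooth, low-precision predictions that are robust under complex or fast motion, fine levels preserve detail where motion is simple, and the transmitted weight maps let the codec choose the mix per pixel.

The abstract also mentions a multi-stage quantization strategy that improves rate-distortion performance at no extra inference cost, but does not describe it. Purely as a hypothetical illustration of how training-time quantization surrogates can be staged while inference remains plain rounding (this is a generic recipe, not the paper's), one might write:

```python
import torch

def quantize(latent: torch.Tensor, stage: int, training: bool) -> torch.Tensor:
    """Hypothetical staged quantization surrogate for training."""
    if not training:                      # inference: hard rounding, no extra cost
        return torch.round(latent)
    if stage == 0:                        # early stage: additive uniform noise
        return latent + torch.empty_like(latent).uniform_(-0.5, 0.5)
    # later stage: straight-through rounding (hard values, identity gradient)
    return latent + (torch.round(latent) - latent).detach()
```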
Related papers
- Multi-Scale Feature Prediction with Auxiliary-Info for Neural Image Compression [13.076563599765176]
We introduce a new predictive structure consisting of the auxiliary coarse network and the main network, inspired by neural video compression.
Our model outperforms other neural image compression models and achieves a 19.49% higher rate-distortion performance than VVC on the Tecnick dataset.
arXiv Detail & Related papers (2024-09-19T12:41:53Z) - Corner-to-Center Long-range Context Model for Efficient Learned Image
Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the Corner-to-Center transformer-based Context Model (C$^3$M) designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z) - Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for
Video Prediction [1.2537993038844142]
We present a multi-scale predictive coding model for future video frame prediction.
Our model employs a coarse-to-fine multi-scale approach in which higher-level neurons generate coarser, lower-resolution predictions.
We propose several improvements to the training strategy to mitigate the accumulation of prediction errors in long-term prediction.
arXiv Detail & Related papers (2022-12-22T12:15:37Z) - Coarse-to-fine Deep Video Coding with Hyperprior-guided Mode Prediction [50.361427832256524]
We propose a coarse-to-fine (C2F) deep video compression framework for better motion compensation.
Our C2F framework can achieve better motion compensation results without significantly increasing bit costs.
arXiv Detail & Related papers (2022-06-15T11:38:53Z) - Hybrid Predictive Coding: Inferring, Fast and Slow [62.997667081978825]
We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner.
We demonstrate that our model is inherently sensitive to its uncertainty and adaptively balances the two inference modes to obtain accurate beliefs with minimum computational expense.
arXiv Detail & Related papers (2022-04-05T12:52:45Z) - Neural Network based Inter bi-prediction Blending [8.815673539598816]
This paper presents a learning-based method to improve bi-prediction in video coding.
In this context, we introduce a simple neural network that further improves the blending operation.
Tests show a BD-rate improvement of -1.4% in the random access configuration with a network of fewer than 10k parameters.
arXiv Detail & Related papers (2022-01-26T13:57:48Z) - Self-Supervised Learning of Perceptually Optimized Block Motion
Estimates for Video Compression [50.48504867843605]
We propose a search-free block motion estimation framework using a multi-stage convolutional neural network.
We deploy the multi-scale structural similarity (MS-SSIM) loss function to optimize the perceptual quality of the motion compensated predicted frames.
arXiv Detail & Related papers (2021-10-05T03:38:43Z) - End-to-end Neural Video Coding Using a Compound Spatiotemporal
Representation [33.54844063875569]
We propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by two approaches.
Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module.
We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements.
arXiv Detail & Related papers (2021-08-05T19:43:32Z) - Predicting Deep Neural Network Generalization with Perturbation Response
Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z) - Chroma Intra Prediction with attention-based CNN architectures [15.50693711359313]
This paper proposes a new neural network architecture for cross-component intra-prediction.
The network uses a novel attention module to model spatial relations between reference and predicted samples.
arXiv Detail & Related papers (2020-06-27T12:11:17Z) - Deep Learning for Content-based Personalized Viewport Prediction of
360-Degree VR Videos [72.08072170033054]
In this paper, a deep learning network is introduced to leverage position data as well as video frame content to predict future head movement.
To optimize the data input to this neural network, the data sample rate, reduced data, and long-period prediction length are also explored for this model.
arXiv Detail & Related papers (2020-03-01T07:31:50Z)