Neural Video Coding using Multiscale Motion Compensation and
Spatiotemporal Context Model
- URL: http://arxiv.org/abs/2007.04574v1
- Date: Thu, 9 Jul 2020 06:15:17 GMT
- Title: Neural Video Coding using Multiscale Motion Compensation and
Spatiotemporal Context Model
- Authors: Haojie Liu, Ming Lu, Zhan Ma, Fan Wang, Zhihuang Xie, Xun Cao, Yao
Wang
- Abstract summary: We propose an end-to-end deep neural video coding framework (NVC).
It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions and inter-frame compensation residuals.
NVC is evaluated under low-delay causal settings and compared with H.265/HEVC, H.264/AVC and other learnt video compression methods.
- Score: 45.46660511313426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the past two decades, traditional block-based video coding has made
remarkable progress and spawned a series of well-known standards such as
MPEG-4, H.264/AVC and H.265/HEVC. On the other hand, deep neural networks
(DNNs) have shown their powerful capacity for visual content understanding,
feature extraction and compact representation. Some previous works have
explored learnt video coding algorithms in an end-to-end manner, showing
great potential compared with traditional methods. In this paper, we
propose an end-to-end deep neural video coding framework (NVC), which uses
variational autoencoders (VAEs) with joint spatial and temporal prior
aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame
motions and inter-frame compensation residuals, respectively. Novel features of
NVC include: 1) to estimate and compensate motion over a large range of
magnitudes, we propose an unsupervised multiscale motion compensation network
(MS-MCN) together with a pyramid decoder in the VAE that generates multiscale
flow fields for coding motion features; 2) we design a novel adaptive
spatiotemporal context model for efficient entropy coding of motion
information; 3) we adopt nonlocal attention modules (NLAM) at the bottlenecks
of the VAEs for implicit adaptive feature extraction and activation, leveraging
their high transformation capacity and unequal weighting of joint global and
local information; and 4) we introduce multi-module optimization and a
multi-frame training strategy to minimize temporal error propagation among
P-frames. NVC is evaluated under the low-delay causal setting and compared with
H.265/HEVC, H.264/AVC and other learnt video compression methods following
the common test conditions, demonstrating consistent gains across all popular
test sequences for both PSNR and MS-SSIM distortion metrics.
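
To make novel feature 1 concrete: once the pyramid decoder has produced flow fields at several scales, MS-MCN warps a correspondingly resized reference frame with each field. The following is a minimal PyTorch sketch of that warping step, assuming backward (reference-to-current) warping; all function names and tensor layouts are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of multiscale motion compensation, assuming backward
# warping of a resized reference frame at every pyramid scale. Names and
# layouts are illustrative, not the NVC implementation.
import torch
import torch.nn.functional as F

def backward_warp(frame, flow):
    """Warp `frame` (N, C, H, W) with a dense flow field (N, 2, H, W)."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype, device=frame.device),
        torch.arange(w, dtype=frame.dtype, device=frame.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0).unsqueeze(0)   # (1, 2, H, W)
    coords = base + flow                               # displaced sample positions
    # grid_sample expects locations normalized to [-1, 1], in (x, y) order
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)               # (N, H, W, 2)
    return F.grid_sample(frame, grid, mode="bilinear", align_corners=True)

def multiscale_compensate(reference, flows):
    """Apply one flow field per pyramid scale (coarse to fine)."""
    predictions = []
    for flow in flows:
        h, w = flow.shape[-2:]
        ref = F.interpolate(reference, size=(h, w), mode="bilinear",
                            align_corners=False)
        predictions.append(backward_warp(ref, flow))
    return predictions  # one motion-compensated prediction per scale
```

With all-zero flows the function simply returns resized copies of the reference, which makes the sketch easy to sanity-check; in NVC the flows would come from the pyramid decoder of the motion VAE, and the per-scale predictions would drive residual coding.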
Related papers
- Motion Free B-frame Coding for Neural Video Compression [0.0]
In this paper, we propose a novel approach that addresses the drawbacks of two typical existing architectures.
The advantages of the motion-free approach are twofold: it improves the coding efficiency of the network and significantly reduces computational complexity.
Experimental results show the proposed framework outperforms the SOTA deep neural video compression networks on the HEVC Class B dataset.
arXiv Detail & Related papers (2024-11-26T07:03:11Z)
- PNVC: Towards Practical INR-based Video Compression [14.088444622391501]
We propose a novel INR-based coding framework, PNVC, which innovatively combines autoencoder-based and overfitted solutions.
PNVC achieves nearly 35% BD-rate savings against HEVC HM 18.0 (LD), almost 10% more than one of the state-of-the-art INR-based codecs.
arXiv Detail & Related papers (2024-09-02T05:31:11Z)
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach that explores multimodal representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that the TT2V (text-text-to-video) mode achieves effective semantic reconstruction, while the IT2V (image-text-to-video) mode exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Boosting Neural Representations for Videos with a Conditional Decoder [28.073607937396552]
Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing.
This paper introduces a universal boosting framework for current implicit video representation approaches.
arXiv Detail & Related papers (2024-02-28T08:32:19Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, improving PSNR from 34.07 dB to 34.57 dB.
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- Neighbor Correspondence Matching for Flow-based Video Frame Synthesis [90.14161060260012]
We introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis.
NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
The coarse-scale module is designed to leverage neighbor correspondences to capture large motion, while the fine-scale module is kept efficient to speed up the estimation process.
arXiv Detail & Related papers (2022-07-14T09:17:00Z)
- Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation [57.66773945887832]
We propose to optimize Versatile Video Coding (VVC) complexity in intra-frame prediction, with a two-stage framework of deep feature fusion and probability estimation.
Experimental results on a standard database demonstrate the superiority of the proposed method, especially for High Definition (HD) and Ultra-HD (UHD) video sequences.
arXiv Detail & Related papers (2022-05-07T08:01:32Z)
- End-to-end Neural Video Coding Using a Compound Spatiotemporal Representation [33.54844063875569]
We propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by two approaches.
Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module.
We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements.
arXiv Detail & Related papers (2021-08-05T19:43:32Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machines (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion patterns.
By learning to extract sparse motion patterns via a predictive model, the network elegantly leverages the feature representation to generate the appearance of the frames to be coded; a minimal sketch of this sparse-motion step follows below.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
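
The sparse-motion idea in the VCM entry above lends itself to a small illustration. Below is a hedged toy sketch, not the paper's network: a tiny CNN predicts K heatmaps and a soft-argmax reduces each to an (x, y) keypoint, the kind of compact motion pattern a conditional generation network could consume together with reference-frame features. All layer sizes and names are assumptions.

```python
# Toy sketch of a predictive model for sparse motion patterns (assumed
# architecture, not the VCM paper's network): K heatmaps -> K keypoints.
import torch
import torch.nn as nn

class SparseMotionExtractor(nn.Module):
    """Predict K heatmaps, then soft-argmax each into an (x, y) keypoint."""
    def __init__(self, k: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, k, 3, padding=1),
        )

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        heat = self.net(frame)                                 # (N, K, H, W)
        n, k, h, w = heat.shape
        prob = heat.flatten(2).softmax(dim=-1).view(n, k, h, w)
        ys = torch.linspace(-1.0, 1.0, h, device=frame.device).view(1, 1, h, 1)
        xs = torch.linspace(-1.0, 1.0, w, device=frame.device).view(1, 1, 1, w)
        # Expected coordinate under each heatmap = soft-argmax keypoint.
        kp_x = (prob * xs).sum(dim=(2, 3))
        kp_y = (prob * ys).sum(dim=(2, 3))
        return torch.stack((kp_x, kp_y), dim=-1)               # (N, K, 2)

# A handful of keypoints per frame is a very sparse motion description;
# a decoder-side generator would condition on these plus a reference frame.
kp = SparseMotionExtractor()(torch.rand(1, 3, 64, 64))
print(kp.shape)  # torch.Size([1, 10, 2]), coordinates in [-1, 1]
```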
This list is automatically generated from the titles and abstracts of the papers on this site.