Efficient VVC Intra Prediction Based on Deep Feature Fusion and
Probability Estimation
- URL: http://arxiv.org/abs/2205.03587v1
- Date: Sat, 7 May 2022 08:01:32 GMT
- Title: Efficient VVC Intra Prediction Based on Deep Feature Fusion and
Probability Estimation
- Authors: Tiesong Zhao, Yuhang Huang, Weize Feng, Yiwen Xu, Sam Kwong
- Abstract summary: We propose to optimize Versatile Video Coding (VVC) complexity at intra-frame prediction, with a two-stage framework of deep feature fusion and probability estimation.
Experimental results on a standard database demonstrate the superiority of the proposed method, especially for High Definition (HD) and Ultra-HD (UHD) video sequences.
- Score: 57.66773945887832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ever-growing multimedia traffic has underscored the importance of
effective multimedia codecs. Among them, the up-to-date lossy video coding
standard, Versatile Video Coding (VVC), has been attracting the attention of the video
coding community. However, the gain of VVC is achieved at the cost of
significant encoding complexity, which brings the need for a fast encoder
with comparable Rate Distortion (RD) performance. In this paper, we propose to
optimize the VVC complexity at intra-frame prediction, with a two-stage
framework of deep feature fusion and probability estimation. At the first
stage, we employ a deep convolutional network to extract the spatial-temporal
neighboring coding features. Then we fuse all reference features obtained by
different convolutional kernels to determine an optimal intra coding depth. At
the second stage, we employ a probability-based model and the spatial-temporal
coherence to select the candidate partition modes within the optimal coding
depth. Finally, these selected depths and partitions are executed whilst
unnecessary computations are excluded. Experimental results on a standard
database demonstrate the superiority of the proposed method, especially for High
Definition (HD) and Ultra-HD (UHD) video sequences.
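To make the two-stage pipeline above concrete, here is a minimal sketch of the idea. The kernel sizes, channel widths, number of candidate depths, threshold, and the mode-probability table are all illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the two-stage fast-intra framework described in the
# abstract; module names, shapes, and thresholds are assumptions.
import torch
import torch.nn as nn

class DepthFusionNet(nn.Module):
    """Stage 1: fuse features from kernels of different sizes to pick
    an intra coding depth for a CTU."""
    def __init__(self, num_depths: int = 4):
        super().__init__()
        # Parallel branches extract spatial-temporal neighboring coding
        # features at several receptive-field sizes.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(1, 8, k, padding=k // 2), nn.ReLU())
            for k in (3, 5, 7)
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3 * 8, num_depths)
        )

    def forward(self, ctu_luma: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([b(ctu_luma) for b in self.branches], dim=1)
        return self.head(fused)  # logits over candidate coding depths

def select_partition_modes(depth: int, mode_probs: dict, threshold: float = 0.05):
    """Stage 2: keep only partition modes whose probability, estimated
    from spatial-temporal coherence statistics at the chosen depth,
    clears a threshold; the encoder skips the rest."""
    return [m for m, p in mode_probs[depth].items() if p >= threshold]

# Usage: predict a depth for one 64x64 CTU, then prune partition modes.
net = DepthFusionNet()
ctu = torch.rand(1, 1, 64, 64)
depth = int(net(ctu).argmax(dim=1))
stats = {d: {"QT": 0.6, "BT_H": 0.2, "BT_V": 0.15, "TT_H": 0.03, "TT_V": 0.02}
         for d in range(4)}
print(select_partition_modes(depth, stats))  # e.g. ['QT', 'BT_H', 'BT_V']
```

The point the sketch preserves is the complexity saving: the encoder runs full rate-distortion search only on the surviving depth and modes, excluding unnecessary computations.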
Related papers
- Bi-Directional Deep Contextual Video Compression [17.195099321371526]
We introduce a bi-directional deep contextual video compression scheme tailored for B-frames, termed DCVC-B.
First, we develop a bi-directional motion difference context propagation method for effective motion difference coding.
Second, we propose a bi-directional contextual compression model and a corresponding bi-directional temporal entropy model.
Third, we propose a hierarchical quality structure-based training strategy, leading to an effective bit allocation across large groups of pictures.
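As a rough illustration of the hierarchical B-frame structure such a training strategy exploits, the sketch below computes a mid-point-first coding order for a group of pictures, so every B-frame sees one past and one future reference; this is generic background, not code from the paper.

```python
# Hierarchical bi-directional coding order for a GOP (generic sketch).
def hierarchical_order(lo: int, hi: int, layer: int = 1):
    """Yield (frame_index, temporal_layer) for the B-frames between two
    already-coded reference frames lo and hi (exclusive)."""
    if hi - lo < 2:
        return
    mid = (lo + hi) // 2
    yield mid, layer                      # coded with refs lo and hi
    yield from hierarchical_order(lo, mid, layer + 1)
    yield from hierarchical_order(mid, hi, layer + 1)

# GOP of 8: frames 0 and 8 are coded first; deeper layers can be given
# fewer bits, which is the bit-allocation lever mentioned above.
for frame, layer in hierarchical_order(0, 8):
    print(f"frame {frame}: temporal layer {layer}")
```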
arXiv Detail & Related papers (2024-08-16T08:45:25Z)
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Learning Temporally Consistent Video Depth from Video Diffusion Priors [57.929828486615605]
This work addresses the challenge of video depth estimation.
We reformulate the prediction task into a conditional generation problem.
This allows us to leverage the prior knowledge embedded in existing video generation models.
arXiv Detail & Related papers (2024-06-03T16:20:24Z)
- Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z)
- MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding [21.147001610347832]
We propose a multi-mode video compression framework that selects the optimal mode for feature domain prediction adapting to different motion patterns.
For entropy coding, we consider both dense and sparse post-quantization residual blocks, and apply optional run-length coding to sparse residuals to improve the compression rate.
Compared with state-of-the-art video compression schemes and standard codecs, our method yields better or competitive results measured with PSNR and MS-SSIM.
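The optional run-length coding of sparse residuals mentioned above admits a compact illustration; this is a generic RLE sketch, not MMVC's density-adaptive entropy coder.

```python
# Run-length coding of a sparse post-quantization residual block.
from itertools import groupby

def rle_encode(residuals):
    """Encode a flattened residual block as (value, run_length) pairs;
    long zero runs in sparse blocks collapse to single pairs."""
    return [(v, len(list(g))) for v, g in groupby(residuals)]

def rle_decode(pairs):
    return [v for v, n in pairs for _ in range(n)]

block = [0, 0, 0, 2, 0, 0, 0, 0, -1, 0, 0, 0]
code = rle_encode(block)
assert rle_decode(code) == block          # lossless round trip
print(code)  # [(0, 3), (2, 1), (0, 4), (-1, 1), (0, 3)]
```

A dense block would gain little here, which is why the scheme applies run-length coding only to the sparse residual path.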
arXiv Detail & Related papers (2023-04-05T07:37:48Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality: 34.07 → 34.57 (measured with the PSNR metric).
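For reference, the PSNR metric behind the 34.07 → 34.57 comparison is the standard log-scaled mean-squared-error measure; the sketch below is the textbook definition for 8-bit content, not evaluation code from the paper.

```python
# Peak Signal-to-Noise Ratio for 8-bit frames (textbook definition).
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
rec = np.clip(ref + np.random.randint(-3, 4, ref.shape), 0, 255).astype(np.uint8)
print(f"{psnr(ref, rec):.2f} dB")
```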
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- Deep Learning-Based Intra Mode Derivation for Versatile Video Coding [65.96100964146062]
An intelligent intra mode derivation method is proposed in this paper, termed Deep Learning based Intra Mode Derivation (DLIMD).
The architecture of DLIMD is developed to adapt to different quantization parameter settings and variable coding blocks including non-square ones.
The proposed method can achieve 2.28%, 1.74%, and 2.18% bit rate reduction on average for Y, U, and V components on the platform of Versatile Video Coding (VVC) test model.
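Per-component bit-rate reductions of this kind are conventionally reported as Bjøntegaard-delta rate (BD-rate) over several QP points; the sketch below is a generic BD-rate computation under that assumption, not the authors' evaluation script.

```python
# Generic BD-rate: average bitrate change (%) at equal PSNR, from cubic
# fits of log-rate as a function of PSNR over the overlapping interval.
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    fit_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    fit_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    avg_a = (np.polyval(np.polyint(fit_a), hi) -
             np.polyval(np.polyint(fit_a), lo)) / (hi - lo)
    avg_t = (np.polyval(np.polyint(fit_t), hi) -
             np.polyval(np.polyint(fit_t), lo)) / (hi - lo)
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0  # negative = saving

# Four rate points per codec, as in a typical QP sweep (made-up numbers).
print(bd_rate([1000, 1800, 3200, 6000], [34.0, 36.0, 38.0, 40.0],
              [950, 1700, 3050, 5800], [34.1, 36.1, 38.1, 40.1]))
```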
arXiv Detail & Related papers (2022-04-08T13:23:59Z)
- End-to-End Rate-Distortion Optimized Learned Hierarchical Bi-Directional Video Compression [10.885590093103344]
Learned VC allows end-to-end rate-distortion (R-D) optimized training of nonlinear transform, motion and entropy model simultaneously.
This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-sampling and end-to-end optimization.
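The end-to-end R-D objective such codecs optimize is the single scalar L = R + λ·D; the minimal sketch below shows that joint loss, with the λ value and tensor names as assumptions rather than the LHBDC implementation.

```python
# Joint rate-distortion loss: one scalar trains transform, motion, and
# entropy model together against the same trade-off.
import torch

def rd_loss(bits_per_pixel: torch.Tensor,
            distortion_mse: torch.Tensor,
            lam: float = 0.01) -> torch.Tensor:
    return bits_per_pixel + lam * distortion_mse

bpp = torch.tensor(0.15, requires_grad=True)   # from the entropy model
mse = torch.tensor(12.0, requires_grad=True)   # reconstruction error
loss = rd_loss(bpp, mse)
loss.backward()                                # gradients reach both terms
print(loss.item(), bpp.grad.item(), mse.grad.item())
```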
arXiv Detail & Related papers (2021-12-17T14:30:22Z)
- End-to-end Neural Video Coding Using a Compound Spatiotemporal Representation [33.54844063875569]
We propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by two approaches.
Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module.
We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements.
arXiv Detail & Related papers (2021-08-05T19:43:32Z)
- Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model [45.46660511313426]
We propose an end-to-end deep neural video coding framework (NVC).
It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions and inter-frame compensation residuals.
NVC is evaluated in low-delay causal settings and compared with H.265/HEVC, H.264/AVC and other learnt video compression methods.
arXiv Detail & Related papers (2020-07-09T06:15:17Z)