Neural Video Compression with Temporal Layer-Adaptive Hierarchical
B-frame Coding
- URL: http://arxiv.org/abs/2308.15791v3
- Date: Tue, 5 Sep 2023 05:17:42 GMT
- Authors: Yeongwoong Kim, Suyong Bahk, Seungeon Kim, Won Hee Lee, Dokwan Oh, Hui
Yong Kim
- Abstract summary: We propose an NVC model exploiting hierarchical B-frame coding with temporal layer-adaptive optimization.
The model achieves an impressive BD-rate gain of -39.86% against the baseline.
It also addresses the challenges posed by sequences with large or complex motions, with up to -49.13% additional BD-rate gain over the simple bidirectional extension.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural video compression (NVC) is a rapidly evolving video coding research
area, with some models achieving superior coding efficiency compared to the
latest video coding standard Versatile Video Coding (VVC). In conventional
video coding standards, hierarchical B-frame coding, which utilizes a
bidirectional prediction structure for higher compression, has been
well studied and exploited. In NVC, however, limited research has investigated
the hierarchical B scheme. In this paper, we propose an NVC model exploiting
hierarchical B-frame coding with temporal layer-adaptive optimization. We first
extend an existing unidirectional NVC model to a bidirectional model, which
achieves -21.13% BD-rate gain over the unidirectional baseline model. However,
this model faces challenges when applied to sequences with complex or large
motions, leading to performance degradation. To address this, we introduce
temporal layer-adaptive optimization, incorporating methods such as temporal
layer-adaptive quality scaling (TAQS) and temporal layer-adaptive latent
scaling (TALS). The final model with the proposed methods achieves an
impressive BD-rate gain of -39.86% against the baseline. It also addresses the
challenges posed by sequences with large or complex motions, with up to -49.13%
additional BD-rate gain over the simple bidirectional extension. This improvement is
attributed to the allocation of more bits to lower temporal layers, thereby
enhancing overall reconstruction quality with fewer bits. Since our method
has little dependency on a specific NVC model architecture, it can serve as a
general tool for extending unidirectional NVC models to models with
hierarchical B-frame coding.
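
To make the notion of temporal layers in hierarchical B-frame coding concrete, here is a minimal Python sketch of a conventional dyadic hierarchical B structure. It is not the paper's implementation: the function name `hierarchical_b_order`, the power-of-two GOP size, and the convention that both anchor frames are already coded as layer 0 are assumptions made for illustration.

```python
from typing import List, Tuple

# (frame index, temporal layer, (past reference, future reference))
Entry = Tuple[int, int, Tuple[int, int]]


def hierarchical_b_order(gop_size: int) -> List[Entry]:
    """List the B-frames of one GOP in coding order.

    Assumption: frames 0 and gop_size are anchors that are already coded
    (temporal layer 0), and gop_size is a power of two.
    """
    if gop_size < 2 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two >= 2")

    order: List[Entry] = []
    intervals = [(0, gop_size)]
    layer = 1
    # Process one temporal layer at a time: each B-frame sits at the midpoint
    # of an interval and is predicted from the interval's two endpoints.
    while intervals:
        next_intervals = []
        for left, right in intervals:
            if right - left < 2:
                continue  # no frame left between the two references
            mid = (left + right) // 2
            order.append((mid, layer, (left, right)))
            next_intervals += [(left, mid), (mid, right)]
        intervals = next_intervals
        layer += 1
    return order


if __name__ == "__main__":
    for frame, layer, refs in hierarchical_b_order(8):
        print(f"frame {frame}: layer {layer}, references {refs}")
```

In this structure, frames in lower temporal layers are coded first and serve as references for the frames in higher layers, which suggests why allocating more bits to the lower layers can raise overall reconstruction quality. A layer-adaptive scheme would condition its quality or latent scaling on the `layer` value computed here.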
Related papers
- High-Efficiency Neural Video Compression via Hierarchical Predictive Learning [27.41398149573729]
Enhanced Deep Hierarchical Video Compression (DHVC 2.0) delivers superior compression performance and impressive complexity efficiency.
Uses hierarchical predictive coding to transform each video frame into multiscale representations.
Supports transmission-friendly progressive decoding, making it particularly advantageous for networked video applications in the presence of packet loss.
arXiv Detail & Related papers (2024-10-03T15:40:58Z)
- Bi-Directional Deep Contextual Video Compression [17.195099321371526]
We introduce a bi-directional deep contextual video compression scheme tailored for B-frames, termed DCVC-B.
First, we develop a bi-directional motion difference context propagation method for effective motion difference coding.
Second, we propose a bi-directional contextual compression model and a corresponding bi-directional temporal entropy model.
Third, we propose a hierarchical quality structure-based training strategy, leading to an effective bit allocation across large groups of pictures.
arXiv Detail & Related papers (2024-08-16T08:45:25Z)
- Hierarchical Patch Diffusion Models for High-Resolution Video Generation [50.42746357450949]
We develop deep context fusion, which propagates context information from low-scale to high-scale patches in a hierarchical manner.
We also propose adaptive computation, which allocates more network capacity and computation towards coarse image details.
The resulting model sets a new state-of-the-art FVD score of 66.32 and Inception Score of 87.68 in class-conditional video generation.
arXiv Detail & Related papers (2024-06-12T01:12:53Z)
- Scene Matters: Model-based Deep Video Compression [13.329074811293292]
We propose a model-based video compression (MVC) framework that regards scenes as the fundamental units for video sequences.
Our proposed MVC directly models novel intensity variation of the entire video sequence in one scene, seeking non-redundant representations instead of reducing redundancy.
Our method achieves up to a 20% bitrate reduction compared to the latest video standard H.266 and is more efficient in decoding than existing video coding strategies.
arXiv Detail & Related papers (2023-03-08T13:15:19Z)
- CANF-VC: Conditional Augmented Normalizing Flows for Video Compression [81.41594331948843]
CANF-VC is an end-to-end learning-based video compression system.
It is based on conditional augmented normalizing flows (ANF).
arXiv Detail & Related papers (2022-07-12T04:53:24Z)
- Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation [57.66773945887832]
We propose to optimize Versatile Video Coding (VVC) complexity at intra-frame prediction, with a two-stage framework of deep feature fusion and probability estimation.
Experimental results on a standard database demonstrate the superiority of the proposed method, especially for High Definition (HD) and Ultra-HD (UHD) video sequences.
arXiv Detail & Related papers (2022-05-07T08:01:32Z)
- Deep Learning-Based Intra Mode Derivation for Versatile Video Coding [65.96100964146062]
This paper proposes an intelligent intra mode derivation method, termed Deep Learning based Intra Mode Derivation (DLIMD).
The architecture of DLIMD is developed to adapt to different quantization parameter settings and variable coding blocks including non-square ones.
The proposed method can achieve 2.28%, 1.74%, and 2.18% bit rate reduction on average for Y, U, and V components on the platform of Versatile Video Coding (VVC) test model.
arXiv Detail & Related papers (2022-04-08T13:23:59Z)
- End-to-End Rate-Distortion Optimized Learned Hierarchical Bi-Directional Video Compression [10.885590093103344]
Learned VC allows end-to-end rate-distortion (R-D) optimized training of nonlinear transform, motion and entropy model simultaneously.
This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-sampling and end-to-end optimization.
arXiv Detail & Related papers (2021-12-17T14:30:22Z)
- Approximated Bilinear Modules for Temporal Modeling [116.6506871576514]
Two layers in CNNs can be converted to temporal bilinear modules by adding an auxiliary-branch sampling.
Our models can outperform most state-of-the-art methods on SomethingSomething v1 and v2 datasets without pretraining.
arXiv Detail & Related papers (2020-07-25T09:07:35Z)
- Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model [45.46660511313426]
We propose an end-to-end deep neural video coding framework (NVC).
It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions and inter-frame compensation residuals.
NVC is evaluated for the low-delay causal settings and compared with H.265/HEVC, H.264/AVC and the other learnt video compression methods.
arXiv Detail & Related papers (2020-07-09T06:15:17Z)