Learned Hierarchical B-frame Coding with Adaptive Feature Modulation for
YUV 4:2:0 Content
- URL: http://arxiv.org/abs/2212.14187v1
- Date: Thu, 29 Dec 2022 06:22:52 GMT
- Title: Learned Hierarchical B-frame Coding with Adaptive Feature Modulation for
YUV 4:2:0 Content
- Authors: Mu-Jung Chen, Hong-Sheng Xie, Cheng Chien, Wen-Hsiao Peng, Hsueh-Ming
Hang
- Abstract summary: This paper introduces a learned hierarchical B-frame coding scheme in response to the Grand Challenge on Neural Network-based Video Coding at ISCAS 2023.
We specifically address three issues: (1) B-frame coding, (2) YUV 4:2:0 coding, and (3) content-adaptive variable-rate coding with a single model.
- Score: 13.289507865388863
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper introduces a learned hierarchical B-frame coding scheme in
response to the Grand Challenge on Neural Network-based Video Coding at ISCAS
2023. We specifically address three issues: (1) B-frame coding, (2) YUV 4:2:0
coding, and (3) content-adaptive variable-rate coding with a single model. Most
learned video codecs operate internally in the RGB domain
for P-frame coding. B-frame coding for YUV 4:2:0 content is largely
under-explored. In addition, while there have been prior works on variable-rate
coding with conditional convolution, most of them fail to consider the content
information. We build our scheme on conditional augmented normalizing flows
(CANF). It features conditional motion and inter-frame codecs for efficient
B-frame coding. To cope with YUV 4:2:0 content, two conditional inter-frame
codecs are used to process the Y and UV components separately, with the coding
of the UV components conditioned additionally on the Y component. Moreover, we
introduce adaptive feature modulation in every convolutional layer, taking into
account both the content information and the coding levels of B-frames to
achieve content-adaptive variable-rate coding. Experimental results show that
our model outperforms x265 and the winner of last year's challenge on commonly
used datasets in terms of PSNR-YUV.
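As a rough illustration of the adaptive feature modulation idea described in the abstract, a FiLM-style channel-wise scale-and-shift conditioned on a content/coding-level vector can be sketched as follows. This is a minimal sketch under assumed shapes and learned projections, not the authors' actual implementation; the names `film_modulate`, `w_gamma`, and `w_beta` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def film_modulate(features, cond, w_gamma, w_beta):
    """Channel-wise affine modulation (FiLM-style sketch).

    features: (C, H, W) feature map from a convolutional layer
    cond:     (D,) conditioning vector (e.g. content statistics plus
              the B-frame coding level / rate parameter)
    w_gamma, w_beta: (D, C) learned projections producing a per-channel
              scale and shift from the condition.
    """
    gamma = cond @ w_gamma  # (C,) per-channel scale
    beta = cond @ w_beta    # (C,) per-channel shift
    return features * gamma[:, None, None] + beta[:, None, None]

# Toy usage: 8 channels, 4x4 spatial, 3-dim condition vector.
C, H, W, D = 8, 4, 4, 3
feat = rng.standard_normal((C, H, W))
cond = np.array([0.5, 1.0, -0.2])  # hypothetical content/level features
w_g = rng.standard_normal((D, C))
w_b = rng.standard_normal((D, C))
out = film_modulate(feat, cond, w_g, w_b)
print(out.shape)  # (8, 4, 4)
```

Applying such a modulation in every convolutional layer lets one network adapt its features to both the content and the B-frame coding level, which is the mechanism the abstract credits for single-model variable-rate coding.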
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Hierarchical B-frame Video Coding Using Two-Layer CANF without Motion Coding [17.998825368770635]
We propose a novel B-frame coding architecture based on two-layer Conditional Augmented Normalizing Flows (CANF).
Our proposed idea of video compression without motion coding offers a new direction for learned video coding.
The rate-distortion performance of our scheme is slightly lower than that of the state-of-the-art learned B-frame coding scheme, B-CANF, but outperforms other learned B-frame coding schemes.
arXiv Detail & Related papers (2023-04-05T18:36:28Z)
- Learned Video Compression for YUV 4:2:0 Content Using Flow-based Conditional Inter-frame Coding [24.031385522441497]
This paper proposes a learning-based video compression framework for variable-rate coding on YUV 4:2:0 content.
We introduce a conditional flow-based inter-frame coder to improve the inter-frame coding efficiency.
Experimental results show that our model performs better than x265 on UVG and MCL-JCV datasets.
arXiv Detail & Related papers (2022-10-15T08:36:01Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality (34.07 dB → 34.57 dB, measured in PSNR).
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- Graph Neural Networks for Channel Decoding [71.15576353630667]
We showcase competitive decoding performance for various coding schemes, such as low-density parity-check (LDPC) and BCH codes.
The idea is to let a neural network (NN) learn a generalized message passing algorithm over a given graph.
We benchmark our proposed decoder against state-of-the-art in conventional channel decoding as well as against recent deep learning-based results.
arXiv Detail & Related papers (2022-07-29T15:29:18Z)
- CANF-VC: Conditional Augmented Normalizing Flows for Video Compression [81.41594331948843]
CANF-VC is an end-to-end learning-based video compression system.
It is based on conditional augmented normalizing flows (CANF).
arXiv Detail & Related papers (2022-07-12T04:53:24Z)
- A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs)
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
- A Combined Deep Learning based End-to-End Video Coding Architecture for YUV Color Space [14.685161934404123]
Most of the existing deep learning based end-to-end video coding (DLEC) architectures are designed specifically for RGB color format.
This paper introduces a new DLEC architecture for video coding to effectively support YUV 4:2:0 and compares its performance against the HEVC standard.
arXiv Detail & Related papers (2021-04-01T23:41:06Z)
- Transform Network Architectures for Deep Learning based End-to-End Image/Video Coding in Subsampled Color Spaces [16.83399026040147]
This paper investigates various DLEC designs to support YUV 4:2:0 format.
A new transform network architecture is proposed to improve the efficiency of coding YUV 4:2:0 data.
arXiv Detail & Related papers (2021-02-27T06:47:27Z)
- Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model [45.46660511313426]
We propose an end-to-end deep neural video coding framework (NVC)
It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions and inter-frame compensation residuals.
NVC is evaluated for the low-delay causal settings and compared with H.265/HEVC, H.264/AVC and the other learnt video compression methods.
arXiv Detail & Related papers (2020-07-09T06:15:17Z)
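Several entries above report quality in PSNR or PSNR-YUV. As a hedged sketch of how such a metric can be computed for 4:2:0 content: the 6:1:1 channel weighting used here is one common convention and an assumption on my part, not something stated in these papers.

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """PSNR in dB between a reference and a reconstructed plane."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def psnr_yuv(ref_planes, rec_planes, weights=(6, 1, 1)):
    """Weighted PSNR-YUV over (Y, U, V) planes.

    For 4:2:0 content the U and V planes are half the Y resolution
    in each dimension. The weighting is an assumed convention.
    """
    vals = [psnr(r, d) for r, d in zip(ref_planes, rec_planes)]
    return sum(w * v for w, v in zip(weights, vals)) / sum(weights)

# Toy 4:2:0 frame: 8x8 luma, 4x4 chroma, one-pixel error per plane.
y = np.full((8, 8), 128, dtype=np.uint8)
u = np.full((4, 4), 64, dtype=np.uint8)
v = np.full((4, 4), 192, dtype=np.uint8)
y_rec = y.copy(); y_rec[0, 0] += 2
u_rec = u.copy(); u_rec[0, 0] += 1
v_rec = v.copy(); v_rec[0, 0] += 1
print(round(psnr_yuv((y, u, v), (y_rec, u_rec, v_rec)), 2))
```

The luma-heavy weighting reflects that the eye is more sensitive to luma distortion, which is also why 4:2:0 subsamples only the chroma planes.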
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.