Transform Network Architectures for Deep Learning based End-to-End
Image/Video Coding in Subsampled Color Spaces
- URL: http://arxiv.org/abs/2103.01760v1
- Date: Sat, 27 Feb 2021 06:47:27 GMT
- Title: Transform Network Architectures for Deep Learning based End-to-End
Image/Video Coding in Subsampled Color Spaces
- Authors: Hilmi E. Egilmez, Ankitesh K. Singh, Muhammed Coban, Marta Karczewicz,
Yinhao Zhu, Yang Yang, Amir Said, Taco S. Cohen
- Abstract summary: This paper investigates various DLEC designs to support YUV 4:2:0 format.
A new transform network architecture is proposed to improve the efficiency of coding YUV 4:2:0 data.
- Score: 16.83399026040147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most of the existing deep learning based end-to-end image/video coding (DLEC)
architectures are designed for non-subsampled RGB color format. However, in
order to achieve a superior coding performance, many state-of-the-art
block-based compression standards such as High Efficiency Video Coding
(HEVC/H.265) and Versatile Video Coding (VVC/H.266) are designed primarily for
YUV 4:2:0 format, where U and V components are subsampled by considering the
human visual system. This paper investigates various DLEC designs to support
YUV 4:2:0 format by comparing their performance against the main profiles of
HEVC and VVC standards under a common evaluation framework. Moreover, a new
transform network architecture is proposed to improve the efficiency of coding
YUV 4:2:0 data. The experimental results on YUV 4:2:0 datasets show that the
proposed architecture significantly outperforms naive extensions of existing
architectures designed for RGB format and achieves about 10% average BD-rate
improvement over the intra-frame coding in HEVC.
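As background for the YUV 4:2:0 format discussed in the abstract, here is a minimal, hedged sketch of chroma subsampling in pure Python. It assumes full-range BT.601 conversion coefficients and simple 2x2 block averaging; real codecs use limited-range fixed-point arithmetic and standard-specific downsampling filters, so this is illustrative only.

```python
# Hedged sketch: RGB -> YUV 4:2:0 conversion (assumed BT.601 full-range
# coefficients; actual standards differ in range, siting, and filters).

def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (0..255) to Y'UV using BT.601-style weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b
    v = 0.615 * r - 0.51499 * g - 0.10001 * b
    return y, u, v

def to_yuv420(rgb):
    """rgb: H x W list of (r, g, b) tuples; H and W assumed even.
    Returns full-resolution Y plane and half-resolution U, V planes."""
    h, w = len(rgb), len(rgb[0])
    yuv = [[rgb_to_yuv(*px) for px in row] for row in rgb]
    y_plane = [[px[0] for px in row] for row in yuv]
    # 4:2:0 -- average each 2x2 block of U and V, halving both dimensions
    u_plane = [[sum(yuv[i + di][j + dj][1] for di in (0, 1) for dj in (0, 1)) / 4
                for j in range(0, w, 2)] for i in range(0, h, 2)]
    v_plane = [[sum(yuv[i + di][j + dj][2] for di in (0, 1) for dj in (0, 1)) / 4
                for j in range(0, w, 2)] for i in range(0, h, 2)]
    return y_plane, u_plane, v_plane
```

The key point for DLEC design is visible in the return shapes: the luma plane keeps full resolution while each chroma plane carries only a quarter of the samples, which is why transform networks built for non-subsampled RGB do not map directly onto this layout.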
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Hierarchical Patch Diffusion Models for High-Resolution Video Generation [50.42746357450949]

We develop deep context fusion, which propagates context information from low-scale to high-scale patches in a hierarchical manner.
We also propose adaptive computation, which allocates more network capacity and computation towards coarse image details.
The resulting model sets a new state-of-the-art FVD score of 66.32 and Inception Score of 87.68 in class-conditional video generation.
arXiv Detail & Related papers (2024-06-12T01:12:53Z)
- Towards Efficient SDRTV-to-HDRTV by Learning from Image Formation [51.26219245226384]
Modern displays are capable of rendering video content with high dynamic range (HDR) and wide color gamut (WCG).
The majority of available resources are still in standard dynamic range (SDR).
We define and analyze the SDRTV-to-HDRTV task by modeling the formation of SDRTV/HDRTV content.
Our method is primarily designed for ultra-high-definition TV content and is therefore effective and lightweight for processing 4K resolution images.
arXiv Detail & Related papers (2023-09-08T02:50:54Z)
- Learned Hierarchical B-frame Coding with Adaptive Feature Modulation for YUV 4:2:0 Content [13.289507865388863]
This paper introduces a learned hierarchical B-frame coding scheme in response to the Grand Challenge on Neural Network-based Video Coding at ISCAS 2023.
We specifically address three issues: (1) B-frame coding, (2) YUV 4:2:0 coding, and (3) content-adaptive variable-rate coding with a single model.
arXiv Detail & Related papers (2022-12-29T06:22:52Z)
- Learned Video Compression for YUV 4:2:0 Content Using Flow-based Conditional Inter-frame Coding [24.031385522441497]
This paper proposes a learning-based video compression framework for variable-rate coding on YUV 4:2:0 content.
We introduce a conditional flow-based inter-frame coder to improve inter-frame coding efficiency.
Experimental results show that our model performs better than x265 on UVG and MCL-JCV datasets.
arXiv Detail & Related papers (2022-10-15T08:36:01Z)
- Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation [57.66773945887832]
We propose to optimize Versatile Video Coding (VVC) complexity at intra-frame prediction, with a two-stage framework of deep feature fusion and probability estimation.
Experimental results on a standard database demonstrate the superiority of the proposed method, especially for High Definition (HD) and Ultra-HD (UHD) video sequences.
arXiv Detail & Related papers (2022-05-07T08:01:32Z)
- Deep Learning-Based Intra Mode Derivation for Versatile Video Coding [65.96100964146062]
An intelligent intra mode derivation method is proposed in this paper, termed Deep Learning based Intra Mode Derivation (DLIMD).
The architecture of DLIMD is developed to adapt to different quantization parameter settings and variable coding blocks including non-square ones.
The proposed method can achieve 2.28%, 1.74%, and 2.18% bit rate reduction on average for Y, U, and V components on the platform of Versatile Video Coding (VVC) test model.
arXiv Detail & Related papers (2022-04-08T13:23:59Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- A Combined Deep Learning based End-to-End Video Coding Architecture for YUV Color Space [14.685161934404123]
Most of the existing deep learning based end-to-end video coding (DLEC) architectures are designed specifically for RGB color format.
This paper introduces a new DLEC architecture for video coding to effectively support YUV 4:2:0 and compares its performance against the HEVC standard.
arXiv Detail & Related papers (2021-04-01T23:41:06Z)
- Video Compression with CNN-based Post Processing [18.145942926665164]
We propose a new CNN-based post-processing approach, which has been integrated with two state-of-the-art coding standards, VVC and AV1.
Results show consistent coding gains on all tested sequences at various spatial resolutions, with average bit rate savings of 4.0% and 5.8% against original VVC and AV1 respectively.
arXiv Detail & Related papers (2020-09-16T10:07:32Z)
- BVI-DVC: A Training Database for Deep Video Compression [13.730093064777078]
BVI-DVC is presented for training CNN-based video compression systems.
It contains 800 sequences at various spatial resolutions from 270p to 2160p.
It has been evaluated on ten existing network architectures for four different coding tools.
arXiv Detail & Related papers (2020-03-30T15:26:16Z)
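Several entries above report BD-rate savings or average bit-rate reductions against a reference codec. As a rough illustration of how such numbers are obtained, here is a hedged sketch of a Bjøntegaard-delta-style rate comparison. The standard method fits a cubic polynomial to log-rate versus PSNR; this simplified stand-in uses piecewise-linear interpolation of log-rate and averages the difference over the overlapping PSNR range, so it only approximates the published procedure.

```python
# Hedged sketch: approximate BD-rate between two rate-distortion curves.
import math

def bd_rate_approx(anchor, test):
    """anchor/test: lists of (bitrate_kbps, psnr_db) pairs, ascending PSNR.
    Returns percent rate change of `test` vs `anchor` (negative = saving)."""
    def log_rate_at(curve, q):
        # Piecewise-linear interpolation of log(rate) at quality q.
        for (r0, q0), (r1, q1) in zip(curve, curve[1:]):
            if q0 <= q <= q1:
                t = (q - q0) / (q1 - q0)
                return (1 - t) * math.log(r0) + t * math.log(r1)
        raise ValueError("quality outside curve range")

    # Integrate only over the PSNR interval both curves cover.
    lo = max(anchor[0][1], test[0][1])
    hi = min(anchor[-1][1], test[-1][1])
    n = 100
    qs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    avg_diff = sum(log_rate_at(test, q) - log_rate_at(anchor, q)
                   for q in qs) / (n + 1)
    return (math.exp(avg_diff) - 1) * 100
```

For example, a test codec that needs 10% less bitrate than the anchor at every PSNR point yields roughly -10%, matching the sign convention used in the abstracts above (negative BD-rate means a coding gain).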
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.