A Combined Deep Learning based End-to-End Video Coding Architecture for
YUV Color Space
- URL: http://arxiv.org/abs/2104.00807v1
- Date: Thu, 1 Apr 2021 23:41:06 GMT
- Title: A Combined Deep Learning based End-to-End Video Coding Architecture for
YUV Color Space
- Authors: Ankitesh K. Singh, Hilmi E. Egilmez, Reza Pourreza, Muhammed Coban,
Marta Karczewicz, Taco S. Cohen
- Abstract summary: Most of the existing deep learning based end-to-end video coding (DLEC) architectures are designed specifically for RGB color format.
This paper introduces a new DLEC architecture for video coding to effectively support YUV 4:2:0 and compares its performance against the HEVC standard.
- Score: 14.685161934404123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most of the existing deep learning based end-to-end video coding (DLEC)
architectures are designed specifically for the RGB color format, yet the video
coding standards developed over the past few decades, including H.264/AVC,
H.265/HEVC and H.266/VVC, have been designed primarily for the YUV 4:2:0 format,
where the chrominance (U and V) components are subsampled to achieve superior
compression performance by taking the human visual system into account. While
many papers on DLEC compare these two distinct coding schemes in the RGB domain,
a common evaluation framework in the YUV 4:2:0 domain is needed for a fairer
comparison. This paper introduces a new DLEC architecture for video coding that
effectively supports YUV 4:2:0 and compares its performance against the HEVC
standard under a common evaluation framework. The experimental results on YUV
4:2:0 video sequences show that the proposed architecture can outperform HEVC
in intra-frame coding; however, inter-frame coding is not as efficient, contrary
to the RGB coding results reported in recent papers.
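The 4:2:0 subsampling the abstract refers to halves the chroma planes in both dimensions, so each frame carries half the raw samples of 4:4:4. A minimal numpy sketch of this idea, using plain 2x2 averaging (an assumption for illustration; real pipelines use standardized downsampling filters and chroma siting):

```python
import numpy as np

def to_yuv420(y, u, v):
    """Convert full-resolution YUV 4:4:4 planes to 4:2:0 by averaging
    each 2x2 block of the chroma (U, V) planes; luma (Y) is untouched."""
    def down2x2(c):
        h, w = c.shape
        return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, down2x2(u), down2x2(v)

# A 4x4 frame: the subsampled chroma planes are 2x2, so the frame
# carries 16 + 4 + 4 = 24 samples instead of 48 -- half the raw data.
y = np.arange(16, dtype=np.float32).reshape(4, 4)
u = np.full((4, 4), 128.0)
v = np.full((4, 4), 64.0)
y2, u2, v2 = to_yuv420(y, u, v)
print(u2.shape, v2.shape)  # (2, 2) (2, 2)
```

Because the human visual system is less sensitive to chroma than to luma, this halving of the data rate costs little perceived quality, which is why the standards the abstract lists target 4:2:0 by default.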
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Benchmarking Conventional and Learned Video Codecs with a Low-Delay Configuration [11.016119119250765]
This paper conducts a comparative study of state-of-the-art conventional and learned video coding methods based on a low delay configuration.
To allow a fair and meaningful comparison, the evaluation was performed on test sequences defined in the AOM and MPEG common test conditions in the YCbCr 4:2:0 color space.
The evaluation results show that the JVET ECM codecs offer the best overall coding performance among all codecs tested.
arXiv Detail & Related papers (2024-08-09T12:55:23Z)
- Hierarchical B-frame Video Coding for Long Group of Pictures [42.229439873835254]
We present an end-to-end learned video codec for random access that combines training on long sequences of frames with rate allocation and content adaptation at inference.
Under common test conditions, it achieves results comparable to VTM in terms of YUV-PSNR BD-Rate on some classes of videos.
On average it surpasses open LD and RA end-to-end solutions in terms of VMAF and YUV BD-Rates.
arXiv Detail & Related papers (2024-06-24T11:29:52Z)
- Learned Hierarchical B-frame Coding with Adaptive Feature Modulation for YUV 4:2:0 Content [13.289507865388863]
This paper introduces a learned hierarchical B-frame coding scheme in response to the Grand Challenge on Neural Network-based Video Coding at ISCAS 2023.
We specifically address three issues: (1) B-frame coding, (2) YUV 4:2:0 coding, and (3) content-adaptive variable-rate coding with a single model.
arXiv Detail & Related papers (2022-12-29T06:22:52Z)
- Learned Video Compression for YUV 4:2:0 Content Using Flow-based Conditional Inter-frame Coding [24.031385522441497]
This paper proposes a learning-based video compression framework for variable-rate coding on YUV 4:2:0 content.
We introduce a conditional flow-based inter-frame coder to improve inter-frame coding efficiency.
Experimental results show that our model performs better than x265 on UVG and MCL-JCV datasets.
arXiv Detail & Related papers (2022-10-15T08:36:01Z)
- CANF-VC: Conditional Augmented Normalizing Flows for Video Compression [81.41594331948843]
CANF-VC is an end-to-end learning-based video compression system built on conditional augmented normalizing flows (ANF).
arXiv Detail & Related papers (2022-07-12T04:53:24Z)
- Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation [57.66773945887832]
We propose to optimize Versatile Video Coding (VVC) complexity at intra-frame prediction, with a two-stage framework of deep feature fusion and probability estimation.
Experimental results on a standard database demonstrate the superiority of the proposed method, especially for High Definition (HD) and Ultra-HD (UHD) video sequences.
arXiv Detail & Related papers (2022-05-07T08:01:32Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Video Corpus Moment Retrieval with Contrastive Learning [56.249924768243375]
Video corpus moment retrieval (VCMR) aims to retrieve a temporal moment that semantically corresponds to a given text query.
We propose a Retrieval and Localization Network with Contrastive Learning (ReLoCLNet) for VCMR.
Experimental results show that ReLoCLNet, which encodes text and video separately for efficiency, achieves retrieval accuracy comparable with baselines that adopt cross-modal interaction learning.
arXiv Detail & Related papers (2021-05-13T12:54:39Z)
- Transform Network Architectures for Deep Learning based End-to-End Image/Video Coding in Subsampled Color Spaces [16.83399026040147]
This paper investigates various DLEC designs to support YUV 4:2:0 format.
A new transform network architecture is proposed to improve the efficiency of coding YUV 4:2:0 data.
arXiv Detail & Related papers (2021-02-27T06:47:27Z)
- Deep Video Inpainting Detection [95.36819088529622]
Video inpainting detection localizes an inpainted region in a video both spatially and temporally.
VIDNet (Video Inpainting Detection Network) uses a two-stream encoder-decoder architecture with an attention module.
arXiv Detail & Related papers (2021-01-26T20:53:49Z)
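Several of the papers above report results as BD-rates, the standard way of summarizing one codec's bitrate savings over another at equal quality. A minimal numpy sketch of the Bjontegaard delta-rate computation, with hypothetical rate/PSNR points for illustration:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta-rate: average percent bitrate difference of the
    test codec vs. the anchor at equal quality (PSNR)."""
    # Fit cubic polynomials of log-rate as a function of PSNR.
    pa = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    pt = np.polyfit(psnr_test, np.log(rate_test), 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    ia, it = np.polyint(pa), np.polyint(pt)
    avg_diff = (np.polyval(it, hi) - np.polyval(it, lo)
                - np.polyval(ia, hi) + np.polyval(ia, lo)) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100  # negative => bitrate savings

# Hypothetical rate (kbps) / PSNR (dB) points: the test codec matches
# the anchor's quality at 10% lower bitrate everywhere.
psnr = [32.0, 34.5, 37.0, 39.5]
bd = bd_rate([100, 200, 400, 800], psnr, [90, 180, 360, 720], psnr)
print(round(bd, 1))  # -10.0
```

All numbers here are made up to exercise the formula; the papers above compute BD-rates from actual rate-distortion points measured under common test conditions (e.g. YUV-PSNR per component).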
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.