Learned Video Codec with Enriched Reconstruction for CLIC P-frame Coding
- URL: http://arxiv.org/abs/2012.07462v1
- Date: Mon, 14 Dec 2020 12:32:46 GMT
- Title: Learned Video Codec with Enriched Reconstruction for CLIC P-frame Coding
- Authors: David Alexandre and Hsueh-Ming Hang
- Abstract summary: This paper proposes a learning-based video codec, specifically designed for the Challenge on Learned Image Compression (CLIC, CVPR Workshop) 2020 P-frame coding task.
More specifically, we designed a compressor network with Refine-Net for coding residual signals and motion vectors.
Our video codec demonstrates its performance by using the perfect reference frame at the decoder side, as specified by the CLIC P-frame Challenge.
- Score: 11.000499414131324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a learning-based video codec, specifically
designed for the Challenge on Learned Image Compression (CLIC, CVPR Workshop)
2020 P-frame coding task. More specifically, we designed a compressor
network with Refine-Net for
coding residual signals and motion vectors. Also, for motion estimation, we
introduced a hierarchical, attention-based ME-Net. To verify our design, we
conducted an extensive ablation study on our modules and different input
formats. Our video codec demonstrates its performance by using the perfect
reference frame at the decoder side specified by the CLIC P-frame Challenge.
The experimental result shows that our proposed codec is very competitive with
the Challenge top performers in terms of quality metrics.
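As a concrete illustration of the pipeline the abstract describes, below is a minimal sketch in Python/PyTorch of a generic P-frame coding loop: motion estimation, motion-compensated warping, residual coding, and a final refinement pass. All names here (`warp`, `TinyCodec`, and the `me_net`/`mv_codec`/`res_codec`/`refine_net` callables) are illustrative stand-ins, not the paper's actual ME-Net, compressor, or Refine-Net.

```python
# Hypothetical sketch of a P-frame coding flow:
# motion estimation -> warping -> residual coding -> refinement.
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp `frame` (N,C,H,W) with a dense flow field (N,2,H,W)."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # (1,2,H,W)
    coords = base + flow                                       # absolute positions
    # normalize to [-1, 1] as grid_sample expects
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                       # (N,H,W,2)
    return F.grid_sample(frame, grid, mode="bilinear", align_corners=True)

class TinyCodec(nn.Module):
    """Stand-in compressor: conv down, quantize, conv up."""
    def __init__(self, ch):
        super().__init__()
        self.enc = nn.Conv2d(ch, 8, 5, stride=2, padding=2)
        self.dec = nn.ConvTranspose2d(8, ch, 5, stride=2,
                                      padding=2, output_padding=1)
    def forward(self, x):
        y = self.enc(x)
        y_hat = y + (torch.round(y) - y).detach()  # straight-through quantization
        return self.dec(y_hat)

def code_p_frame(x_cur, x_ref, me_net, mv_codec, res_codec, refine_net):
    flow = me_net(torch.cat([x_cur, x_ref], dim=1))  # motion estimation
    flow_hat = mv_codec(flow)                        # compress motion vectors
    x_pred = warp(x_ref, flow_hat)                   # motion compensation
    res_hat = res_codec(x_cur - x_pred)              # compress the residual
    return refine_net(x_pred + res_hat)              # enriched reconstruction
```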
Related papers
- Learned Compression for Images and Point Clouds [1.7404865362620803]
This thesis provides three primary contributions to this new field of learned compression.
First, we present an efficient low-complexity entropy model that dynamically adapts the encoding distribution to a specific input by compressing and transmitting the encoding distribution itself as side information.
Secondly, we propose a novel lightweight low-complexity point cloud codec that is highly specialized for classification, attaining significant bitrate reductions compared to non-specialized codecs.
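A toy illustration of the side-information idea in the first contribution: fit a simple distribution to the latents of one input, charge its parameters as side-info bits, and measure the payload under that adapted model. The single-Gaussian model and the 16-bit-per-parameter cost are assumptions for illustration (Python with NumPy/SciPy); the thesis's actual entropy model is learned and far richer.

```python
# Rate accounting for a per-input adapted encoding distribution
# transmitted as side information (illustrative only).
import numpy as np
from scipy.stats import norm

def gaussian_pmf(levels, mu, sigma):
    # probability mass of integer symbols under N(mu, sigma), via the CDF
    upper = norm.cdf(levels + 0.5, mu, sigma)
    lower = norm.cdf(levels - 0.5, mu, sigma)
    return np.clip(upper - lower, 1e-12, 1.0)

def rate_with_side_info(latent, param_bits=2 * 16):  # assume 16 bits/parameter
    q = np.round(latent)
    mu, sigma = q.mean(), max(q.std(), 1e-3)          # fit model to this input
    pmf = gaussian_pmf(q, mu, sigma)
    payload_bits = -np.log2(pmf).sum()                # ideal arithmetic-coding cost
    return payload_bits + param_bits                  # plus transmitted (mu, sigma)

latent = np.random.randn(64, 8, 8) * 3.0
print(f"estimated bits: {rate_with_side_info(latent):.1f}")
```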
arXiv Detail & Related papers (2024-09-12T19:57:44Z)
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- CANF-VC: Conditional Augmented Normalizing Flows for Video Compression [81.41594331948843]
CANF-VC is an end-to-end learning-based video compression system built on conditional augmented normalizing flows (ANF).
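For flavor, here is a minimal conditional affine-coupling step of the kind ANF-based codecs stack: the scale and shift applied to one half of the tensor are predicted from the other half plus a conditioning signal (e.g. a temporal prediction). This is a generic PyTorch toy, not CANF-VC's actual architecture.

```python
# Conditional affine coupling: invertible, with a tractable log-determinant.
import torch
import torch.nn as nn

class CondCoupling(nn.Module):
    def __init__(self, ch, cond_ch):   # `ch` must be even
        super().__init__()
        half = ch // 2
        self.net = nn.Sequential(
            nn.Conv2d(half + cond_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2 * half, 3, padding=1),
        )
    def forward(self, x, cond):
        xa, xb = torch.chunk(x, 2, dim=1)
        log_s, t = torch.chunk(self.net(torch.cat([xa, cond], dim=1)), 2, dim=1)
        log_s = torch.tanh(log_s)                  # keep the scale well-behaved
        yb = xb * torch.exp(log_s) + t             # affine transform of one half
        return torch.cat([xa, yb], dim=1), log_s.flatten(1).sum(1)  # log|det J|
    def inverse(self, y, cond):
        ya, yb = torch.chunk(y, 2, dim=1)
        log_s, t = torch.chunk(self.net(torch.cat([ya, cond], dim=1)), 2, dim=1)
        log_s = torch.tanh(log_s)
        xb = (yb - t) * torch.exp(-log_s)          # exact inversion
        return torch.cat([ya, xb], dim=1)
```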
arXiv Detail & Related papers (2022-07-12T04:53:24Z)
- Saliency-Driven Versatile Video Coding for Neural Object Detection [7.367608892486084]
We propose a saliency-driven coding framework for the video coding for machines task using the latest video coding standard, Versatile Video Coding (VVC).
To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion.
We find that, compared to the VVC reference at constant quality, up to 29% of the bitrate can be saved at the same detection accuracy at the decoder side by applying the proposed saliency-driven framework.
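The mechanism can be pictured as mapping detector output to a per-CTU quantization map: salient CTUs get a lower (better-quality) QP. The CTU size and QP values below are invented for illustration, not the paper's configuration.

```python
# Saliency-to-QP mapping sketch: detections mark CTUs for finer quantization.
import numpy as np

def qp_map(frame_h, frame_w, boxes, ctu=128, qp_base=37, qp_salient=27):
    """boxes: list of (x0, y0, x1, y1) pixel rectangles from the detector."""
    rows, cols = -(-frame_h // ctu), -(-frame_w // ctu)   # ceil division
    qps = np.full((rows, cols), qp_base, dtype=int)
    for x0, y0, x1, y1 in boxes:
        r0, r1 = y0 // ctu, min(y1 // ctu, rows - 1)
        c0, c1 = x0 // ctu, min(x1 // ctu, cols - 1)
        qps[r0:r1 + 1, c0:c1 + 1] = qp_salient            # boost salient CTUs
    return qps

print(qp_map(720, 1280, [(300, 100, 600, 400)]))
```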
arXiv Detail & Related papers (2022-03-11T14:27:43Z)
- Adaptation and Attention for Neural Video Coding [23.116987835862314]
We propose an end-to-end learned video codec that introduces several architectural novelties as well as training novelties.
As one architectural novelty, we propose to train the inter-frame model to adapt the motion estimation process based on the resolution of the input video.
A second architectural novelty is a new neural block that combines concepts from split-attention based neural networks and from DenseNets.
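A rough PyTorch sketch of such a block: features are split into groups, an attention vector softmax-weights the groups (split attention), and the fused result is concatenated onto the input (DenseNet-style growth). Layer sizes here are guesses for illustration only.

```python
# Toy block combining split attention with dense concatenation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitAttnDenseBlock(nn.Module):
    def __init__(self, in_ch, growth=32, splits=2):
        super().__init__()
        self.splits, self.growth = splits, growth
        self.conv = nn.Conv2d(in_ch, growth * splits, 3, padding=1)
        self.fc = nn.Sequential(
            nn.Linear(growth, growth), nn.ReLU(),
            nn.Linear(growth, growth * splits),
        )
    def forward(self, x):
        feats = self.conv(x)                                   # (N, g*k, H, W)
        n, _, h, w = feats.shape
        groups = feats.view(n, self.splits, self.growth, h, w)
        gap = groups.sum(dim=1).mean(dim=(2, 3))               # global context (N, g)
        attn = self.fc(gap).view(n, self.splits, self.growth)  # logits per split
        attn = F.softmax(attn, dim=1).unsqueeze(-1).unsqueeze(-1)
        fused = (groups * attn).sum(dim=1)                     # weighted group sum
        return torch.cat([x, fused], dim=1)                    # dense connection
```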
arXiv Detail & Related papers (2021-12-16T10:25:49Z)
- Perceptual Learned Video Compression with Recurrent Conditional GAN [158.0726042755]
We propose a Perceptual Learned Video Compression (PLVC) approach with a recurrent conditional generative adversarial network.
PLVC learns to compress video towards good perceptual quality at low bit-rates.
The user study further validates the outstanding perceptual performance of PLVC in comparison with the latest learned video compression approaches.
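Schematically, the objective couples rate, distortion, and a conditional adversarial term. Below is a minimal sketch, assuming a discriminator conditioned on decoder-side context such as the previous reconstruction; the paper's recurrent generator/discriminator and exact losses are not reproduced.

```python
# Conditional-GAN-style losses for perceptual compression (illustrative).
import torch
import torch.nn.functional as F

def plvc_style_losses(x, x_hat, rate_bits, context, disc,
                      lam_adv=0.01, lam_rate=0.05):
    real = disc(torch.cat([x, context], dim=1))
    fake_d = disc(torch.cat([x_hat.detach(), context], dim=1))  # no grad to codec
    fake_g = disc(torch.cat([x_hat, context], dim=1))
    # non-saturating GAN losses (a hinge form is also common)
    d_loss = F.softplus(-real).mean() + F.softplus(fake_d).mean()
    g_loss = (F.mse_loss(x_hat, x)                 # distortion
              + lam_adv * F.softplus(-fake_g).mean()  # perceptual/adversarial
              + lam_rate * rate_bits)              # rate term
    return d_loss, g_loss
```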
arXiv Detail & Related papers (2021-09-07T13:36:57Z)
- Conditional Coding and Variable Bitrate for Practical Learned Video Coding [1.6619384554007748]
Conditional coding and quantization gain vectors are used to provide flexibility to a single encoder/decoder pair.
The proposed approach shows performance on par with HEVC.
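The gain-vector idea can be shown in a few lines: one latent is quantized at several rate points by scaling each channel with a per-rate gain before rounding, then inverting the gain after. The gains and tensor shapes below are invented for illustration.

```python
# Variable bitrate from one codec via quantization gain vectors (toy).
import numpy as np

def encode(latent, gain):               # latent: (C, H, W); gain: (C,)
    return np.round(latent * gain[:, None, None])

def decode(q, inv_gain):
    return q * inv_gain[:, None, None]

rng = np.random.default_rng(0)
latent = rng.normal(size=(4, 2, 2))
for rate_idx, g in enumerate([0.5, 1.0, 2.0]):      # coarse -> fine quantization
    gain = np.full(4, g)
    q = encode(latent, gain)
    rec = decode(q, 1.0 / gain)
    mse = np.mean((latent - rec) ** 2)
    print(f"rate point {rate_idx}: symbols={np.unique(q).size}, mse={mse:.4f}")
```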
arXiv Detail & Related papers (2021-04-19T07:48:55Z)
- Learning to Compress Videos without Computing Motion [39.46212197928986]
We propose a new deep learning video compression architecture that does not require motion estimation.
Our framework exploits the regularities inherent to video motion, which we capture by using displaced frame differences as video representations.
Our experiments show that our compression model, which we call the MOtionless VIdeo Codec (MOVI-Codec), learns how to efficiently compress videos without computing motion.
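A minimal NumPy rendering of displaced frame differences: stack the differences between the current frame and a few spatial shifts of the previous frame, so motion regularities are exposed without explicit estimation. The shift set here is an assumption; MOVI-Codec's actual representation and network are not reproduced.

```python
# Displaced frame differences as a motion-free video representation (toy).
import numpy as np

def displaced_differences(cur, prev,
                          shifts=((0, 0), (0, 1), (1, 0), (0, -1), (-1, 0))):
    """cur, prev: (H, W) frames -> (len(shifts), H, W) difference stack."""
    stack = []
    for dy, dx in shifts:
        shifted = np.roll(np.roll(prev, dy, axis=0), dx, axis=1)
        stack.append(cur - shifted)
    return np.stack(stack)

cur = np.random.rand(4, 4)
prev = np.roll(cur, 1, axis=1)                 # simulate pure 1-pixel motion
diffs = displaced_differences(cur, prev)
print([float(abs(d).mean()) for d in diffs])   # the matching shift is ~0
```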
arXiv Detail & Related papers (2020-09-29T15:49:25Z)
- Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers.
We propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content.
In order to further improve the architecture we introduce a weight function which aims to re-balance classes to increase the attention of the networks to under-represented objects.
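One standard instance of such a re-balancing weight function, shown here for concreteness, is inverse pixel-frequency weighting of the cross-entropy; whether the paper uses exactly this form is not stated in the summary.

```python
# Inverse-frequency class weights so rare classes get more attention (toy).
import numpy as np

def inverse_frequency_weights(label_map, num_classes, eps=1e-6):
    counts = np.bincount(label_map.ravel(), minlength=num_classes).astype(float)
    freq = counts / counts.sum()
    w = 1.0 / (freq + eps)
    return w / w.sum() * num_classes          # normalize to average weight 1

labels = np.random.choice(3, size=(64, 64), p=[0.8, 0.15, 0.05])
print(inverse_frequency_weights(labels, 3))   # rare classes get larger weights
```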
arXiv Detail & Related papers (2020-07-19T18:44:34Z)
- Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics [127.65410486227007]
Video coding, which aims to compress and reconstruct the whole frame, and feature compression, which preserves and transmits only the most critical information, stand at the two ends of the scale.
Recent endeavors in imminent trends of video compression, e.g. deep learning based coding tools and end-to-end image/video coding, and MPEG-7 compact feature descriptor standards, promote the sustainable and fast development in their own directions.
In this paper, thanks to booming AI technology, e.g. prediction and generation models, we carry out exploration in the new area, Video Coding for Machines (VCM), arising from the emerging MPEG standardization efforts.
arXiv Detail & Related papers (2020-01-10T17:24:13Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of the learned motion pattern.
By learning to extract a sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of the to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.