Adaptation and Attention for Neural Video Coding
- URL: http://arxiv.org/abs/2112.08767v1
- Date: Thu, 16 Dec 2021 10:25:49 GMT
- Title: Adaptation and Attention for Neural Video Coding
- Authors: Nannan Zou, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed
R. Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu
- Abstract summary: We propose an end-to-end learned video codec that introduces several architectural novelties as well as training novelties.
As one architectural novelty, we propose to train the inter-frame model to adapt the motion estimation process based on the resolution of the input video.
A second architectural novelty is a new neural block that combines concepts from split-attention based neural networks and from DenseNets.
- Score: 23.116987835862314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural image coding now represents the state-of-the-art image compression
approach. However, much work remains to be done in the video domain. In
this work, we propose an end-to-end learned video codec that introduces several
architectural novelties as well as training novelties, revolving around the
concepts of adaptation and attention. Our codec is organized as an intra-frame
codec paired with an inter-frame codec. As one architectural novelty, we
propose to train the inter-frame codec model to adapt the motion estimation
process based on the resolution of the input video. A second architectural
novelty is a new neural block that combines concepts from split-attention based
neural networks and from DenseNets. Finally, we propose to overfit a set of
decoder-side multiplicative parameters at inference time. Through ablation
studies and comparisons to prior art, we show the benefits of our proposed
techniques in terms of coding gains. We compare our codec to VVC/H.266 and
RLVC, which represent the state-of-the-art traditional and end-to-end learned
codecs, respectively, and to the top performing end-to-end learned approach in
2021 CLIC competition, E2E_T_OL. Our codec clearly outperforms E2E_T_OL, and
compares favorably to VVC and RLVC in some settings.
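The inference-time overfitting of decoder-side multiplicative parameters described in the abstract can be illustrated with a toy sketch. Here a frozen linear "decoder" has one learnable gain per output channel, and only those gains are tuned by gradient descent to fit the current content; all names, the linear model, and the squared-error objective are illustrative assumptions, not the paper's actual architecture.

```python
# Toy sketch: overfit decoder-side multiplicative gains at inference time.
# The decoder weights stay frozen; only the per-channel gains adapt.

def decode(latent, weights, gains):
    # Decoder output: per-channel gain applied to a fixed linear transform.
    return [g * sum(w * z for w, z in zip(row, latent))
            for g, row in zip(gains, weights)]

def overfit_gains(latent, weights, target, steps=400, lr=0.2):
    # Minimize 0.5 * sum((out - target)^2) over the gains only,
    # mimicking inference-time adaptation to the current content.
    gains = [1.0] * len(weights)
    for _ in range(steps):
        out = decode(latent, weights, gains)
        for i, row in enumerate(weights):
            pre = sum(w * z for w, z in zip(row, latent))  # pre-gain activation
            grad = (out[i] - target[i]) * pre              # d(loss)/d(gain_i)
            gains[i] -= lr * grad
    return gains

latent = [0.5, -1.0]
weights = [[1.0, 0.0], [0.0, 1.0]]   # frozen "decoder"
target = [1.0, 2.0]                  # content to reconstruct
gains = overfit_gains(latent, weights, target)
reconstruction = decode(latent, weights, gains)
```

In a real codec the overfitted gains would be signalled to the decoder alongside the bitstream, so the rate cost of sending them must be weighed against the distortion gain.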
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach that explores multimodal representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z) - NN-VVC: Versatile Video Coding boosted by self-supervisedly learned image coding for machines [19.183883119933558]
This paper proposes a hybrid codec for machines called NN-VVC, which combines the advantages of an E2E-learned image codec and a conventional video codec (CVC) to achieve high performance in both image and video coding for machines.
Our experiments show that the proposed system achieved up to -43.20% and -26.8% Bjontegaard Delta rate reduction over VVC for image and video data, respectively.
arXiv Detail & Related papers (2024-01-19T15:33:46Z) - RQAT-INR: Improved Implicit Neural Image Compression [4.449835214520727]
In this research, we show that INR-based image compression has lower complexity than VAE-based approaches.
We also propose several improvements for INR-based image compression, outperforming the baseline model by a large margin.
arXiv Detail & Related papers (2023-03-06T10:59:45Z) - Neural Video Compression with Diverse Contexts [25.96187914295921]
This paper proposes increasing the context diversity in both temporal and spatial dimensions.
Experiments show that our codec obtains a 23.5% bitrate saving over the previous SOTA NVC.
arXiv Detail & Related papers (2023-02-28T08:35:50Z) - CNeRV: Content-adaptive Neural Representation for Visual Data [54.99373641890767]
We propose Neural Visual Representation with Content-adaptive Embedding (CNeRV), which combines the generalizability of autoencoders with the simplicity and compactness of implicit representation.
We match the performance of NeRV, a state-of-the-art implicit neural representation, on the reconstruction task for frames seen during training, while far surpassing it for frames skipped during training (unseen images).
With the same latent code length and similar model size, CNeRV outperforms autoencoders on reconstruction of both seen and unseen images.
arXiv Detail & Related papers (2022-11-18T18:35:43Z) - Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, improving PSNR from 34.07 to 34.57.
arXiv Detail & Related papers (2022-10-13T08:15:08Z) - AIVC: Artificial Intelligence based Video Codec [2.410573852722981]
AIVC is an end-to-end learned neural video codec.
It learns to compress videos under any coding configuration.
It offers performance competitive with the recent video coder HEVC.
arXiv Detail & Related papers (2022-02-09T10:03:12Z) - A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z) - Multitask Learning for VVC Quality Enhancement and Super-Resolution [11.446576112498596]
We propose a learning-based solution as a post-processing step to enhance the decoded VVC video quality.
Our method relies on multitask learning to perform both quality enhancement and super-resolution using a single shared network optimized for multiple levels.
arXiv Detail & Related papers (2021-04-16T19:05:26Z) - Conditional Entropy Coding for Efficient Video Compression [82.35389813794372]
We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.
We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs.
We then propose a novel internal learning extension on top of this architecture that brings an additional 10% savings without trading off decoding speed.
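The conditional-entropy idea above can be sketched with a toy model: quantized latent symbols of the current frame are scored under a Gaussian whose mean is simply the co-located symbol from the previous frame, so when frames change little the conditional code length is far below an unconditional (zero-mean) one. The predictor and all names here are hypothetical, not the paper's actual entropy model.

```python
import math

def gaussian_bits(x, mean, scale):
    # Approximate code length: -log2 of the Gaussian probability mass
    # on a unit-width quantization bin centered at x.
    p = 0.5 * (math.erf((x - mean + 0.5) / (scale * math.sqrt(2)))
               - math.erf((x - mean - 0.5) / (scale * math.sqrt(2))))
    return -math.log2(max(p, 1e-12))

def conditional_bits(current, previous, scale=1.0):
    # Condition each symbol on the co-located symbol of the previous frame:
    # the better the prediction, the fewer bits the entropy coder needs.
    return sum(gaussian_bits(c, mean=p, scale=scale)
               for c, p in zip(current, previous))

prev_latent = [2.0, -1.0, 0.0]
curr_latent = [2.0, -1.0, 1.0]   # mostly unchanged between frames
cost_conditional = conditional_bits(curr_latent, prev_latent)
cost_unconditional = conditional_bits(curr_latent, [0.0, 0.0, 0.0])
```

Because the first two symbols sit exactly at their predicted means, the conditional cost here is substantially lower than the zero-mean cost, which is the gain a conditional entropy model exploits.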
arXiv Detail & Related papers (2020-08-20T20:01:59Z) - Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
arXiv Detail & Related papers (2020-03-25T09:04:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.