NN-VVC: Versatile Video Coding boosted by self-supervisedly learned
image coding for machines
- URL: http://arxiv.org/abs/2401.10761v1
- Date: Fri, 19 Jan 2024 15:33:46 GMT
- Title: NN-VVC: Versatile Video Coding boosted by self-supervisedly learned
image coding for machines
- Authors: Jukka I. Ahonen, Nam Le, Honglei Zhang, Antti Hallapuro, Francesco
Cricri, Hamed Rezazadegan Tavakoli, Miska M. Hannuksela, Esa Rahtu
- Abstract summary: This paper proposes a hybrid codec for machines called NN-VVC, which combines the advantages of an E2E-learned image codec and a CVC to achieve high performance in both image and video coding for machines.
Our experiments show that the proposed system achieved up to -43.20% and -26.8% Bjøntegaard Delta rate reduction over VVC for image and video data, respectively.
- Score: 19.183883119933558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent progress in artificial intelligence has led to an ever-increasing
usage of images and videos by machine analysis algorithms, mainly neural
networks. Nonetheless, compression, storage and transmission of media have
traditionally been designed considering human beings as the viewers of the
content. Recent research on image and video coding for machine analysis has
progressed mainly in two almost orthogonal directions. The first is represented
by end-to-end (E2E) learned codecs which, while offering high performance on
image coding, are not yet on par with state-of-the-art conventional video
codecs and lack interoperability. The second direction considers using the
Versatile Video Coding (VVC) standard or any other conventional video codec
(CVC) together with pre- and post-processing operations targeting machine
analysis. While the CVC-based methods benefit from interoperability and broad
hardware and software support, the machine task performance is often lower than
the desired level, particularly in low bitrates. This paper proposes a hybrid
codec for machines called NN-VVC, which combines the advantages of an
E2E-learned image codec and a CVC to achieve high performance in both image and
video coding for machines. Our experiments show that the proposed system
achieved up to -43.20% and -26.8% Bjøntegaard Delta rate reduction over VVC
for image and video data, respectively, when evaluated on multiple different
datasets and machine vision tasks. To the best of our knowledge, this is the
first research paper showing a hybrid video codec that outperforms VVC on
multiple datasets and multiple machine vision tasks.
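The Bjøntegaard Delta rate (BD-rate) figures quoted above express the average bitrate change of a test codec relative to an anchor at equal quality, so a negative value is a bitrate saving. The sketch below is an illustration only, not the authors' evaluation code. It assumes each codec is described by a few (bitrate, quality) points, where quality could be a machine-task metric such as mAP, and it swaps the cubic fit used by the common BD-rate procedure for piecewise-linear interpolation to stay dependency-free. The function names `bd_rate` and `_integrate_log_rate` and the sample numbers are hypothetical.

```python
import math


def _integrate_log_rate(points, lo, hi):
    """Integrate a piecewise-linear log10(rate) curve over quality in [lo, hi].

    points: list of (quality, log10_rate) pairs with distinct quality values.
    """
    pts = sorted(points)
    total = 0.0
    for (q0, r0), (q1, r1) in zip(pts, pts[1:]):
        a, b = max(q0, lo), min(q1, hi)
        if b <= a:
            continue  # segment lies outside the integration interval
        slope = (r1 - r0) / (q1 - q0)
        ra = r0 + slope * (a - q0)
        rb = r0 + slope * (b - q0)
        total += 0.5 * (ra + rb) * (b - a)  # trapezoid area
    return total


def bd_rate(anchor, test):
    """Simplified BD-rate in percent (negative means the test codec saves bits).

    anchor, test: lists of (bitrate, quality) pairs.
    Assumption: linear interpolation replaces the usual cubic fit.
    """
    a = [(q, math.log10(r)) for r, q in anchor]
    t = [(q, math.log10(r)) for r, q in test]
    # Compare only over the quality range covered by both curves.
    lo = max(min(q for q, _ in a), min(q for q, _ in t))
    hi = min(max(q for q, _ in a), max(q for q, _ in t))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    avg_diff = (_integrate_log_rate(t, lo, hi)
                - _integrate_log_rate(a, lo, hi)) / (hi - lo)
    return (10 ** avg_diff - 1) * 100


if __name__ == "__main__":
    # Made-up points: the test codec needs half the bitrate at every quality.
    anchor = [(100, 30), (200, 33), (400, 36), (800, 39)]
    test = [(50, 30), (100, 33), (200, 36), (400, 39)]
    print(f"BD-rate: {bd_rate(anchor, test):.2f}%")
```

With the made-up points above, the test codec uses half the anchor's bitrate at each quality level, so the result is roughly -50%.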
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Learned Scalable Video Coding For Humans and Machines [4.14360329494344]
We introduce an end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer, together with the base layer, supports input reconstruction for human viewing.
Our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer.
arXiv Detail & Related papers (2023-07-18T05:22:25Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- VVC Extension Scheme for Object Detection Using Contrast Reduction [0.0]
We propose an extension scheme of video coding for object detection using Versatile Video Coding (VVC).
In our proposed scheme, the original image is reduced in size and contrast, then coded with a VVC encoder to achieve high compression performance.
Experimental results show that the proposed video coding scheme achieves better coding performance than regular VVC in terms of object detection accuracy.
arXiv Detail & Related papers (2023-05-30T06:29:04Z)
- Task Oriented Video Coding: A Survey [0.5076419064097732]
State-of-the-art video coding standards, such as H.265/HEVC and Versatile Video Coding, are still designed with the assumption that the compressed video will be watched by humans.
With the tremendous advance and maturation of deep neural networks in solving computer vision tasks, more and more videos are directly analyzed by deep neural networks without humans' involvement.
We explore and summarize recent progress on computer vision task oriented video coding and the emerging video coding standard, Video Coding for Machines.
arXiv Detail & Related papers (2022-08-15T16:21:54Z)
- A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
- A New Image Codec Paradigm for Human and Machine Uses [53.48873918537017]
A new scalable image codec paradigm for both human and machine uses is proposed in this work.
The high-level instance segmentation map and the low-level signal features are extracted with neural networks.
An image is designed and trained to achieve the general-quality image reconstruction with the 16-bit gray-scale profile and signal features.
arXiv Detail & Related papers (2021-12-19T06:17:38Z)
- Adaptation and Attention for Neural Video Coding [23.116987835862314]
We propose an end-to-end learned video codec that introduces several architectural novelties as well as training novelties.
As one architectural novelty, we propose to train the inter-frame model to adapt the motion estimation process based on the resolution of the input video.
A second architectural novelty is a new neural block that combines concepts from split-attention based neural networks and from DenseNets.
arXiv Detail & Related papers (2021-12-16T10:25:49Z)
- Multitask Learning for VVC Quality Enhancement and Super-Resolution [11.446576112498596]
We propose a learning-based solution as a post-processing step to enhance the decoded VVC video quality.
Our method relies on multitask learning to perform both quality enhancement and super-resolution using a single shared network optimized for multiple levels.
arXiv Detail & Related papers (2021-04-16T19:05:26Z)
- Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics [127.65410486227007]
Video coding, which aims to compress and reconstruct the whole frame, and feature compression, which only preserves and transmits the most critical information, stand at two ends of the scale.
Recent endeavors in imminent trends of video compression, e.g. deep learning based coding tools and end-to-end image/video coding, and MPEG-7 compact feature descriptor standards, promote the sustainable and fast development in their own directions.
In this paper, thanks to booming AI technology, e.g. prediction and generation models, we carry out exploration in the new area, Video Coding for Machines (VCM), arising from the emerging MPEG
arXiv Detail & Related papers (2020-01-10T17:24:13Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion patterns.
By learning to extract sparse motion patterns via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)