Task Oriented Video Coding: A Survey
- URL: http://arxiv.org/abs/2208.07313v1
- Date: Mon, 15 Aug 2022 16:21:54 GMT
- Title: Task Oriented Video Coding: A Survey
- Authors: Daniel Wood
- Abstract summary: State-of-the-art video coding standards, such as H.265/HEVC and Versatile Video Coding, are still designed on the assumption that the compressed video will be watched by humans.
With the tremendous advance and maturation of deep neural networks in solving computer vision tasks, more and more videos are directly analyzed by deep neural networks without human involvement.
We explore and summarize recent progress on computer vision task oriented video coding and the emerging video coding standard, Video Coding for Machines.
- Score: 0.5076419064097732
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video coding technology has been continuously improved to achieve higher compression
ratios at higher resolutions. However, state-of-the-art video coding
standards, such as H.265/HEVC and Versatile Video Coding, are still designed
on the assumption that the compressed video will be watched by humans. With the
tremendous advance and maturation of deep neural networks in solving computer
vision tasks, more and more videos are directly analyzed by deep neural
networks without human involvement. Such a conventional design for video
coding standards is not optimal when the compressed video is used by computer
vision applications. While the human visual system is consistently sensitive to
content with high contrast, the impact of pixels on computer vision
algorithms is driven by the specific computer vision task. In this paper, we
explore and summarize recent progress on computer vision task oriented video
coding and the emerging video coding standard, Video Coding for Machines.
Related papers
- One-Click Upgrade from 2D to 3D: Sandwiched RGB-D Video Compression for Stereoscopic Teleconferencing [13.74209129258984]
We propose a new approach to upgrade a 2D video to support stereo RGB-D video compression, by wrapping it with a neural pre- and post-processor pair.
We train the neural pre- and post-processors on a synthetic 4D people dataset, and evaluate it on both synthetic and real-captured stereo RGB-D videos.
Our approach saves about 30% bit-rate compared to a conventional video coding scheme and MV-HEVC at the same level of rendering quality from a novel view.
arXiv Detail & Related papers (2024-04-15T17:56:05Z)
- NN-VVC: Versatile Video Coding boosted by self-supervisedly learned image coding for machines [19.183883119933558]
This paper proposes a hybrid codec for machines called NN-VVC, which combines the advantages of an E2E-learned image codec and a CVC to achieve high performance in both image and video coding for machines.
Our experiments show that the proposed system achieved up to -43.20% and -26.8% Bjontegaard Delta rate reduction over VVC for image and video data, respectively.
arXiv Detail & Related papers (2024-01-19T15:33:46Z)
- Learned Scalable Video Coding For Humans and Machines [4.14360329494344]
We introduce an end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer, together with the base layer, supports input reconstruction for human viewing.
Our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer.
arXiv Detail & Related papers (2023-07-18T05:22:25Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- VVC Extension Scheme for Object Detection Using Contrast Reduction [0.0]
We propose an extension scheme of video coding for object detection using Versatile Video Coding (VVC).
In our proposed scheme, the original image is reduced in size and contrast, then coded with a VVC encoder to achieve high compression performance.
Experimental results show that the proposed video coding scheme achieves better coding performance than regular VVC in terms of object detection accuracy.
arXiv Detail & Related papers (2023-05-30T06:29:04Z)
- A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision [93.90545426665999]
We take a close look at autoregressive decoders for multi-task learning in multimodal computer vision.
A key finding is that a small decoder learned on top of a frozen pretrained encoder works surprisingly well.
It can be seen as teaching a decoder to interact with a pretrained vision model via natural language.
arXiv Detail & Related papers (2023-03-30T13:42:58Z)
- Scalable Video Coding for Humans and Machines [42.870358996305356]
We propose a scalable video coding framework that supports machine vision through its base layer bitstream and human vision via its enhancement layer bitstream.
The proposed framework includes components from both conventional and Deep Neural Network (DNN)-based video coding.
arXiv Detail & Related papers (2022-08-04T07:45:41Z)
- A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
- Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics [127.65410486227007]
Video coding, which aims to compress and reconstruct the whole frame, and feature compression, which only preserves and transmits the most critical information, stand at two ends of the scale.
Recent endeavors in imminent trends of video compression, e.g. deep learning based coding tools and end-to-end image/video coding, and MPEG-7 compact feature descriptor standards, promote the sustainable and fast development in their own directions.
In this paper, thanks to booming AI technology, e.g. prediction and generation models, we carry out exploration in the new area, Video Coding for Machines (VCM), arising from the emerging MPEG
arXiv Detail & Related papers (2020-01-10T17:24:13Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion patterns.
By learning to extract sparse motion patterns via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
- Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach [104.02201472370801]
We come up with a novel image coding framework by leveraging both the compressive and the generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)