Saliency-Driven Versatile Video Coding for Neural Object Detection
- URL: http://arxiv.org/abs/2203.05944v1
- Date: Fri, 11 Mar 2022 14:27:43 GMT
- Title: Saliency-Driven Versatile Video Coding for Neural Object Detection
- Authors: Kristian Fischer, Felix Fleckenstein, Christian Herglotz, André Kaup
- Abstract summary: We propose a saliency-driven coding framework for the video coding for machines task using the latest video coding standard Versatile Video Coding (VVC).
To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion.
We find that, compared to the reference VVC with a constant quality, up to 29 % of bitrate can be saved with the same detection accuracy at the decoder side by applying the proposed saliency-driven framework.
- Score: 7.367608892486084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Saliency-driven image and video coding for humans has gained importance in
the recent past. In this paper, we propose such a saliency-driven coding
framework for the video coding for machines task using the latest video coding
standard Versatile Video Coding (VVC). To determine the salient regions before
encoding, we employ the real-time-capable object detection network You Only
Look Once (YOLO) in combination with a novel decision criterion. To measure the
coding quality for a machine, the state-of-the-art object segmentation network
Mask R-CNN was applied to the decoded frame. From extensive simulations we find
that, compared to the reference VVC with a constant quality, up to 29 % of
bitrate can be saved with the same detection accuracy at the decoder side by
applying the proposed saliency-driven framework. Besides, we compare YOLO
against other, more traditional saliency detection methods.
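The core idea of the framework can be illustrated with a small sketch: an object detector supplies bounding boxes, and coding tree units (CTUs) overlapped by a detection keep the base quantization parameter (QP) while the background is coded more coarsely. The box format, CTU size handling, and QP offset below are illustrative assumptions, not the paper's exact parameters, and the detector itself (YOLO in the paper) is stubbed out as a plain list of boxes.

```python
CTU_SIZE = 128  # VVC's default CTU size in luma samples

def ctu_qp_map(width, height, boxes, base_qp=32, bg_qp_offset=10):
    """Return a per-CTU QP grid: base_qp inside detections, coarser outside.

    boxes: iterable of (x0, y0, x1, y1) pixel rectangles from the detector.
    The QP values and the offset are illustrative, not the paper's settings.
    """
    cols = (width + CTU_SIZE - 1) // CTU_SIZE
    rows = (height + CTU_SIZE - 1) // CTU_SIZE
    # Start from the coarse background QP everywhere.
    qp = [[base_qp + bg_qp_offset] * cols for _ in range(rows)]
    for x0, y0, x1, y1 in boxes:
        # Every CTU overlapped by the detection box is treated as salient.
        for r in range(max(0, y0 // CTU_SIZE), min(rows, -(-y1 // CTU_SIZE))):
            for c in range(max(0, x0 // CTU_SIZE), min(cols, -(-x1 // CTU_SIZE))):
                qp[r][c] = base_qp
    return qp

# One detection on a 1080p frame: salient CTUs get QP 32, background QP 42.
qp = ctu_qp_map(1920, 1080, [(200, 100, 500, 400)])
```

Such a per-CTU QP map could then be handed to an encoder that supports local QP adaptation; the paper's actual integration into VVC is more involved than this sketch.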
Related papers
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- VVC Extension Scheme for Object Detection Using Contrast Reduction [0.0]
We propose an extension scheme of video coding for object detection using Versatile Video Coding (VVC).
In our proposed scheme, the original image is reduced in size and contrast, then coded with a VVC encoder to achieve high compression performance.
Experimental results show that the proposed video coding scheme achieves better coding performance than regular VVC in terms of object detection accuracy.
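The pre-processing step this scheme describes can be sketched in a few lines: downscale the frame, then compress its contrast toward mid-gray so the encoder spends fewer bits on it. The pooling method, scale factor, and contrast factor below are assumptions for illustration, not the values used in the paper.

```python
def reduce_size_and_contrast(img, scale=2, contrast=0.5, mid=128):
    """Downscale by average pooling, then compress contrast toward mid-gray.

    img: 2-D list of 8-bit luma values. scale and contrast are illustrative
    choices, not the parameters of the proposed scheme.
    """
    h, w = len(img), len(img[0])
    # Average-pool non-overlapping scale x scale blocks (cropping remainders).
    small = [
        [
            sum(img[y + dy][x + dx] for dy in range(scale) for dx in range(scale))
            // (scale * scale)
            for x in range(0, w - w % scale, scale)
        ]
        for y in range(0, h - h % scale, scale)
    ]
    # Pull every sample toward mid-gray to reduce contrast.
    return [[round(mid + contrast * (v - mid)) for v in row] for row in small]
```

At the decoder side, the scheme would invert these steps (upscale and re-expand contrast) before running the detector; that inverse is omitted here.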
arXiv Detail & Related papers (2023-05-30T06:29:04Z)
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, the seq2seq task is resolved by an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- Accuracy Improvement of Object Detection in VVC Coded Video Using YOLO-v7 Features [0.0]
In general, when the image quality deteriorates due to image encoding, the image recognition accuracy also falls.
We propose a neural-network-based approach to improve image recognition accuracy by applying post-processing to the encoded video.
We show that the combination of the proposed method and VVC achieves better coding performance than regular VVC in object detection accuracy.
arXiv Detail & Related papers (2023-04-03T02:38:54Z)
- Scalable Video Coding for Humans and Machines [42.870358996305356]
We propose a scalable video coding framework that supports machine vision through its base layer bitstream and human vision via its enhancement layer bitstream.
The proposed framework includes components from both conventional and Deep Neural Network (DNN)-based video coding.
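The layered idea behind such scalable frameworks can be shown with a toy round trip: the base layer carries a machine-oriented representation, and the enhancement layer carries only the residual needed to recover the human-viewable frame. The per-row-mean "feature" and the broadcast predictor below are placeholders for the conventional and DNN-based components the paper describes.

```python
def split_layers(pixels, base_predict):
    """Toy scalable split: base layer for machine vision, enhancement
    layer holding the residual for human-quality reconstruction.

    The per-row mean used as the base representation is a stand-in for
    a real feature extractor; base_predict maps it back to pixels.
    """
    base = [sum(row) // len(row) for row in pixels]
    pred = base_predict(base, len(pixels[0]))
    enh = [[p - q for p, q in zip(prow, qrow)]
           for prow, qrow in zip(pixels, pred)]
    return base, enh

def broadcast_predict(base, width):
    """Placeholder predictor: repeat each base value across the row."""
    return [[v] * width for v in base]

def reconstruct(base, enh, base_predict):
    """Enhancement decoding: prediction from the base layer plus residual."""
    pred = base_predict(base, len(enh[0]))
    return [[q + e for q, e in zip(qrow, erow)]
            for qrow, erow in zip(pred, enh)]
```

A machine-vision task would consume `base` directly, while a human viewer decodes `base` plus `enh`; in the actual framework both steps involve learned networks rather than these stubs.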
arXiv Detail & Related papers (2022-08-04T07:45:41Z)
- A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
- A New Image Codec Paradigm for Human and Machine Uses [53.48873918537017]
A new scalable image paradigm for both human and machine uses is proposed in this work.
The high-level instance segmentation map and the low-level signal features are extracted with neural networks.
A network is designed and trained to achieve general-quality image reconstruction from the 16-bit gray-scale profile and the signal features.
arXiv Detail & Related papers (2021-12-19T06:17:38Z)
- Human-Machine Collaborative Video Coding Through Cuboidal Partitioning [26.70051123157869]
We propose a video coding framework that leverages the commonality between human vision and machine vision applications using cuboids.
Cuboids, estimated rectangular regions over a video frame, are computationally efficient, have a compact representation, and are object-centric.
Herein, cuboidal feature descriptors are extracted from the current frame and then employed to accomplish a machine vision task in the form of object detection.
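A minimal illustration of the cuboid idea: estimate the tightest rectangle covering pixels that differ from a background model by more than a threshold. This single-rectangle version is a toy stand-in for the paper's partitioning, which derives cuboids differently and produces more than one region; the threshold and box convention are our assumptions.

```python
def estimate_cuboid(frame, background, thresh=16):
    """Estimate one cuboid: the tightest rectangle covering pixels that
    differ from the background by more than thresh.

    frame, background: equally sized 2-D lists of luma values.
    Returns (x0, y0, x1, y1) with exclusive x1/y1, or None if no pixel
    is active. A toy sketch, not the paper's partitioning algorithm.
    """
    ys = [y for y, (row, brow) in enumerate(zip(frame, background))
          if any(abs(a - b) > thresh for a, b in zip(row, brow))]
    xs = [x for x in range(len(frame[0]))
          if any(abs(frame[y][x] - background[y][x]) > thresh
                 for y in range(len(frame)))]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
```

The compactness claim follows directly: a cuboid is four coordinates, regardless of how many pixels it covers.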
arXiv Detail & Related papers (2021-02-02T04:44:45Z)
- Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers.
We propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content.
In order to further improve the architecture, we introduce a weight function that aims to re-balance classes and increase the networks' attention to under-represented objects.
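A common way to realize such a re-balancing weight function is inverse-frequency weighting over the label map, normalized so the mean weight is one. The exact form below (smoothing term, normalization) is our generic choice, not necessarily the function used in the paper.

```python
from collections import Counter

def class_weights(label_map, smooth=1.0):
    """Inverse-frequency class weights, normalized to mean 1.

    label_map: 2-D list of integer class labels. Rare classes receive
    weights above 1, frequent classes below 1. A generic re-balancing
    scheme in the spirit of the paper's weight function; the exact
    formula is an assumption.
    """
    counts = Counter(c for row in label_map for c in row)
    total = sum(counts.values())
    # Rarer classes get larger raw weights; smooth avoids blow-ups.
    raw = {c: total / (n + smooth) for c, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}
```

In training, these weights would multiply the per-pixel loss terms so that errors on under-represented classes cost more.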
arXiv Detail & Related papers (2020-07-19T18:44:34Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machines (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern.
By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.