Accuracy Improvement of Object Detection in VVC Coded Video Using
YOLO-v7 Features
- URL: http://arxiv.org/abs/2304.00689v1
- Date: Mon, 3 Apr 2023 02:38:54 GMT
- Title: Accuracy Improvement of Object Detection in VVC Coded Video Using
YOLO-v7 Features
- Authors: Takahiro Shindo, Taiju Watanabe, Kein Yamada, Hiroshi Watanabe
- Abstract summary: In general, when the image quality deteriorates due to image encoding, the image recognition accuracy also falls.
We propose a neural-network-based approach to improve image recognition accuracy by applying post-processing to the encoded video.
We show that the combination of the proposed method and VVC achieves better coding performance than regular VVC in object detection accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With advances in image recognition technology based on deep learning,
automatic video analysis by Artificial Intelligence is becoming more
widespread. As the amount of video used for image recognition increases,
efficient compression methods for such video data are necessary. In general,
when the image quality deteriorates due to image encoding, the image
recognition accuracy also falls. Therefore, in this paper, we propose a
neural-network-based approach to improve image recognition accuracy, especially
object detection accuracy, by applying post-processing to the encoded video.
Versatile Video Coding (VVC) is used as the video compression method, since it
is the latest video coding standard with the best encoding performance.
The neural network is trained using the features of YOLO-v7, the latest object
detection model. By using VVC as the video coding method and YOLO-v7 as the
detection model, high object detection accuracy is achieved even at low bit
rates. Experimental results show that the combination of the proposed method
and VVC achieves better coding performance than regular VVC in object detection
accuracy.
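The abstract describes training a post-processing network on YOLO-v7 features so that restored decoded frames yield features close to those of the uncompressed frames. A minimal sketch of such a feature-matching objective follows; the arrays here are hypothetical stand-ins for intermediate YOLO-v7 feature maps, and the loss form (plain MSE) is an assumption, not the paper's exact objective.

```python
import numpy as np

def feature_distillation_loss(features_restored, features_original):
    """MSE between detector features of the post-processed decoded frame
    and features of the original (uncompressed) frame.

    In the paper this would use intermediate YOLO-v7 feature maps; here the
    inputs are arbitrary arrays standing in for those features."""
    diff = features_restored - features_original
    return float(np.mean(diff ** 2))

# Toy check: a "restored" feature map that is closer to the original
# should yield a lower loss than the raw decoded one.
rng = np.random.default_rng(0)
f_orig = rng.standard_normal((8, 8))
f_decoded = f_orig + 0.5 * rng.standard_normal((8, 8))   # coding distortion
f_restored = f_orig + 0.1 * rng.standard_normal((8, 8))  # after post-processing

assert feature_distillation_loss(f_restored, f_orig) < \
       feature_distillation_loss(f_decoded, f_orig)
```

Minimizing such a loss pushes the post-processed video toward frames the detector "sees" as undistorted, which is why detection accuracy can recover even when pixel fidelity stays low.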
Related papers
- NN-VVC: Versatile Video Coding boosted by self-supervisedly learned
image coding for machines [19.183883119933558]
This paper proposes a hybrid codec for machines called NN-VVC, which combines the advantages of an end-to-end (E2E) learned image codec and a conventional video codec (CVC) to achieve high performance in both image and video coding for machines.
Our experiments show that the proposed system achieved up to -43.20% and -26.8% Bjontegaard Delta rate reduction over VVC for image and video data, respectively.
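The Bjontegaard Delta (BD) rate quoted above measures the average bitrate difference between two rate-quality curves at equal quality. A common way to compute it, sketched below with synthetic numbers (the rate points are hypothetical, not from the paper), fits a cubic polynomial of log-rate over quality and integrates the two fits over their overlapping quality range.

```python
import numpy as np

def bd_rate(rates_ref, quality_ref, rates_test, quality_test):
    """Bjontegaard Delta rate (%): average bitrate change of the test codec
    versus the reference at equal quality. Cubic fit of log-rate over
    quality, integrated over the overlapping quality interval."""
    p_ref = np.polyfit(quality_ref, np.log(rates_ref), 3)
    p_test = np.polyfit(quality_test, np.log(rates_test), 3)
    lo = max(min(quality_ref), min(quality_test))
    hi = min(max(quality_ref), max(quality_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Synthetic check: a codec needing half the bitrate at every quality point
# should show a BD-rate of -50 %.
q = np.array([30.0, 32.0, 34.0, 36.0])       # quality (e.g. mAP or PSNR)
r_ref = np.array([100.0, 200.0, 400.0, 800.0])  # kbps, hypothetical
r_test = r_ref / 2.0
print(round(bd_rate(r_ref, q, r_test, q), 1))  # -50.0
```

Negative BD-rate values, like the -43.20% and -26.8% above, therefore mean the tested system needs that much less bitrate than VVC for the same machine-task quality.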
arXiv Detail & Related papers (2024-01-19T15:33:46Z) - A Deep Learning Approach to Video Anomaly Detection using Convolutional
Autoencoders [0.0]
Our method utilizes a convolutional autoencoder to learn the patterns of normal videos and then compares each frame of a test video to this learned representation.
We evaluated our approach and achieved an overall accuracy of 99.35% on the Ped1 dataset and 97% on the Ped2 dataset.
The results show that our method outperforms other state-of-the-art methods and it can be used in real-world applications for video anomaly detection.
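The entry above scores each frame by how poorly a model trained only on normal video reconstructs it. The paper uses a convolutional autoencoder; the sketch below substitutes a linear PCA "autoencoder" (an assumption made to keep the example tiny and dependency-free) to illustrate the same reconstruction-error scoring idea.

```python
import numpy as np

# Stand-in for the paper's convolutional autoencoder: fit a low-dimensional
# linear model to "normal" data, then score frames by reconstruction error.
rng = np.random.default_rng(1)

# Normal training frames: flattened 4x4 patches lying near a 2-D subspace.
basis = rng.standard_normal((2, 16))
normal = rng.standard_normal((200, 2)) @ basis \
         + 0.01 * rng.standard_normal((200, 16))

mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:2]  # keep 2 principal components (the "bottleneck")

def anomaly_score(frame):
    """Reconstruction error of a flattened frame under the normal model."""
    code = (frame - mean) @ components.T   # encode
    recon = code @ components + mean       # decode
    return float(np.mean((frame - recon) ** 2))

normal_test = rng.standard_normal(2) @ basis  # lies on the normal subspace
anomalous = rng.standard_normal(16)           # off the normal subspace
assert anomaly_score(anomalous) > anomaly_score(normal_test)
```

A threshold on this score then flags anomalous frames; the convolutional version works the same way but learns a nonlinear bottleneck over image patches.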
arXiv Detail & Related papers (2023-11-07T21:23:32Z) - Deepfake Video Detection Using Generative Convolutional Vision
Transformer [3.8297637120486496]
We propose a Generative Convolutional Vision Transformer (GenConViT) for deepfake video detection.
Our model combines ConvNeXt and Swin Transformer models for feature extraction.
By learning from the visual artifacts and latent data distribution, GenConViT achieves improved performance in detecting a wide range of deepfake videos.
arXiv Detail & Related papers (2023-07-13T19:27:40Z) - VNVC: A Versatile Neural Video Coding Framework for Efficient
Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z) - VVC Extension Scheme for Object Detection Using Contrast Reduction [0.0]
We propose an extension scheme of video coding for object detection using Versatile Video Coding (VVC).
In our proposed scheme, the original image is reduced in size and contrast, then coded with VVC encoder to achieve high compression performance.
Experimental results show that the proposed video coding scheme achieves better coding performance than regular VVC in terms of object detection accuracy.
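The scheme above shrinks both resolution and contrast before VVC encoding. A minimal numpy sketch of that pre-processing step follows; the scale factor and contrast gain are illustrative assumptions, since the summary does not give the paper's exact values.

```python
import numpy as np

def preprocess(frame, scale=2, contrast=0.5):
    """Shrink a frame and compress its dynamic range before VVC encoding.

    `frame` is a (H, W) grayscale array with H and W divisible by `scale`;
    the factor-2 downscale and 0.5 contrast gain are hypothetical choices."""
    h, w = frame.shape
    # Downscale by block averaging.
    small = frame.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    # Reduce contrast by shrinking deviations around the mean level.
    mean = small.mean()
    return mean + contrast * (small - mean)

rng = np.random.default_rng(2)
frame = rng.uniform(0, 255, size=(8, 8))
out = preprocess(frame)
assert out.shape == (4, 4)
assert out.std() < frame.std()  # lower contrast -> fewer bits to code
```

Smaller, flatter frames cost fewer bits under VVC, and the idea is that the detector tolerates this loss better than generic quality degradation at the same rate.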
arXiv Detail & Related papers (2023-05-30T06:29:04Z) - Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z) - Rethinking Resolution in the Context of Efficient Video Recognition [49.957690643214576]
Cross-resolution KD (ResKD) is a simple but effective method to boost recognition accuracy on low-resolution frames.
We extensively demonstrate its effectiveness over state-of-the-art architectures, i.e., 3D-CNNs and Video Transformers.
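Cross-resolution KD trains a student on low-resolution frames to mimic a teacher fed high-resolution frames. A common form of the distillation objective, the temperature-softened KL divergence, is sketched below; the temperature value is a hypothetical choice, not one reported in the entry.

```python
import numpy as np

def softmax(z, t):
    z = z / t
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence from the high-resolution teacher's softened prediction
    to the low-resolution student's: the usual distillation objective."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([2.0, 0.5, -1.0])   # logits from a high-res frame
student = np.array([1.5, 0.7, -0.8])   # logits from the low-res frame
assert kd_loss(teacher, student) >= 0.0
assert kd_loss(teacher, teacher) < 1e-12
```

Minimizing this loss transfers the teacher's "dark knowledge" about class similarities, which is what lets the student recover accuracy despite the lost spatial detail.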
arXiv Detail & Related papers (2022-09-26T15:50:44Z) - Saliency-Driven Versatile Video Coding for Neural Object Detection [7.367608892486084]
We propose a saliency-driven coding framework for the video coding for machines task using the latest video coding standard, Versatile Video Coding (VVC).
To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion.
We find that, compared to the reference VVC with a constant quality, up to 29% of bitrate can be saved at the same detection accuracy at the decoder side by applying the proposed saliency-driven framework.
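Saliency-driven coding typically realizes the rate savings by assigning a lower QP (higher quality) only to coding blocks that overlap detected objects. The sketch below builds such a block-level QP map from detector boxes; the QP values and 64-pixel block size are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def qp_map(frame_shape, boxes, block=64, qp_salient=27, qp_background=37):
    """Assign a lower QP (better quality) to coding blocks overlapping any
    detected object box, and a higher QP elsewhere. QP values and block
    size are hypothetical, chosen only for illustration."""
    h, w = frame_shape
    qp = np.full((h // block, w // block), qp_background)
    for x0, y0, x1, y1 in boxes:  # detector boxes in pixel coordinates
        r0, r1 = y0 // block, (y1 - 1) // block
        c0, c1 = x0 // block, (x1 - 1) // block
        qp[r0:r1 + 1, c0:c1 + 1] = qp_salient
    return qp

# One detection covering the top-left of a 256x256 frame.
qp = qp_map((256, 256), [(10, 10, 120, 70)])
assert qp[0, 0] == 27   # salient block: encoded at higher quality
assert qp[3, 3] == 37   # background block: encoded coarsely
```

The encoder then spends bits where the detector looks, which is how detection accuracy is held constant while the overall bitrate drops.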
arXiv Detail & Related papers (2022-03-11T14:27:43Z) - Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z) - A New Image Codec Paradigm for Human and Machine Uses [53.48873918537017]
A new scalable image codec paradigm for both human and machine uses is proposed in this work.
The high-level instance segmentation map and the low-level signal features are extracted with neural networks.
An image codec is designed and trained to achieve general-quality image reconstruction from the 16-bit gray-scale profile and signal features.
arXiv Detail & Related papers (2021-12-19T06:17:38Z) - An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond
Feature and Signal [99.49099501559652]
Video Coding for Machines (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion patterns.
By learning to extract sparse motion patterns via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.