Related papers: Vehicle Detection and Classification without Residual Calculation: Accelerating HEVC Image Decoding with Random Perturbation Injection

URL: http://arxiv.org/abs/2305.08265v3
Date: Sat, 5 Aug 2023 12:53:09 GMT
Title: Vehicle Detection and Classification without Residual Calculation: Accelerating HEVC Image Decoding with Random Perturbation Injection
Authors: Muhammet Sebul Beratoğlu and Behçet Uğur Töreyin
Abstract summary: This study introduces a novel random perturbation-based compressed domain method for reconstructing images from HEVC bitstreams. We demonstrate a significant increase in the reconstruction speed compared to the traditional full decoding approach. We achieve a detection accuracy of 99.9%, on par with the pixel domain method, and a classification accuracy of 96.84%, only 0.98% lower than the pixel domain method.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In the field of video analytics, particularly traffic surveillance, there is a growing need for efficient and effective methods for processing and understanding video data. Traditional full video decoding techniques can be computationally intensive and time-consuming, leading researchers to explore alternative approaches in the compressed domain. This study introduces a novel random perturbation-based compressed domain method for reconstructing images from High Efficiency Video Coding (HEVC) bitstreams, specifically designed for traffic surveillance applications. To the best of our knowledge, our method is the first to propose substituting random perturbations for residual values, creating a condensed representation of the original image while retaining information relevant to video understanding tasks, particularly focusing on vehicle detection and classification as key use cases. By not using residual data, our proposed method significantly reduces the data needed in the image reconstruction process, allowing for more efficient storage and transmission of information. This is particularly important when considering the vast amount of video data involved in surveillance applications. Applied to the public BIT-Vehicle dataset, we demonstrate a significant increase in the reconstruction speed compared to the traditional full decoding approach, with our proposed method being approximately 56% faster than the pixel domain method. Additionally, we achieve a detection accuracy of 99.9%, on par with the pixel domain method, and a classification accuracy of 96.84%, only 0.98% lower than the pixel domain method. Furthermore, we showcase the significant reduction in data size, leading to more efficient storage and transmission. Our research establishes the potential of compressed domain methods in traffic surveillance applications, where speed and data size are critical factors.
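The core idea in the abstract, replacing decoded residual values with random perturbations, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `reconstruct_block`, the uniform integer noise, the `amplitude` and `seed` parameters, and the toy 8-bit prediction block are hypothetical. The abstract does not specify the perturbation distribution or how it is wired into the HEVC decoder, so this is not the authors' implementation.

```python
import random


def reconstruct_block(prediction, amplitude=2, seed=0):
    """Add random noise to a prediction block instead of decoded residuals.

    Hypothetical sketch: a standard decoder computes prediction + residual;
    here the residual term is replaced by uniform integer noise in
    [-amplitude, amplitude] (an assumed distribution), and the result is
    clipped to the 8-bit sample range.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    block = []
    for row in prediction:
        noisy_row = []
        for sample in row:
            perturbed = sample + rng.randint(-amplitude, amplitude)
            noisy_row.append(min(255, max(0, perturbed)))
        block.append(noisy_row)
    return block


# Toy 4x4 luma prediction block (made-up values).
prediction = [
    [120, 121, 122, 123],
    [120, 121, 122, 123],
    [119, 120, 121, 122],
    [119, 120, 121, 122],
]
block = reconstruct_block(prediction, amplitude=2, seed=0)
```

Because no residual data is read in this sketch, the residual decoding step is skipped entirely, which is the source of the speed and data-size savings the abstract reports.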

Related papers

Enhancing Classification of Streaming Data with Image Distillation [1.2891210250935148]
This study tackles the challenge of efficiently classifying streaming data in environments with limited memory and computational resources. It delves into the application of data distillation as an innovative approach to improve the precision of streaming image data classification.
arXiv Detail & Related papers (2025-09-08T15:24:35Z)
Optimizing Region of Interest Selection for Effective Embedding in Video Steganography Based on Genetic Algorithms [1.6114012813668932]
This paper proposes a new approach to video steganography that uses a Genetic Algorithm (GA) to identify the Region of Interest (ROI) in the cover video. The secret data is encrypted using the Advanced Encryption Standard (AES), a widely accepted encryption standard, before being embedded into the cover video. The proposed method has a high embedding capacity and efficiency, with a PSNR between 64 and 75 dB, which indicates that the embedded data is almost indistinguishable from the original video.
arXiv Detail & Related papers (2025-08-19T10:16:45Z)
Neuromorphic Synergy for Video Binarization [54.195375576583864]
Bimodal objects serve as a visual form to embed information that can be easily recognized by vision systems. Neuromorphic cameras offer new capabilities for alleviating motion blur, but it is non-trivial to first de-blur and then binarize the images in a real-time manner. We propose an event-based binary reconstruction method that leverages the prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space. We also develop an efficient integration method to propagate this binary image to high frame rate binary video.
arXiv Detail & Related papers (2024-02-20T01:43:51Z)
Secure Information Embedding in Images with Hybrid Firefly Algorithm [2.9182357325967145]
This research introduces a novel steganographic approach for concealing a confidential portable document format (PDF) document within a host image. The purpose of this search is to accomplish two main goals: increasing the host image's capacity and reducing distortion. The findings indicate a decrease in image distortion and an accelerated rate of convergence in the search process.
arXiv Detail & Related papers (2023-12-21T01:50:02Z)
A Preliminary Study on Pattern Reconstruction for Optimal Storage of Wearable Sensor Data [3.04585143845864]
One approach to efficiently store the healthcare data is to extract the relevant and representative features and store only those features instead of the continuous streaming data. We present a preliminary study, where we explored multiple autoencoders for concise feature extraction and reconstruction for human activity recognition (HAR) sensor data. Our Multi-Layer Perceptron (MLP) deep autoencoder achieved a storage reduction of 90.18% compared to the three other implemented autoencoders.
arXiv Detail & Related papers (2023-02-25T03:33:26Z)
Rethinking Resolution in the Context of Efficient Video Recognition [49.957690643214576]
Cross-resolution KD (ResKD) is a simple but effective method to boost recognition accuracy on low-resolution frames. We extensively demonstrate its effectiveness over state-of-the-art architectures, i.e., 3D-CNNs and Video Transformers.
arXiv Detail & Related papers (2022-09-26T15:50:44Z)
Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos. Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras. We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially-precise high-resolution representations through the entire network. We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
FasterVideo: Efficient Online Joint Object Detection And Tracking [0.8680676599607126]
We re-think one of the most successful methods for image object detection, Faster R-CNN, and extend it to the video domain. Our proposed method reaches a very high computational efficiency necessary for relevant applications.
arXiv Detail & Related papers (2022-04-15T09:25:34Z)
FOVEA: Foveated Image Magnification for Autonomous Navigation [53.69803081925454]
We propose an attentional approach that elastically magnifies certain regions while maintaining a small input canvas. On the autonomous driving datasets Argoverse-HD and BDD100K, we show our proposed method boosts the detection AP over standard Faster R-CNN, with and without finetuning.
arXiv Detail & Related papers (2021-08-27T03:07:55Z)
Superpixels and Graph Convolutional Neural Networks for Efficient Detection of Nutrient Deficiency Stress from Aerial Imagery [3.6843744304889183]
We seek to identify nutrient deficient areas from remotely sensed data to alert farmers to regions that require attention. We propose a much lighter graph-based method to perform node-based classification. This model has 4-orders-of-magnitude fewer parameters than a CNN model and trains in a matter of minutes.
arXiv Detail & Related papers (2021-04-20T21:18:16Z)
A Novel Upsampling and Context Convolution for Image Semantic Segmentation [0.966840768820136]
Recent methods for semantic segmentation often employ an encoder-decoder structure using deep convolutional neural networks. We propose a dense upsampling convolution method based on guided filtering to effectively preserve the spatial information of the image in the network. We report a new record of 82.86% and 81.62% of pixel accuracy on ADE20K and Pascal-Context benchmark datasets, respectively.
arXiv Detail & Related papers (2021-03-20T06:16:42Z)
CNNs for JPEGs: A Study in Computational Cost [49.97673761305336]
Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade. CNNs are capable of learning robust representations of the data directly from the RGB pixels. Deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years.
arXiv Detail & Related papers (2020-12-26T15:00:10Z)