Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification
- URL: http://arxiv.org/abs/2403.08580v1
- Date: Wed, 13 Mar 2024 14:35:13 GMT
- Title: Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification
- Authors: Yuxing Han, Yunan Ding, Chen Ye Gan, Jiangtao Wen
- Abstract summary: Classifying videos into distinct categories, such as Sport and Music Video, is crucial for multimedia understanding and retrieval.
Traditional methods require video decompression to extract pixel-level features like color, texture, and motion.
We present a novel approach that examines only the post-compression bitstream of a video to perform classification, eliminating the need for bitstream decoding.
- Score: 12.322783570127756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classifying videos into distinct categories, such as Sport and Music Video,
is crucial for multimedia understanding and retrieval, especially when an
immense volume of video content is being constantly generated. Traditional
methods require video decompression to extract pixel-level features like color,
texture, and motion, thereby increasing computational and storage demands.
Moreover, these methods often suffer from performance degradation in
low-quality videos. We present a novel approach that examines only the
post-compression bitstream of a video to perform classification, eliminating
the need for bitstream decoding. To validate our approach, we built a
comprehensive data set comprising over 29,000 YouTube video clips, totaling
6,000 hours and spanning 11 distinct categories. Our evaluations indicate
precision, accuracy, and recall rates consistently above 80%, many exceeding
90%, and some reaching 99%. The algorithm operates approximately 15,000 times
faster than real-time for 30fps videos, outperforming the traditional Dynamic
Time Warping (DTW) algorithm by seven orders of magnitude.
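The abstract does not include implementation details. As a hedged sketch of the core idea only, assuming per-frame compressed sizes have already been read from the bitstream, a lightweight classifier might operate on simple statistics of the size sequence; the features and the nearest-centroid model below are hypothetical stand-ins, not the authors' method:

```python
import numpy as np

def frame_size_features(sizes):
    """Summarize a sequence of per-frame compressed sizes (bytes).

    Hypothetical features: mean size, variability, and I-frame "spikiness"
    (largest frame relative to the mean). The paper does not specify its
    exact feature set.
    """
    s = np.asarray(sizes, dtype=float)
    ratio = s.max() / max(s.mean(), 1e-9)
    return np.array([s.mean(), s.std(), ratio])

def nearest_centroid(features, centroids):
    """Assign the clip to the closest class centroid (toy stand-in model)."""
    dists = {label: np.linalg.norm(features - c) for label, c in centroids.items()}
    return min(dists, key=dists.get)

# Toy centroids: sports footage tends toward larger, more variable frames.
centroids = {
    "Sport": np.array([9000.0, 4000.0, 3.0]),
    "Music Video": np.array([4000.0, 1500.0, 2.0]),
}
clip_sizes = [12000, 5000, 4800, 5200, 11000, 5100]  # bytes per frame
label = nearest_centroid(frame_size_features(clip_sizes), centroids)
```

Because no pixel decoding happens, the per-clip cost is a few arithmetic operations over an integer sequence, which is what makes the reported orders-of-magnitude speedup over real-time plausible.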
Related papers
- Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning [71.94122309290537]
We propose an efficient, online approach to generate dense captions for videos.
Our model uses a novel autoregressive factorized decoding architecture.
Our approach shows excellent performance compared to both offline and online methods, and uses 20% less compute.
arXiv Detail & Related papers (2024-11-22T02:46:44Z)
- Adaptive Caching for Faster Video Generation with Diffusion Transformers [52.73348147077075]
Diffusion Transformers (DiTs) rely on larger models and heavier attention mechanisms, resulting in slower inference speeds.
We introduce a training-free method to accelerate video DiTs, termed Adaptive Caching (AdaCache)
We also introduce a Motion Regularization (MoReg) scheme to utilize video information within AdaCache, controlling the compute allocation based on motion content.
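The summary above describes a training-free reuse mechanism only at a high level. As a minimal sketch of that idea (not the authors' code; the threshold, change metric, and motion weighting are all hypothetical), a module's output can be reused across diffusion steps when its input has barely changed, with motion content tightening the reuse threshold:

```python
import numpy as np

class AdaptiveCache:
    """Sketch of training-free output reuse for an expensive module.

    If the module input changed little since the last computed step, serve
    the cached output instead of recomputing. Higher motion shrinks the
    reuse threshold (a MoReg-like idea).
    """
    def __init__(self, module, threshold=0.05):
        self.module = module
        self.threshold = threshold
        self.prev_input = None
        self.prev_output = None
        self.recomputes = 0

    def __call__(self, x, motion=0.0):
        eff = self.threshold / (1.0 + motion)  # more motion -> less reuse
        if self.prev_input is not None:
            change = np.linalg.norm(x - self.prev_input) / (
                np.linalg.norm(self.prev_input) + 1e-9)
            if change < eff:
                return self.prev_output  # cache hit: skip computation
        self.prev_input = x.copy()
        self.prev_output = self.module(x)
        self.recomputes += 1
        return self.prev_output

expensive = lambda x: x * 2.0  # stand-in for an attention block
cache = AdaptiveCache(expensive)
x = np.ones(4)
y1 = cache(x)           # computed once
y2 = cache(x + 1e-4)    # near-identical input: served from cache
```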
arXiv Detail & Related papers (2024-11-04T18:59:44Z)
- Blurry Video Compression: A Trade-off between Visual Enhancement and Data Compression [65.8148169700705]
Existing video compression (VC) methods primarily aim to reduce the spatial and temporal redundancies between consecutive frames in a video.
Previous works have achieved remarkable results on videos acquired under specific settings such as instant (known) exposure time and shutter speed.
In this work, we tackle the VC problem in a general scenario where a given video can be blurry due to predefined camera settings or dynamics in the scene.
arXiv Detail & Related papers (2023-11-08T02:17:54Z)
- Judging a video by its bitstream cover [12.322783570127756]
Classifying videos into distinct categories, such as Sport and Music Video, is crucial for multimedia understanding and retrieval.
Traditional methods require video decompression to extract pixel-level features like color, texture, and motion.
We present a novel approach that examines only the post-compression bitstream of a video to perform classification, eliminating the need for bitstream decoding.
arXiv Detail & Related papers (2023-09-14T00:34:11Z)
- LSCD: A Large-Scale Screen Content Dataset for Video Compression [5.857003653854907]
We propose the Large-scale Screen Content dataset, which contains 714 source sequences.
We provide the analysis of the proposed dataset to show some features of screen content videos.
We also provide a benchmark containing the performance of both traditional and learning-based methods.
arXiv Detail & Related papers (2023-08-18T06:27:35Z)
- Compressed Vision for Efficient Video Understanding [83.97689018324732]
We propose a framework enabling research on hour-long videos with the same hardware that can now process second-long videos.
We replace standard video compression, e.g. JPEG, with neural compression and show that we can directly feed compressed videos as inputs to regular video networks.
arXiv Detail & Related papers (2022-10-06T15:35:49Z)
- Speeding Up Action Recognition Using Dynamic Accumulation of Residuals in Compressed Domain [2.062593640149623]
Temporal redundancy and the sheer size of raw videos are two of the most common problems facing video processing algorithms.
This paper presents an approach that uses the residual data available directly in compressed videos, which can be obtained by a lightweight partial decoding procedure.
Applying neural networks exclusively for accumulated residuals in the compressed domain accelerates performance, while the classification results are highly competitive with raw video approaches.
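The summary leaves the accumulation step abstract. As a hedged sketch, assuming residual frames have already been obtained from partial decoding (the function name and the optional decay weighting are hypothetical), accumulating a group of pictures' residuals into a single motion map might look like:

```python
import numpy as np

def accumulate_residuals(residuals, decay=1.0):
    """Sum residual frames from a compressed GOP into one motion map.

    `decay` optionally down-weights older residuals; the paper's exact
    accumulation rule may differ.
    """
    acc = np.zeros_like(residuals[0], dtype=float)
    for r in residuals:
        acc = decay * acc + r
    return acc

# Toy residuals: motion concentrated in the top-left pixel.
residuals = [np.zeros((4, 4)) for _ in range(3)]
for r in residuals:
    r[0, 0] = 1.0
motion_map = accumulate_residuals(residuals)
# The accumulated map would feed the network in place of decoded RGB frames.
```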
arXiv Detail & Related papers (2022-09-29T13:08:49Z)
- Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories [56.91664227337115]
We introduce a collaborative memory mechanism that encodes information across multiple sampled clips of a video at each training iteration.
This enables the learning of long-range dependencies beyond a single clip.
Our proposed framework is end-to-end trainable and significantly improves the accuracy of video classification at a negligible computational overhead.
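One way to read the mechanism above: features from several sampled clips are pooled into a shared memory, and each clip's prediction then conditions on that memory, giving it context beyond its own frames. A minimal sketch of that information flow (the real memory is learned end-to-end; the mean-pooling and linear head here are illustrative only):

```python
import numpy as np

def collaborative_memory_logits(clip_feats, classifier_w):
    """Sketch: pool per-clip features into a shared memory vector,
    concatenate it back onto each clip, and classify each clip.

    Illustrates only the cross-clip information flow; the paper's
    memory mechanism is learned, not a simple mean.
    """
    memory = np.mean(clip_feats, axis=0)              # shared across clips
    augmented = [np.concatenate([f, memory]) for f in clip_feats]
    return [classifier_w @ a for a in augmented]

clips = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # two sampled clips
w = np.ones((3, 4))                                   # toy 3-class head
logits = collaborative_memory_logits(clips, w)
```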
arXiv Detail & Related papers (2021-04-02T18:59:09Z)
- Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
arXiv Detail & Related papers (2020-03-25T09:04:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.