Learn to Compress (LtC): Efficient Learning-based Streaming Video
Analytics
- URL: http://arxiv.org/abs/2307.12171v2
- Date: Tue, 25 Jul 2023 22:18:33 GMT
- Authors: Quazi Mishkatul Alam, Israat Haque, Nael Abu-Ghazaleh
- Abstract summary: LtC is a collaborative framework between the video source and the analytics server that efficiently learns to reduce the video streams within an analytics pipeline.
LtC uses 28-35% less bandwidth and achieves up to 45% shorter response delay than recently published state-of-the-art streaming frameworks.
- Score: 3.2872586139884623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video analytics are often performed as cloud services in edge settings,
mainly to offload computation, and also in situations where the results are not
directly consumed at the video sensors. Sending high-quality video data from
the edge devices can be expensive both in terms of bandwidth and power use. In
order to build a streaming video analytics pipeline that makes efficient use of
these resources, it is therefore imperative to reduce the size of the video
stream. Traditional video compression algorithms are unaware of the semantics
of the video, and can be both inefficient and harmful for the analytics
performance. In this paper, we introduce LtC, a collaborative framework between
the video source and the analytics server, that efficiently learns to reduce
the video streams within an analytics pipeline. Specifically, LtC uses the
full-fledged analytics algorithm at the server as a teacher to train a
lightweight student neural network, which is then deployed at the video source.
The student network is trained to comprehend the semantic significance of
various regions within the videos, which is used to differentially preserve the
crucial regions in high quality while the remaining regions undergo aggressive
compression. Furthermore, LtC also incorporates a novel temporal filtering
algorithm based on feature-differencing to omit transmitting frames that do not
contribute new information. Overall, LtC uses 28-35% less bandwidth and
achieves up to 45% shorter response delay compared to recently published
state-of-the-art streaming frameworks while achieving similar analytics performance.
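The feature-differencing temporal filter described above can be illustrated with a minimal sketch. This is not the paper's implementation: LtC compares features produced by the learned student network, whereas the sketch below substitutes a simple intensity-histogram feature and a hypothetical `threshold` parameter purely to show the skip-redundant-frames logic.

```python
import numpy as np

def frame_features(frame, bins=32):
    # Stand-in feature extractor: a normalized intensity histogram.
    # LtC's actual filter differences learned student-network features;
    # the histogram here is only an illustrative proxy.
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256), density=True)
    return hist

def temporal_filter(frames, threshold=0.05):
    """Yield only frames whose feature distance from the last
    transmitted frame exceeds `threshold` (hypothetical parameter)."""
    last_feat = None
    for idx, frame in enumerate(frames):
        feat = frame_features(frame)
        if last_feat is None or np.abs(feat - last_feat).sum() > threshold:
            last_feat = feat
            yield idx, frame  # transmit: frame carries new information

# Example: a static scene followed by an abrupt change.
rng = np.random.default_rng(0)
static = rng.integers(0, 256, size=(4, 64, 64))
static[1:] = static[0]                      # frames 1-3 duplicate frame 0
changed = np.zeros((64, 64), dtype=int)     # frame 4: scene changes entirely
frames = list(static) + [changed]
kept = [i for i, _ in temporal_filter(frames)]
# kept == [0, 4]: frames 1-3 add no new information and are dropped
```

In the full pipeline, frames that survive this filter would then be compressed differentially, with student-identified salient regions kept at high quality.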
Related papers
- STAC: Leveraging Spatio-Temporal Data Associations For Efficient
Cross-Camera Streaming and Analytics [0.0]
We propose an efficient cross-camera surveillance system that provides real-time analytics and inference under constrained network environments.
We integrate STAC with frame filtering and state-of-the-art compression for streaming characteristics.
We evaluate the performance of STAC using this dataset to measure the accuracy metrics and inference rate for completeness.
arXiv Detail & Related papers (2024-01-27T04:02:52Z) - Building an Open-Vocabulary Video CLIP Model with Better Architectures,
Optimization and Data [102.0069667710562]
This paper presents Open-VCLIP++, a framework that adapts CLIP to a strong zero-shot video classifier.
We demonstrate that training Open-VCLIP++ is tantamount to continual learning with zero historical data.
Our approach is evaluated on three widely used action recognition datasets.
arXiv Detail & Related papers (2023-10-08T04:46:43Z) - Differentiable Resolution Compression and Alignment for Efficient Video
Classification and Retrieval [16.497758750494537]
We propose an efficient video representation network with Differentiable Resolution Compression and Alignment mechanism.
We leverage a Differentiable Context-aware Compression Module to encode the saliency and non-saliency frame features.
We introduce a new Resolution-Align Transformer Layer to capture global temporal correlations among frame features with different resolutions.
arXiv Detail & Related papers (2023-09-15T05:31:53Z) - AccDecoder: Accelerated Decoding for Neural-enhanced Video Analytics [26.012783785622073]
Low-quality video is collected by existing surveillance systems because of poor quality cameras or over-compressed/pruned video streaming protocols.
We present AccDecoder, a novel accelerated decoder for real-time and neural network-based video analytics.
arXiv Detail & Related papers (2023-01-20T16:30:44Z) - Deep Unsupervised Key Frame Extraction for Efficient Video
Classification [63.25852915237032]
This work presents an unsupervised method to retrieve the key frames, combining a Convolutional Neural Network (CNN) with Temporal Segment Density Peaks Clustering (TSDPC).
The proposed TSDPC is a generic and powerful framework with two advantages over previous works; one is that it can calculate the number of key frames automatically.
Furthermore, a Long Short-Term Memory network (LSTM) is added on the top of the CNN to further elevate the performance of classification.
arXiv Detail & Related papers (2022-11-12T20:45:35Z) - Turbo: Opportunistic Enhancement for Edge Video Analytics [15.528497833853146]
We study the problem of opportunistic data enhancement using the non-deterministic and fragmented idle GPU resources.
We propose a task-specific discrimination and enhancement module and a model-aware adversarial training mechanism.
Our system boosts object detection accuracy by 7.3-11.3% without incurring any latency costs.
arXiv Detail & Related papers (2022-06-29T12:13:30Z) - A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z) - Self-Conditioned Probabilistic Learning of Video Rescaling [70.10092286301997]
We propose a self-conditioned probabilistic framework for video rescaling to learn the paired downscaling and upscaling procedures simultaneously.
We decrease the entropy of the information lost in the downscaling by maximizing its conditioned probability on the strong spatial-temporal prior information.
We extend the framework to a lossy video compression system, in which a gradient estimator for non-differentiable industrial lossy codecs is proposed.
arXiv Detail & Related papers (2021-07-24T15:57:15Z) - Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action
Localization [96.73647162960842]
TAL is a fundamental yet challenging task in video understanding.
Existing TAL methods rely on pre-training a video encoder through action classification supervision.
We introduce a novel low-fidelity end-to-end (LoFi) video encoder pre-training method.
arXiv Detail & Related papers (2021-03-28T22:18:14Z) - Less is More: ClipBERT for Video-and-Language Learning via Sparse
Sampling [98.41300980759577]
A canonical approach to video-and-language learning dictates a neural model to learn from offline-extracted dense video features.
We propose a generic framework ClipBERT that enables affordable end-to-end learning for video-and-language tasks.
Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms existing methods.
arXiv Detail & Related papers (2021-02-11T18:50:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.