Semantics-Driven Cloud-Edge Collaborative Inference
- URL: http://arxiv.org/abs/2309.15435v1
- Date: Wed, 27 Sep 2023 06:53:09 GMT
- Title: Semantics-Driven Cloud-Edge Collaborative Inference
- Authors: Yuche Gao and Beibei Zhang
- Abstract summary: This paper proposes a semantics-driven cloud-edge collaborative approach for accelerating video inference.
The method separates semantics extraction and recognition, allowing edge servers to only extract visual semantics from video frames.
Experiments demonstrate significant improvements in end-to-end inference speed (up to 5x faster), throughput (up to 9 FPS), and reduced traffic volumes.
- Score: 1.441340412842035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the proliferation of video data in smart city applications like
intelligent transportation, efficient video analytics has become crucial but
also challenging. This paper proposes a semantics-driven cloud-edge
collaborative approach for accelerating video inference, using license plate
recognition as a case study. The method separates semantics extraction and
recognition, allowing edge servers to only extract visual semantics (license
plate patches) from video frames and offload computation-intensive recognition
to the cloud or neighboring edges based on load. This segmented processing
coupled with a load-aware work distribution strategy aims to reduce end-to-end
latency and improve throughput. Experiments demonstrate significant
improvements in end-to-end inference speed (up to 5x faster), throughput (up to
9 FPS), and reduced traffic volumes (50% less) compared to cloud-only or
edge-only processing, validating the efficiency of the proposed approach. The
cloud-edge collaborative framework with semantics-driven work partitioning
provides a promising solution for scaling video analytics in smart cities.
Related papers
- Learn to Compress (LtC): Efficient Learning-based Streaming Video Analytics [3.2872586139884623]
LtC is a collaborative framework between the video source and the analytics server that efficiently learns to reduce the video streams within an analytics pipeline.
LtC is able to use 28-35% less bandwidth and has up to 45% shorter response delay compared to recently published state of the art streaming frameworks.
arXiv Detail & Related papers (2023-07-22T21:36:03Z)
- ReBotNet: Fast Real-time Video Enhancement [59.08038313427057]
Most restoration networks are slow, have a high computational bottleneck, and can't be used for real-time video enhancement.
In this work, we design an efficient and fast framework to perform real-time enhancement for practical use-cases like live video calls and video streams.
To evaluate our method, we emulate two new datasets that simulate real-world video call and streaming scenarios, and show extensive results on multiple datasets where ReBotNet outperforms existing approaches with lower computations, reduced memory requirements, and faster inference time.
arXiv Detail & Related papers (2023-03-23T17:58:05Z)
- Task-Oriented Communication for Edge Video Analytics [11.03999024164301]
This paper proposes a task-oriented communication framework for edge video analytics.
Multiple devices collect visual sensory data and transmit the informative features to an edge server for processing.
We show that the proposed framework effectively encodes task-relevant information of video data and achieves a better rate-performance tradeoff than existing methods.
arXiv Detail & Related papers (2022-11-25T12:09:12Z)
- Deep Unsupervised Key Frame Extraction for Efficient Video Classification [63.25852915237032]
This work presents an unsupervised method to retrieve the key frames, which combines a Convolutional Neural Network (CNN) and Temporal Segment Density Peaks Clustering (TSDPC).
The proposed TSDPC is a generic and powerful framework with two advantages over previous works; one is that it can calculate the number of key frames automatically.
Furthermore, a Long Short-Term Memory network (LSTM) is added on top of the CNN to further elevate the performance of classification.
arXiv Detail & Related papers (2022-11-12T20:45:35Z)
- NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition [89.84188594758588]
A novel Non-saliency Suppression Network (NSNet) is proposed to suppress the responses of non-salient frames.
NSNet achieves the state-of-the-art accuracy-efficiency trade-off and presents a significantly faster (2.4~4.3x) practical inference speed than state-of-the-art methods.
arXiv Detail & Related papers (2022-07-21T09:41:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical computation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
With a latency- and accuracy-aware reward design, such a computation adapts well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Adaptive Focus for Efficient Video Recognition [29.615394426035074]
We propose a reinforcement learning based approach for efficient spatially adaptive video recognition (AdaFocus).
A lightweight ConvNet is first adopted to quickly process the full video sequence, whose features are used by a recurrent policy network to localize the most task-relevant regions.
During offline inference, once the informative patch sequence has been generated, the bulk of computation can be done in parallel, and is efficient on modern GPU devices.
arXiv Detail & Related papers (2021-05-07T13:24:47Z)
- VID-WIN: Fast Video Event Matching with Query-Aware Windowing at the Edge for the Internet of Multimedia Things [3.222802562733787]
VID-WIN is an adaptive 2-stage allied windowing approach to accelerate video event analytics in an edge-cloud paradigm.
VID-WIN exploits the video content and input knobs to accelerate the video inference process across nodes.
arXiv Detail & Related papers (2021-04-27T10:08:40Z)
- Multi-Task Network Pruning and Embedded Optimization for Real-time Deployment in ADAS [0.0]
Camera-based Deep Learning algorithms are increasingly needed for perception in Automated Driving systems.
Constraints from the automotive industry challenge the deployment of CNNs by imposing embedded systems with limited computational resources.
We propose an approach to embed a multi-task CNN network under such conditions on a commercial prototype platform.
arXiv Detail & Related papers (2021-01-19T19:29:38Z)
- Towards Efficient Scene Understanding via Squeeze Reasoning [71.1139549949694]
We propose a novel framework called Squeeze Reasoning.
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector.
We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks.
arXiv Detail & Related papers (2020-11-06T12:17:01Z)
- Dynamic Inference: A New Approach Toward Efficient Video Action Recognition [69.9658249941149]
Action recognition in videos has achieved great success recently, but it remains a challenging task due to the massive computational cost.
We propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos.
arXiv Detail & Related papers (2020-02-09T11:09:56Z)