Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering
- URL: http://arxiv.org/abs/2402.18927v1
- Date: Thu, 29 Feb 2024 07:42:03 GMT
- Title: Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering
- Authors: Xiang Chen, Wenjie Zhu, Jiayuan Chen, Tong Zhang, Changyan Yi, Jun Cai
- Abstract summary: This paper proposes a novel edge computing enabled real-time video analysis system for intelligent visual devices.
The proposed system consists of a tracking-assisted object detection module (TAODM) and a region of interest module (ROIM).
TAODM adaptively decides whether to process each video frame locally with a tracking algorithm or to offload it to the edge server, where it is processed by an object detection model.
- Score: 18.55091203660391
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel edge computing enabled real-time video
analysis system for intelligent visual devices. The proposed system consists of
a tracking-assisted object detection module (TAODM) and a region of interest
module (ROIM). TAODM adaptively decides whether to process each video frame
locally with a tracking algorithm or to offload it to the edge server, where it
is processed by an object detection model. ROIM determines each offloaded
frame's resolution and detection model configuration so that the analysis
results can be returned in time. TAODM and ROIM interact jointly to filter out
repetitive spatial-temporal semantic information, maximizing the processing
rate while ensuring high video analysis accuracy. Unlike most existing works,
this paper investigates real-time video analysis systems in which the
intelligent visual device connects to the edge server through a wireless
network with fluctuating conditions. We decompose the real-time video analysis
problem into an offloading-decision sub-problem and a configuration-selection
sub-problem. To solve these two sub-problems, we introduce a double deep Q
network (DDQN) based offloading approach and a contextual multi-armed bandit
(CMAB) based adaptive configuration-selection approach, respectively. A
DDQN-CMAB reinforcement learning (DCRL) training framework is further developed
to integrate the two approaches and improve overall video analysis performance.
Extensive simulations are conducted to evaluate the performance of the proposed
solution and demonstrate its superiority over counterpart schemes.
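The abstract names a DDQN for the per-frame offloading decision but gives no implementation detail. The sketch below is a minimal, illustrative PyTorch rendering of what such an agent could look like; the two-action space (track locally vs. offload), the state features named in the comments, and all hyperparameters are assumptions for illustration, not the authors' code.

```python
import random
from collections import deque

import torch
import torch.nn as nn

LOCAL_TRACK, OFFLOAD = 0, 1  # hypothetical TAODM action space


class QNetwork(nn.Module):
    """Maps a state (e.g., channel quality, queue backlog, inter-frame change)
    to one Q-value per action."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)


class DDQNOffloader:
    def __init__(self, state_dim: int, n_actions: int = 2, gamma: float = 0.99):
        self.online = QNetwork(state_dim, n_actions)
        self.target = QNetwork(state_dim, n_actions)
        self.target.load_state_dict(self.online.state_dict())
        self.opt = torch.optim.Adam(self.online.parameters(), lr=1e-3)
        self.replay = deque(maxlen=10_000)
        self.gamma, self.n_actions = gamma, n_actions

    def act(self, state, eps=0.1):
        """Epsilon-greedy choice between local tracking and offloading."""
        if random.random() < eps:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            q = self.online(torch.as_tensor(state, dtype=torch.float32))
        return int(q.argmax())

    def remember(self, s, a, r, s2, done):
        self.replay.append((s, a, r, s2, float(done)))

    def update(self, batch_size=32):
        if len(self.replay) < batch_size:
            return
        s, a, r, s2, d = zip(*random.sample(self.replay, batch_size))
        s = torch.as_tensor(s, dtype=torch.float32)
        a = torch.as_tensor(a, dtype=torch.int64).unsqueeze(1)
        r = torch.as_tensor(r, dtype=torch.float32)
        s2 = torch.as_tensor(s2, dtype=torch.float32)
        d = torch.as_tensor(d, dtype=torch.float32)
        with torch.no_grad():
            # Double DQN: the online net picks the next action, the target
            # net evaluates it, which reduces Q-value overestimation.
            a2 = self.online(s2).argmax(dim=1, keepdim=True)
            y = r + self.gamma * (1.0 - d) * self.target(s2).gather(1, a2).squeeze(1)
        q = self.online(s).gather(1, a).squeeze(1)
        loss = nn.functional.mse_loss(q, y)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

    def sync_target(self):
        """Periodically copy online weights into the target network."""
        self.target.load_state_dict(self.online.state_dict())
```

In the paper's setting, the per-step reward would have to couple analysis accuracy with processing delay, so that the agent learns when a frame is similar enough to its predecessor to be handled by the local tracker.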
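The configuration-selection side is described only as a contextual multi-armed bandit, without naming a specific algorithm. One standard CMAB instantiation is LinUCB, sketched below; the arm set (resolution/detector pairs), the context features, and the reward value are hypothetical placeholders chosen to mirror ROIM's role.

```python
import numpy as np

# Hypothetical configuration arms (resolution, detection model); the paper
# does not enumerate its actual configuration space.
ARMS = [("360p", "small_detector"), ("720p", "medium_detector"),
        ("1080p", "large_detector")]


class LinUCBSelector:
    """LinUCB contextual bandit: one ridge-regression reward model per arm."""

    def __init__(self, n_arms, ctx_dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(ctx_dim) for _ in range(n_arms)]    # Gram matrices
        self.b = [np.zeros(ctx_dim) for _ in range(n_arms)]  # reward sums

    def select(self, ctx):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                    # per-arm coefficient estimate
            ucb = theta @ ctx + self.alpha * np.sqrt(ctx @ A_inv @ ctx)
            scores.append(ucb)                   # mean + exploration bonus
        return int(np.argmax(scores))

    def update(self, arm, ctx, reward):
        self.A[arm] += np.outer(ctx, ctx)
        self.b[arm] += reward * ctx


# Usage sketch: the context could encode the fluctuating wireless conditions;
# the reward would trade detection accuracy against round-trip latency.
selector = LinUCBSelector(n_arms=len(ARMS), ctx_dim=3)
ctx = np.array([0.6, 0.2, 0.9])   # e.g., normalized bandwidth, RTT, scene change
arm = selector.select(ctx)
# ... offload the frame with configuration ARMS[arm], observe the outcome ...
selector.update(arm, ctx, reward=0.8)  # placeholder reward
```

A DCRL-style training loop would alternate between the two learners: the DDQN decides whether a frame leaves the device at all, and only offloaded frames generate feedback for the bandit.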
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the resource constraints of IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Task-Oriented Communication for Edge Video Analytics [11.03999024164301]
This paper proposes a task-oriented communication framework for edge video analytics.
Multiple devices collect visual sensory data and transmit the informative features to an edge server for processing.
We show that the proposed framework effectively encodes task-relevant information of video data and achieves a better rate-performance tradeoff than existing methods.
arXiv Detail & Related papers (2022-11-25T12:09:12Z)
- Implicit Motion-Compensated Network for Unsupervised Video Object Segmentation [25.41427065435164]
Unsupervised video object segmentation (UVOS) aims at automatically separating the primary foreground object(s) from the background in a video sequence.
Existing UVOS methods either lack robustness when there are visually similar surroundings (appearance-based) or suffer from deterioration in the quality of their predictions because of dynamic background and inaccurate flow (flow-based).
We propose an implicit motion-compensated network (IMCNet) combining complementary cues (i.e., appearance and motion) with aligned motion information from the adjacent frames to the current frame at the feature level.
arXiv Detail & Related papers (2022-04-06T13:03:59Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
The full-duplex strategy network (FSNet) is a novel framework for video object segmentation (VOS).
Our FSNet performs the cross-modal feature-passing (i.e., transmission and receiving) simultaneously before the fusion decoding stage.
We show that our FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z)
- Parallel Detection for Efficient Video Analytics at the Edge [5.547133811014004]
Deep neural network (DNN) trained object detectors are widely deployed in mission-critical systems for real-time video analytics at the edge.
A common performance requirement in mission-critical edge services is the near real-time latency of online object detection on edge devices.
This paper addresses these problems by exploiting multi-model multi-device detection parallelism for fast object detection in edge systems.
arXiv Detail & Related papers (2021-07-27T02:50:46Z)
- A Reinforcement-Learning-Based Energy-Efficient Framework for Multi-Task Video Analytics Pipeline [16.72264118199915]
Video analytics pipelines are energy-intensive due to high data rates and reliance on complex inference algorithms.
We propose an adaptive-resolution optimization framework to minimize the energy use of multi-task video analytics pipelines.
Our framework has significantly surpassed all baseline methods of similar accuracy on the YouTube-VIS dataset.
arXiv Detail & Related papers (2021-04-09T15:44:06Z)
- ApproxDet: Content and Contention-Aware Approximate Object Detection for Mobiles [19.41234144545467]
We introduce ApproxDet, an adaptive video object detection framework for mobile devices to meet accuracy-latency requirements.
We evaluate ApproxDet on a large benchmark video dataset and compare quantitatively to AdaScale and YOLOv3.
We find that ApproxDet is able to adapt to a wide variety of contention and content characteristics and outshines all baselines.
arXiv Detail & Related papers (2020-10-21T04:11:05Z)
- A Deep-Unfolded Reference-Based RPCA Network For Video Foreground-Background Separation [86.35434065681925]
This paper proposes a new deep-unfolding-based network design for the problem of Robust Principal Component Analysis (RPCA).
Unlike existing designs, our approach focuses on modeling the temporal correlation between the sparse representations of consecutive video frames.
Experimentation using the moving MNIST dataset shows that the proposed network outperforms a recently proposed state-of-the-art RPCA network in the task of video foreground-background separation.
arXiv Detail & Related papers (2020-10-02T11:40:09Z)
- Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our method is validated on complex quadruped robot dynamics, and the approach can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
- Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching [67.02962970820505]
We introduce "tracking-by-detection" into video object segmentation (VOS).
We propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
We achieve new state-of-the-art performance on the DAVIS benchmark in both speed and accuracy, without complicated bells and whistles, running at 0.14 seconds per frame with a J&F measure of 75.9%.
arXiv Detail & Related papers (2020-07-11T05:44:16Z)