STAC: Leveraging Spatio-Temporal Data Associations For Efficient
Cross-Camera Streaming and Analytics
- URL: http://arxiv.org/abs/2401.15288v1
- Date: Sat, 27 Jan 2024 04:02:52 GMT
- Title: STAC: Leveraging Spatio-Temporal Data Associations For Efficient
Cross-Camera Streaming and Analytics
- Authors: Volodymyr Vakhniuk, Ayush Sarkar, Ragini Gupta
- Abstract summary: We propose an efficient cross-camera surveillance system that provides real-time analytics and inference under constrained network environments.
We integrate STAC with frame filtering and state-of-the-art compression for streaming to remove redundant cross-camera information.
We evaluate the performance of STAC using the AICity Challenge 2023 dataset to measure the accuracy metrics and inference rate for reid.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an efficient cross-camera surveillance system called STAC, which
leverages spatio-temporal associations between multiple cameras to provide
real-time analytics and inference under constrained network environments. STAC
is built using the proposed omni-scale feature learning people reidentification
(reid) algorithm that allows accurate detection, tracking and re-identification
of people across cameras using the spatio-temporal characteristics of video
frames. We integrate STAC with frame filtering and a state-of-the-art
compression technique for streaming (namely, the FFmpeg libx264 codec) to remove redundant
information from cross-camera frames. This helps in optimizing the cost of
video transmission as well as compute/processing, while maintaining high
accuracy for real-time query inference. The introduction of AICity Challenge
2023 Data [1] by NVIDIA has allowed exploration of systems utilizing
multi-camera people tracking algorithms. We evaluate the performance of STAC
using this dataset to measure the accuracy metrics and inference rate for reid.
Additionally, we quantify the reduction in video streams achieved through frame
filtering and compression using FFmpeg compared to the raw camera streams. For
completeness, we make available our repository to reproduce the results,
available at https://github.com/VolodymyrVakhniuk/CS444_Final_Project.
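
The frame-filtering step described above can be sketched roughly as follows. This is a minimal illustration of difference-based filtering ahead of libx264 compression, not the paper's implementation: the difference metric, the threshold value, and the ffmpeg invocation shown in the comment are all assumptions.

```python
# Hedged sketch: drop frames that barely differ from the last kept frame,
# so the downstream encoder sees less redundant cross-camera content.
# Frames are modeled as flat lists of pixel intensities for simplicity.

def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two equal-length frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def filter_frames(frames, threshold=10.0):
    """Keep a frame only if it differs enough from the last kept frame."""
    if not frames:
        return []
    kept = [frames[0]]
    for f in frames[1:]:
        if mean_abs_diff(kept[-1], f) >= threshold:
            kept.append(f)
    return kept

# The surviving frames could then be piped to libx264, e.g.:
#   ffmpeg -f rawvideo -pix_fmt gray -s 64x64 -i - -c:v libx264 -crf 28 out.mp4
# (flags shown for illustration; the paper does not specify encoder settings)

static = [0] * 16
moving = [50] * 16
print(len(filter_frames([static, static, moving, moving])))  # 2
```

With two near-identical static frames and two near-identical moving frames, only one frame from each run survives, which is the redundancy reduction the streaming pipeline relies on.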
Related papers
- A Secure and Private Distributed Bayesian Federated Learning Design [56.92336577799572]
Distributed Federated Learning (DFL) enables decentralized model training across large-scale systems without a central parameter server. DFL faces three critical challenges: privacy leakage from honest-but-curious neighbors, slow convergence due to the lack of central coordination, and vulnerability to Byzantine adversaries aiming to degrade model accuracy. We propose a novel DFL framework that integrates Byzantine robustness, privacy preservation, and convergence acceleration.
arXiv Detail & Related papers (2026-02-23T16:12:02Z) - Video Object Recognition in Mobile Edge Networks: Local Tracking or Edge Detection? [57.000348519630286]
Recent advances in mobile edge computing have made it possible to offload compute-intensive object detection to edge servers equipped with high-accuracy neural networks. This hybrid approach offers a promising solution but introduces a new challenge: deciding when to perform edge detection versus local tracking. We propose LTED-Ada, a deep reinforcement learning-based algorithm for the single-device setting that adaptively selects between local tracking and edge detection.
arXiv Detail & Related papers (2025-11-25T04:54:51Z) - TRACER: Efficient Object Re-Identification in Networked Cameras through Adaptive Query Processing [8.955401552705892]
Spatula is the state-of-the-art video database management system (VDBMS) for processing Re-ID queries. It is not suitable for critical video analytics applications that require high recall due to camera history. We present Tracer, a novel VDBMS for efficiently processing Re-ID queries using an adaptive query processing framework.
arXiv Detail & Related papers (2025-07-13T02:22:08Z) - Deep Learning and Hybrid Approaches for Dynamic Scene Analysis, Object Detection and Motion Tracking [0.0]
This project aims to develop a robust video surveillance system that can segment videos into smaller clips based on the detection of activities. It uses CCTV footage, for example, to record only major events, like the appearance of a person or a thief, so that storage is optimized and digital searches are easier.
arXiv Detail & Related papers (2024-12-05T07:44:40Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Accelerated Event-Based Feature Detection and Compression for
Surveillance Video Systems [1.5390526524075634]
We propose a novel system which conveys temporal redundancy within a sparse decompressed representation.
We leverage a video representation framework called ADDER to transcode framed videos to sparse, asynchronous intensity samples.
Our work paves the way for upcoming neuromorphic sensors and is amenable to future applications with spiking neural networks.
arXiv Detail & Related papers (2023-12-13T15:30:29Z) - Learn to Compress (LtC): Efficient Learning-based Streaming Video
Analytics [3.2872586139884623]
LtC is a collaborative framework between the video source and the analytics server that efficiently learns to reduce the video streams within an analytics pipeline.
LtC is able to use 28-35% less bandwidth and has up to 45% shorter response delay compared to recently published state of the art streaming frameworks.
arXiv Detail & Related papers (2023-07-22T21:36:03Z) - Spatiotemporal Attention-based Semantic Compression for Real-time Video
Recognition [117.98023585449808]
We propose a temporal attention-based autoencoder (STAE) architecture to evaluate the importance of frames and pixels in each frame.
We develop a lightweight decoder that leverages a 3D-2D CNN combined to reconstruct missing information.
Experimental results show that ViT_STAE can compress the video dataset H51 by 104x with only 5% accuracy loss.
arXiv Detail & Related papers (2023-05-22T07:47:27Z) - GPU-accelerated SIFT-aided source identification of stabilized videos [63.084540168532065]
We exploit the parallelization capabilities of Graphics Processing Units (GPUs) in the framework of stabilised frames inversion.
We propose to exploit SIFT features to estimate the camera momentum and to identify less stabilized temporal segments.
Experiments confirm the effectiveness of the proposed approach in reducing the required computational time and improving the source identification accuracy.
arXiv Detail & Related papers (2022-07-29T07:01:31Z) - FrameHopper: Selective Processing of Video Frames in Detection-driven
Real-Time Video Analytics [2.5119455331413376]
Detection-driven real-time video analytics require continuous detection of objects contained in the video frames.
Running these detectors on each and every frame in resource-constrained edge devices is computationally intensive.
We propose an off-line Reinforcement Learning (RL)-based algorithm to determine these skip-lengths.
arXiv Detail & Related papers (2022-03-22T07:05:57Z) - Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z) - CANS: Communication Limited Camera Network Self-Configuration for
Intelligent Industrial Surveillance [8.360870648463653]
Realtime and intelligent video surveillance via camera networks involve computation-intensive vision detection tasks with massive video data.
Multiple video streams compete for limited communication resources on the link between edge devices and camera networks.
An adaptive camera network self-configuration method (CANS) of video surveillance is proposed to cope with multiple video streams of heterogeneous quality of service.
arXiv Detail & Related papers (2021-09-13T01:54:33Z) - Energy-Efficient Model Compression and Splitting for Collaborative
Inference Over Time-Varying Channels [52.60092598312894]
We propose a technique to reduce the total energy bill at the edge device by utilizing model compression and time-varying model split between the edge and remote nodes.
Our proposed solution results in minimal energy consumption and CO$_2$ emission compared to the considered baselines.
arXiv Detail & Related papers (2021-06-02T07:36:27Z) - Personal Privacy Protection via Irrelevant Faces Tracking and Pixelation
in Video Live Streaming [61.145467627057194]
We develop a new method called Face Pixelation in Video Live Streaming to generate automatic personal privacy filtering.
For fast and accurate pixelation of irrelevant people's faces, FPVLS is organized in a frame-to-video structure of two core stages.
On the video live streaming dataset we collected, FPVLS obtains satisfying accuracy and real-time efficiency, and avoids the over-pixelation problem.
arXiv Detail & Related papers (2021-01-04T16:18:26Z) - Temporal Context Aggregation for Video Retrieval with Contrastive
Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z) - Fast Video Object Segmentation With Temporal Aggregation Network and
Dynamic Template Matching [67.02962970820505]
We introduce "tracking-by-detection" into Video Object Segmentation (VOS).
We propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
We achieve new state-of-the-art performance on the DAVIS benchmark in both speed and accuracy without complicated bells and whistles, with a speed of 0.14 seconds per frame and a J&F measure of 75.9%.
arXiv Detail & Related papers (2020-07-11T05:44:16Z) - Single Shot Video Object Detector [215.06904478667337]
Single Shot Video Object Detector (SSVD) is a new architecture that novelly integrates feature aggregation into a one-stage detector for object detection in videos.
For $448 \times 448$ input, SSVD achieves 79.2% mAP on the ImageNet VID dataset.
arXiv Detail & Related papers (2020-07-07T15:36:26Z) - CONVINCE: Collaborative Cross-Camera Video Analytics at the Edge [1.5469452301122173]
This paper introduces CONVINCE, a new approach to look at cameras as a collective entity that enables collaborative video analytics pipeline among cameras.
Our results demonstrate that CONVINCE achieves an object identification accuracy of $\sim$91% by transmitting only $\sim$25% of all the recorded frames.
arXiv Detail & Related papers (2020-02-05T23:55:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.