ApproxDet: Content and Contention-Aware Approximate Object Detection for
Mobiles
- URL: http://arxiv.org/abs/2010.10754v1
- Date: Wed, 21 Oct 2020 04:11:05 GMT
- Title: ApproxDet: Content and Contention-Aware Approximate Object Detection for
Mobiles
- Authors: Ran Xu, Chen-lin Zhang, Pengcheng Wang, Jayoung Lee, Subrata Mitra,
Somali Chaterji, Yin Li, Saurabh Bagchi
- Abstract summary: We introduce ApproxDet, an adaptive video object detection framework for mobile devices to meet accuracy-latency requirements.
We evaluate ApproxDet on a large benchmark video dataset and compare quantitatively to AdaScale and YOLOv3.
We find that ApproxDet adapts to a wide variety of contention and content characteristics and outperforms all baselines.
- Score: 19.41234144545467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advanced video analytic systems, including scene classification and object
detection, have seen widespread success in various domains such as smart cities
and autonomous transportation. With an ever-growing number of powerful client
devices, there is incentive to move these heavy video analytics workloads from
the cloud to mobile devices to achieve low latency and real-time processing and
to preserve user privacy. However, most video analytic systems are heavyweight
and are trained offline with some pre-defined latency or accuracy requirements.
This makes them unable to adapt at runtime in the face of three types of
dynamism -- the input video characteristics change, the amount of compute
resources available on the node changes due to co-located applications, and the
user's latency-accuracy requirements change. In this paper we introduce
ApproxDet, an adaptive video object detection framework for mobile devices to
meet accuracy-latency requirements in the face of changing content and resource
contention scenarios. To achieve this, we introduce a multi-branch object
detection kernel (layered on Faster R-CNN), which incorporates a data-driven
modeling approach on the performance metrics, and a latency SLA-driven
scheduler to pick the best execution branch at runtime. We couple this kernel
with approximable video object tracking algorithms to create an end-to-end
video object detection system. We evaluate ApproxDet on a large benchmark video
dataset and compare quantitatively to AdaScale and YOLOv3. We find that
ApproxDet adapts to a wide variety of contention and content
characteristics and outperforms all baselines, e.g., achieving 52% lower
latency and 11.1% higher accuracy than YOLOv3.
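The latency SLA-driven scheduler described in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the branch names and the predicted latency/accuracy values are invented, and in ApproxDet the predictions would come from the paper's data-driven performance models of content and contention.

```python
# Illustrative sketch of SLA-driven branch selection: among approximation
# branches with predicted latency and accuracy, pick the most accurate
# branch whose predicted latency meets the SLA.
from dataclasses import dataclass


@dataclass
class Branch:
    name: str
    pred_latency_ms: float  # predicted latency under current content/contention
    pred_accuracy: float    # predicted accuracy (e.g., mAP) for this branch


def pick_branch(branches, sla_ms):
    """Return the most accurate branch meeting the latency SLA;
    if no branch qualifies, fall back to the fastest branch."""
    feasible = [b for b in branches if b.pred_latency_ms <= sla_ms]
    if feasible:
        return max(feasible, key=lambda b: b.pred_accuracy)
    return min(branches, key=lambda b: b.pred_latency_ms)


# Hypothetical branches for illustration only.
branches = [
    Branch("full_res_detect", 120.0, 0.78),
    Branch("half_res_detect", 60.0, 0.71),
    Branch("track_only", 15.0, 0.62),
]

print(pick_branch(branches, sla_ms=80.0).name)  # -> half_res_detect
```

Because the predictions depend on the current video content and co-located contention, the scheduler re-evaluates this choice at runtime rather than fixing a branch offline.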
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics [18.042752812489276]
We introduce an end-to-end edge-assisted video inference acceleration system based on Vision Transformer (ViT).
Our findings reveal that Arena can boost inference speeds by up to 1.58× and 1.82× on average while consuming only 47% and 31% of the bandwidth, respectively, all with high inference accuracy.
arXiv Detail & Related papers (2024-04-14T13:14:13Z)
- DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [60.09774333024783]
We introduce Dynamic Anchor Queries (DAQ) to shorten the transition gap between the anchor and target queries.
We also introduce a query-level object Emergence and Disappearance Simulation (EDS) strategy, which unleashes DAQ's potential without any additional cost.
Experiments demonstrate that DVIS-DAQ achieves a new state-of-the-art (SOTA) performance on five mainstream video segmentation benchmarks.
arXiv Detail & Related papers (2024-03-29T17:58:50Z)
- Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering [18.55091203660391]
This paper proposes a novel edge computing enabled real-time video analysis system for intelligent visual devices.
The proposed system consists of a tracking-assisted object detection module (TAODM) and a region of interest module (ROIM).
TAODM adaptively decides, for each video frame, whether to process it locally with a tracking algorithm or to offload it to the edge server for inference by an object detection model.
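The per-frame track-locally-vs-offload decision above can be illustrated with a minimal sketch. This is my own hedged illustration, not the paper's algorithm: the function name, the maximum-tracking-age cutoff, and the drift heuristic are all assumptions for the example.

```python
# Hypothetical sketch of a per-frame local-track vs. edge-offload decision.
# The thresholds and the drift estimate are invented for illustration; a
# real system would derive them from tracker confidence and network state.
def decide(frames_since_detection, max_track_frames, est_drift, drift_threshold):
    """Offload to the edge detector when tracking has run for too many
    frames or estimated tracking drift exceeds a threshold; otherwise
    keep tracking locally."""
    if frames_since_detection >= max_track_frames or est_drift > drift_threshold:
        return "offload_detect"
    return "track_local"


print(decide(3, max_track_frames=8, est_drift=0.1, drift_threshold=0.5))
```

The general pattern, shared with ApproxDet's detection-plus-tracking design, is that cheap tracking amortizes the cost of expensive detection across frames, with detection re-invoked only when tracking quality degrades.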
arXiv Detail & Related papers (2024-02-29T07:42:03Z)
- Fast and Resource-Efficient Object Tracking on Edge Devices: A Measurement Study [9.976630547252427]
Multi-object tracking (MOT) detects the moving objects and tracks their locations frame by frame as real scenes are being captured into a video.
This paper examines the performance issues and edge-specific optimization opportunities for object tracking.
We present several edge-specific performance optimization strategies, collectively coined EMO, to speed up real-time object tracking.
arXiv Detail & Related papers (2023-09-06T02:25:36Z)
- Task-Oriented Communication for Edge Video Analytics [11.03999024164301]
This paper proposes a task-oriented communication framework for edge video analytics.
Multiple devices collect visual sensory data and transmit the informative features to an edge server for processing.
We show that the proposed framework effectively encodes task-relevant information of video data and achieves a better rate-performance tradeoff than existing methods.
arXiv Detail & Related papers (2022-11-25T12:09:12Z)
- Scalable Video Object Segmentation with Identification Mechanism [125.4229430216776]
This paper explores the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS).
We present two innovative approaches, Associating Objects with Transformers (AOT) and Associating Objects with Scalable Transformers (AOST).
Our approaches surpass the state-of-the-art competitors and display exceptional efficiency and scalability consistently across all six benchmarks.
arXiv Detail & Related papers (2022-03-22T03:33:27Z)
- Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals [85.76513755331318]
Argus++ is a robust real-time activity detection system for analyzing unconstrained video streams.
The overall system is optimized for real-time processing on standalone consumer-level hardware.
arXiv Detail & Related papers (2022-01-14T03:35:22Z)
- Parallel Detection for Efficient Video Analytics at the Edge [5.547133811014004]
Deep Neural Network (DNN) trained object detectors are widely deployed in mission-critical systems for real time video analytics at the edge.
A common performance requirement in mission-critical edge services is the near real-time latency of online object detection on edge devices.
This paper addresses these problems by exploiting multi-model multi-device detection parallelism for fast object detection in edge systems.
arXiv Detail & Related papers (2021-07-27T02:50:46Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching [67.02962970820505]
We introduce "tracking-by-detection" into Video Object (VOS)
We propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
We achieve new state-of-the-art performance on the DAVIS benchmark in both speed and accuracy without complicated bells and whistles, running at 0.14 seconds per frame with a J&F measure of 75.9%.
arXiv Detail & Related papers (2020-07-11T05:44:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.