Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference
- URL: http://arxiv.org/abs/2403.07598v1
- Date: Tue, 12 Mar 2024 12:35:12 GMT
- Title: Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference
- Authors: Changmin Jeon, Seonjun Kim, Juheon Yi, Youngki Lee
- Abstract summary: Mondrian is an edge system that enables high-performance object detection on high-resolution video streams.
We devise a novel Compressive Packed Inference to minimize per-pixel processing costs.
- Score: 7.624476059109304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present Mondrian, an edge system that enables
high-performance object detection on high-resolution video streams. Many
lightweight models and system optimization techniques have been proposed for
resource-constrained devices, but they do not fully utilize the potential of
the accelerators over dynamic, high-resolution videos. To enable such
capability, we devise a novel Compressive Packed Inference to minimize
per-pixel processing costs by selectively determining the necessary pixels to
process and combining them to maximize processing parallelism. In particular,
our system quickly extracts ROIs and dynamically shrinks them, reflecting the
effect of the fast-changing characteristics of objects and scenes. It then
intelligently combines such scaled ROIs into large canvases to maximize the
utilization of inference accelerators such as GPU. Evaluation across various
datasets, models, and devices shows Mondrian outperforms state-of-the-art
baselines (e.g., input rescaling, ROI extraction, ROI extraction+batching) by
15.0-19.7% in accuracy and achieves 6.65$\times$ higher throughput than
frame-wise inference when processing various 1080p video streams. We will
release the code after the paper review.
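The code is not yet released, but the gist of Compressive Packed Inference can be sketched from the abstract: rescale each extracted ROI, then shelf-pack the crops onto a single canvas so one accelerator pass covers many regions at once. The sketch below is an illustrative assumption, not the authors' implementation; the nearest-neighbor rescale, the shelf heuristic, and the 640x640 canvas are all placeholders.

```python
import numpy as np

def scale_roi(crop: np.ndarray, scale: float) -> np.ndarray:
    """Nearest-neighbor rescale of one ROI crop. Stands in for the
    paper's content-aware ROI shrinking; the scale factor is assumed
    to come from some per-object policy."""
    h, w = crop.shape[:2]
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    ys = np.arange(nh) * h // nh        # source row for each output row
    xs = np.arange(nw) * w // nw        # source column for each output column
    return crop[ys][:, xs]

def pack_rois(crops, canvas_hw=(640, 640)):
    """Shelf-pack scaled ROI crops onto one canvas so a single detector
    pass covers all of them. Returns the canvas plus each crop's
    placement so detections can be mapped back later."""
    ch, cw = canvas_hw
    canvas = np.zeros((ch, cw, 3), dtype=np.uint8)
    placements, x, y, shelf_h = [], 0, 0, 0
    for i, crop in enumerate(crops):
        h, w = crop.shape[:2]
        if x + w > cw:                  # row full: open a new shelf
            x, y, shelf_h = 0, y + shelf_h, 0
        if w > cw or y + h > ch:        # no room left on this canvas
            break
        canvas[y:y + h, x:x + w] = crop
        placements.append((i, x, y, w, h))
        x, shelf_h = x + w, max(shelf_h, h)
    return canvas, placements
```

Detections on the packed canvas would then be mapped back to source-frame coordinates by inverting each placement offset and the per-ROI scale.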
Related papers
- Fast and Memory-Efficient Video Diffusion Using Streamlined Inference [41.505829393818274]
Current video diffusion models exhibit demanding computational requirements and high peak memory usage.
We present Streamlined Inference, which leverages the temporal and spatial properties of video diffusion models.
Our approach significantly reduces peak memory and computational overhead, making it feasible to generate high-quality videos on a single consumer GPU.
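The summary does not spell out the mechanism, so the sketch below only illustrates the generic idea of exploiting temporal structure for lower peak memory: push frames through a memory-heavy per-frame stage in small temporal chunks instead of all at once. The chunk size and stand-in layer are assumptions, not the paper's pipeline.

```python
import torch

def chunked_temporal_forward(layer, latents, chunk=4):
    """Apply a per-frame stage over a video tensor in temporal chunks
    so peak activation memory scales with `chunk`, not the clip length.
    latents: (frames, channels, height, width)."""
    outs = []
    with torch.no_grad():
        for start in range(0, latents.shape[0], chunk):
            outs.append(layer(latents[start:start + chunk]))
    return torch.cat(outs, dim=0)

# Usage with a stand-in layer:
# layer = torch.nn.Conv2d(4, 4, 3, padding=1)
# out = chunked_temporal_forward(layer, torch.randn(16, 4, 64, 64))
```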
arXiv Detail & Related papers (2024-11-02T07:52:18Z)
- HAVANA: Hierarchical stochastic neighbor embedding for Accelerated Video ANnotAtions [59.71751978599567]
This paper presents a novel annotation pipeline that uses pre-extracted features and dimensionality reduction to accelerate the temporal video annotation process.
We demonstrate significant improvements in annotation effort compared to traditional linear methods, achieving more than a 10x reduction in clicks required for annotating over 12 hours of video.
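As a rough stand-in for the pipeline's embedding step, the sketch below projects pre-extracted per-frame features to 2D with scikit-learn's flat t-SNE so similar frames cluster and can be labeled in bulk; the paper uses a hierarchical SNE variant, so treat this as an approximation.

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_frames(frame_features: np.ndarray) -> np.ndarray:
    """Project (n_frames, dim) features to 2D for annotation browsing.
    Flat t-SNE stands in for the paper's hierarchical SNE; perplexity
    must stay below the number of frames."""
    return TSNE(n_components=2, perplexity=30.0, init="pca",
                random_state=0).fit_transform(frame_features)
```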
arXiv Detail & Related papers (2024-09-16T18:15:38Z)
- Hierarchical Patch Diffusion Models for High-Resolution Video Generation [50.42746357450949]
We develop deep context fusion, which propagates context information from low-scale to high-scale patches in a hierarchical manner.
We also propose adaptive computation, which allocates more network capacity and computation towards coarse image details.
The resulting model sets a new state-of-the-art FVD score of 66.32 and Inception Score of 87.68 in class-conditional video generation.
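A minimal sketch of the low-to-high-scale conditioning idea, assuming deep context fusion amounts to cropping the coarse feature map at the patch's location, upsampling it, and concatenating channels; the paper's actual fusion may differ.

```python
import torch
import torch.nn.functional as F

def fuse_context(patch_feats, coarse_feats, patch_box):
    """Condition a high-scale patch on low-scale context (illustrative).
    patch_feats:  (B, C, h, w) features of one high-scale patch
    coarse_feats: (B, C, H, W) features of the downscaled full frame
    patch_box:    (y0, y1, x0, x1) patch location in coarse coordinates
    """
    y0, y1, x0, x1 = patch_box
    ctx = coarse_feats[:, :, y0:y1, x0:x1]        # context under the patch
    ctx = F.interpolate(ctx, size=patch_feats.shape[-2:],
                        mode="bilinear", align_corners=False)
    return torch.cat([patch_feats, ctx], dim=1)   # (B, 2C, h, w)
```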
arXiv Detail & Related papers (2024-06-12T01:12:53Z)
- Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge Devices [90.30316433184414]
We propose a data-model-hardware tri-design framework for high-throughput, low-cost, and high-accuracy MOT on HD video streams.
Compared to the state-of-the-art MOT baseline, our tri-design approach can achieve 12.5x latency reduction, 20.9x effective frame rate improvement, 5.83x lower power, and 9.78x better energy efficiency, without much accuracy drop.
arXiv Detail & Related papers (2022-10-16T16:21:40Z)
- Efficient Heterogeneous Video Segmentation at the Edge [2.4378845585726903]
We introduce an efficient video segmentation system for resource-limited edge devices leveraging heterogeneous compute.
Specifically, we design network models by searching across multiple dimensions of specifications for the neural architectures.
We analyze and optimize the heterogeneous data flows in our systems across the CPU, the GPU and the NPU.
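A toy version of such a heterogeneous data flow is a bounded-queue pipeline in which preprocessing, inference, and postprocessing overlap; which stage maps to the CPU, GPU, or NPU is an assumption here, and the stage functions are placeholders.

```python
import queue
import threading

def run_pipeline(frames, preprocess, infer, postprocess):
    """Overlap three stages (e.g., CPU decode/resize, NPU or GPU
    inference, CPU postprocess) via bounded queues for backpressure."""
    q1, q2, results = queue.Queue(4), queue.Queue(4), []

    def stage_pre():
        for f in frames:
            q1.put(preprocess(f))
        q1.put(None)                      # end-of-stream sentinel

    def stage_infer():
        while (x := q1.get()) is not None:
            q2.put(infer(x))
        q2.put(None)

    threading.Thread(target=stage_pre, daemon=True).start()
    threading.Thread(target=stage_infer, daemon=True).start()
    while (y := q2.get()) is not None:    # postprocess on the main thread
        results.append(postprocess(y))
    return results
```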
arXiv Detail & Related papers (2022-08-24T17:01:09Z)
- Turbo: Opportunistic Enhancement for Edge Video Analytics [15.528497833853146]
We study the problem of opportunistic data enhancement using non-deterministic and fragmented idle GPU resources.
We propose a task-specific discrimination and enhancement module and a model-aware adversarial training mechanism.
Our system boosts object detection accuracy by 7.3-11.3% without incurring any latency costs.
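A minimal sketch of the opportunistic idea, assuming access to a per-frame enhancement cost estimate and an idle-window predictor (both hypothetical helpers, not Turbo's API): enhance a frame only when the GPU is expected to stay idle long enough, otherwise pass it through so analytics latency is untouched.

```python
import time

class OpportunisticEnhancer:
    """Best-effort enhancement on fragmented idle GPU time."""

    def __init__(self, enhance_fn, cost_s, idle_until_fn):
        self.enhance_fn = enhance_fn     # e.g., a learned enhancement model
        self.cost_s = cost_s             # measured per-frame enhancement cost
        self.idle_until = idle_until_fn  # predicts when the GPU gets busy again

    def maybe_enhance(self, frame):
        if self.idle_until() - time.time() >= self.cost_s:
            return self.enhance_fn(frame)  # free GPU time: enhance
        return frame                       # busy: skip, no added latency
```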
arXiv Detail & Related papers (2022-06-29T12:13:30Z)
- STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction [78.129039340528]
We propose a Spatiotemporal Residual Predictive Model (STRPM) for high-resolution video prediction.
Experimental results show that STRPM can generate more satisfactory results compared with various existing methods.
arXiv Detail & Related papers (2022-03-30T06:24:00Z)
- A Reinforcement-Learning-Based Energy-Efficient Framework for Multi-Task Video Analytics Pipeline [16.72264118199915]
Video analytics pipelines are energy-intensive due to high data rates and reliance on complex inference algorithms.
We propose an adaptive-resolution optimization framework to minimize the energy use of multi-task video analytics pipelines.
Our framework has significantly surpassed all baseline methods of similar accuracy on the YouTube-VIS dataset.
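As a stand-in for the paper's RL controller, the sketch below runs an epsilon-greedy bandit over candidate input resolutions, with a reward that trades measured energy against an accuracy target; the resolution set, target, and penalty weight are all assumptions.

```python
import random

RESOLUTIONS = [360, 480, 720, 1080]  # candidate input heights (assumed)

class ResolutionBandit:
    """Pick a per-segment resolution, then learn from observed
    (energy, accuracy) feedback to prefer the cheapest acceptable one."""

    def __init__(self, eps=0.1, acc_target=0.9, penalty=10.0):
        self.eps, self.acc_target, self.penalty = eps, acc_target, penalty
        self.value = {r: 0.0 for r in RESOLUTIONS}
        self.count = {r: 0 for r in RESOLUTIONS}

    def choose(self):
        if random.random() < self.eps:               # explore
            return random.choice(RESOLUTIONS)
        return max(RESOLUTIONS, key=self.value.get)  # exploit

    def update(self, res, energy_j, accuracy):
        reward = -energy_j - self.penalty * max(0.0, self.acc_target - accuracy)
        self.count[res] += 1
        self.value[res] += (reward - self.value[res]) / self.count[res]
```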
arXiv Detail & Related papers (2021-04-09T15:44:06Z)
- Large Motion Video Super-Resolution with Dual Subnet and Multi-Stage Communicated Upsampling [18.09730129484432]
Video super-resolution (VSR) aims at restoring a low-resolution (LR) video and improving it to a higher resolution (HR).
In this paper, we propose a novel deep neural network with Dual Subnet and Multi-stage Communicated Upsampling (DSMC) for super-resolution of videos with large motion.
arXiv Detail & Related papers (2021-03-22T11:52:12Z)
- Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution [95.26202278535543]
A simple solution is to split the task into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR).
However, temporal interpolation and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)
- Video Face Super-Resolution with Motion-Adaptive Feedback Cell [90.73821618795512]
Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNNs).
In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way.
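A loose sketch of the feedback idea only, assuming the adaptive gate is computed from the current and previous feature maps; this is not the paper's exact MAFC design.

```python
import torch
import torch.nn as nn

class FeedbackCellSketch(nn.Module):
    """Gate how much of the previous step's output features is fed
    back, with the gate driven by inter-frame feature change."""

    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.Sigmoid())

    def forward(self, cur_feat, prev_feat, prev_out):
        g = self.gate(torch.cat([cur_feat, prev_feat], dim=1))
        return cur_feat + g * prev_out   # motion-adaptive feedback
```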
arXiv Detail & Related papers (2020-02-15T13:14:10Z)