Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge
Devices
- URL: http://arxiv.org/abs/2210.08578v2
- Date: Tue, 18 Oct 2022 00:26:43 GMT
- Title: Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge
Devices
- Authors: Yimeng Zhang, Akshay Karkal Kamath, Qiucheng Wu, Zhiwen Fan, Wuyang
Chen, Zhangyang Wang, Shiyu Chang, Sijia Liu, Cong Hao
- Abstract summary: We propose a data-model-hardware tri-design framework for high-throughput, low-cost, and high-accuracy MOT on HD video stream.
Compared to the state-of-the-art MOT baseline, our tri-design approach can achieve 12.5x latency reduction, 20.9x effective frame rate improvement, 5.83x lower power, and 9.78x better energy efficiency, without much accuracy drop.
- Score: 90.30316433184414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a data-model-hardware tri-design framework for
high-throughput, low-cost, and high-accuracy multi-object tracking (MOT) on
High-Definition (HD) video stream. First, to enable ultra-light video
intelligence, we propose temporal frame-filtering and spatial saliency-focusing
approaches to reduce the complexity of massive video data. Second, we exploit
structure-aware weight sparsity to design a hardware-friendly model compression
method. Third, assisted with data and model complexity reduction, we propose a
sparsity-aware, scalable, and low-power accelerator design, aiming to deliver
real-time performance with high energy efficiency. Different from existing
works, we make a solid step towards the synergized software/hardware
co-optimization for realistic MOT model implementation. Compared to the
state-of-the-art MOT baseline, our tri-design approach can achieve 12.5x
latency reduction, 20.9x effective frame rate improvement, 5.83x lower power,
and 9.78x better energy efficiency, without much accuracy drop.
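Two of the data/model techniques named above can be illustrated with a short sketch: temporal frame filtering (skip frames that barely differ from the last kept one) and structure-aware weight sparsity (prune whole blocks of a weight matrix so hardware can exploit the regularity). This is a minimal illustration of the general ideas, not the paper's implementation; the function names, the difference threshold, and the block/keep-ratio parameters are all assumptions.

```python
import numpy as np

def filter_frames(frames, diff_threshold=0.02):
    """Temporal frame filtering (sketch): keep a frame only if its mean
    absolute pixel difference from the last kept frame exceeds a threshold.
    Assumes frames are float arrays normalized to [0, 1]."""
    kept = [frames[0]]
    for f in frames[1:]:
        if np.abs(f - kept[-1]).mean() > diff_threshold:
            kept.append(f)
    return kept

def block_prune(weight, block=4, keep_ratio=0.5):
    """Structure-aware sparsity (sketch): zero out the lowest-L2-norm
    (block x block) tiles of a 2-D weight matrix, keeping roughly
    keep_ratio of the tiles. Block-regular zeros are hardware-friendly."""
    h, w = weight.shape
    assert h % block == 0 and w % block == 0
    # Reshape into a grid of (block x block) tiles.
    tiles = weight.reshape(h // block, block, w // block, block)
    norms = np.linalg.norm(tiles, axis=(1, 3))   # per-tile L2 norm
    k = max(1, int(norms.size * keep_ratio))
    cutoff = np.sort(norms, axis=None)[-k]       # norm of the k-th largest tile
    mask = (norms >= cutoff)[:, None, :, None]   # broadcast over tile entries
    return (tiles * mask).reshape(h, w)
```

In a real pipeline the kept frames would feed the pruned tracking model, and the block mask would be fixed before deployment so the accelerator can skip zero tiles entirely.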
Related papers
- LADDER: An Efficient Framework for Video Frame Interpolation [12.039193291203492]
Video Frame Interpolation (VFI) is a crucial technique in various applications such as slow-motion generation, frame rate conversion, video frame restoration etc.
This paper introduces an efficient video frame interpolation framework that aims to strike a favorable balance between efficiency and quality.
arXiv Detail & Related papers (2024-04-17T06:47:17Z)
- RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks [93.18404922542702]
We present a novel video generative model designed to address long-term spatial and temporal dependencies.
Our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks.
Our model synthesizes high-fidelity video clips at a resolution of $256\times256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps.
arXiv Detail & Related papers (2024-01-11T16:48:44Z)
- Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking [69.89887818921825]
HiT is a new family of efficient tracking models that can run at high speed on different devices.
HiT achieves 64.6% AUC on the LaSOT benchmark, surpassing all previous efficient trackers.
arXiv Detail & Related papers (2023-08-14T02:51:34Z)
- Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition [117.98023585449808]
We propose a spatiotemporal attention-based autoencoder (STAE) architecture to evaluate the importance of frames and of pixels within each frame.
We develop a lightweight decoder that leverages a combined 3D-2D CNN to reconstruct missing information.
Experimental results show that ViT_STAE can compress the HMDB51 video dataset by 104x with only 5% accuracy loss.
arXiv Detail & Related papers (2023-05-22T07:47:27Z)
- Efficient Heterogeneous Video Segmentation at the Edge [2.4378845585726903]
We introduce an efficient video segmentation system for resource-limited edge devices leveraging heterogeneous compute.
Specifically, we design network models by searching across multiple dimensions of specifications for the neural architectures.
We analyze and optimize the heterogeneous data flows in our systems across the CPU, the GPU and the NPU.
arXiv Detail & Related papers (2022-08-24T17:01:09Z)
- EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction [67.11722682878722]
This work presents EfficientViT, a new family of high-resolution vision models with novel multi-scale linear attention.
Our multi-scale linear attention achieves a global receptive field and multi-scale learning.
EfficientViT delivers remarkable performance gains over previous state-of-the-art models.
arXiv Detail & Related papers (2022-05-29T20:07:23Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.