Real-Time Video Inference on Edge Devices via Adaptive Model Streaming
- URL: http://arxiv.org/abs/2006.06628v2
- Date: Mon, 5 Apr 2021 23:29:53 GMT
- Title: Real-Time Video Inference on Edge Devices via Adaptive Model Streaming
- Authors: Mehrdad Khani, Pouya Hamadanian, Arash Nasr-Esfahany, Mohammad Alizadeh
- Abstract summary: Real-time video inference on edge devices like mobile phones and drones is challenging due to the high computation cost of Deep Neural Networks.
We present Adaptive Model Streaming (AMS), a new approach to improving performance of efficient lightweight models for video inference on edge devices.
- Score: 9.101956442584251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time video inference on edge devices like mobile phones and drones is
challenging due to the high computation cost of Deep Neural Networks. We
present Adaptive Model Streaming (AMS), a new approach to improving performance
of efficient lightweight models for video inference on edge devices. AMS uses a
remote server to continually train and adapt a small model running on the edge
device, boosting its performance on the live video using online knowledge
distillation from a large, state-of-the-art model. We discuss the challenges of
over-the-network model adaptation for video inference, and present several
techniques to reduce communication cost of this approach: avoiding excessive
overfitting, updating a small fraction of important model parameters, and
adaptive sampling of training frames at edge devices. On the task of video
semantic segmentation, our experimental results show 0.4 to 17.8 percent mean
Intersection-over-Union improvement compared to a pre-trained model across
several video datasets. Our prototype can perform video segmentation at 30
frames-per-second with 40 milliseconds camera-to-label latency on a Samsung
Galaxy S10+ mobile phone, using less than 300 Kbps uplink and downlink
bandwidth on the device.
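
One of the bandwidth-saving techniques named in the abstract is updating only a small fraction of important model parameters. The following is a minimal sketch of that idea, not the authors' implementation: the abstract does not say how "important" parameters are chosen, so this example assumes they are the coordinates whose values changed most between the edge copy and the server copy. The function names `select_sparse_update` and `apply_sparse_update`, the flat weight lists, and the `fraction` parameter are all hypothetical.

```python
from typing import Dict, List


def select_sparse_update(old: List[float], new: List[float],
                         fraction: float) -> Dict[int, float]:
    """Server side: keep only the `fraction` of weights that changed most.

    ASSUMPTION: importance is measured by absolute change; the abstract
    does not specify the selection rule.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if len(old) != len(new):
        raise ValueError("weight vectors must have the same length")
    k = max(1, int(len(old) * fraction))
    # Rank coordinates by how far the retrained value moved from the old one.
    ranked = sorted(range(len(old)),
                    key=lambda i: abs(new[i] - old[i]),
                    reverse=True)
    # Only (index, value) pairs for the top-k coordinates are transmitted.
    return {i: new[i] for i in ranked[:k]}


def apply_sparse_update(weights: List[float],
                        update: Dict[int, float]) -> List[float]:
    """Edge side: patch the received coordinates into a copy of the weights."""
    patched = list(weights)
    for index, value in update.items():
        patched[index] = value
    return patched


if __name__ == "__main__":
    edge = [0.10, -0.20, 0.30, 0.40, -0.50, 0.60, 0.70, -0.80, 0.90, 1.00]
    server = [0.10, -0.25, 0.30, 0.90, -0.50, 0.60, 0.10, -0.80, 0.90, 1.00]
    update = select_sparse_update(edge, server, fraction=0.2)
    print(sorted(update))
    print(apply_sparse_update(edge, update))
```

With `fraction=0.2`, only two of the ten coordinates are sent, so the small change at index 1 is left for a later update. The trade-off in this sketch is downlink size against how closely the edge model tracks the server model.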
Related papers
- EdgeSync: Faster Edge-model Updating via Adaptive Continuous Learning for Video Data Drift [7.165359653719119]
Real-time video analytics systems typically place models with fewer weights on edge devices to reduce latency.
The distribution of video content features may change over time, leading to accuracy degradation of existing models.
Recent work proposes a framework that uses a remote server to continually train and adapt the lightweight model at the edge with the help of a complex model.
arXiv Detail & Related papers (2024-06-05T07:06:26Z)
- Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting growing attention as a way to collaboratively train a machine learning model without transferring raw data.
FL generally relies on a parameter server and a large number of edge devices throughout model training.
We propose TEASQ-Fed to exploit edge devices to asynchronously participate in the training process by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z)
- Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting [27.302681897961588]
Deep convolutional neural networks (DNNs) are widely used in various fields of computer vision.
We propose a novel method for high-quality and efficient video resolution upscaling tasks.
We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality.
arXiv Detail & Related papers (2023-03-15T02:40:02Z)
- Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling [125.95527079960725]
Transformer-based models have achieved top performance on major video recognition benchmarks.
Video Mobile-Former is the first Transformer-based video model which constrains the computational budget within 1G FLOPs.
arXiv Detail & Related papers (2022-08-25T17:59:00Z)
- Long-Short Temporal Contrastive Learning of Video Transformers [62.71874976426988]
Self-supervised pretraining of video transformers on video-only datasets can lead to action recognition results on par with or better than those obtained with supervised pretraining on large-scale image datasets.
Our approach, named Long-Short Temporal Contrastive Learning, enables video transformers to learn an effective clip-level representation by predicting temporal context captured from a longer temporal extent.
arXiv Detail & Related papers (2021-06-17T02:30:26Z)
- MoViNets: Mobile Video Networks for Efficient Video Recognition [52.49314494202433]
3D convolutional neural networks (CNNs) are accurate at video recognition but require large computation and memory budgets.
We propose a three-step approach to improve computational efficiency while substantially reducing the peak memory usage of 3D CNNs.
arXiv Detail & Related papers (2021-03-21T23:06:38Z)
- ApproxDet: Content and Contention-Aware Approximate Object Detection for Mobiles [19.41234144545467]
We introduce ApproxDet, an adaptive video object detection framework for mobile devices to meet accuracy-latency requirements.
We evaluate ApproxDet on a large benchmark video dataset and compare quantitatively to AdaScale and YOLOv3.
We find that ApproxDet is able to adapt to a wide variety of contention and content characteristics and outshines all baselines.
arXiv Detail & Related papers (2020-10-21T04:11:05Z)
- RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs.
For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z)
- Making DensePose fast and light [78.49552144907513]
Existing neural network models capable of solving this task are heavily parameterized.
To enable Dense Pose inference on the end device with current models, one needs to support an expensive server-side infrastructure and have a stable internet connection.
In this work, we target the problem of redesigning the DensePose R-CNN model's architecture so that the final network retains most of its accuracy but becomes lighter and faster.
arXiv Detail & Related papers (2020-06-26T19:42:20Z)
- An On-Device Federated Learning Approach for Cooperative Model Update between Edge Devices [2.99321624683618]
A neural-network-based on-device learning approach was recently proposed, so that edge devices train on incoming data at runtime to update their model.
In this paper, we focus on OS-ELM to sequentially train a model based on recent samples and combine it with an autoencoder for anomaly detection.
We extend it to on-device federated learning so that edge devices can exchange their trained results and update their model by using those collected from the other edge devices.
arXiv Detail & Related papers (2020-02-27T18:15:38Z)