Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers
- URL: http://arxiv.org/abs/2012.10557v1
- Date: Sat, 19 Dec 2020 00:29:22 GMT
- Title: Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers
- Authors: Romil Bhardwaj, Zhengxu Xia, Ganesh Ananthanarayanan, Junchen Jiang,
Nikolaos Karianakis, Yuanchao Shu, Kevin Hsieh, Victor Bahl, Ion Stoica
- Abstract summary: Continuous learning handles data drift by periodically retraining the models on new data.
Our work addresses the challenge of jointly supporting inference and retraining tasks on edge servers.
Ekya balances this tradeoff across multiple models and uses a micro-profiler to identify the models that will benefit the most from retraining.
- Score: 20.533131101027067
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video analytics applications use edge compute servers to analyze
videos (for bandwidth and privacy reasons). Compressed models deployed on
the edge servers for inference suffer from data drift, where the live video
data diverges from the training data. Continuous learning handles data drift by
periodically retraining the models on new data. Our work addresses the
challenge of jointly supporting inference and retraining tasks on edge servers,
which requires navigating the fundamental tradeoff between the retrained
model's accuracy and the inference accuracy. Our solution Ekya balances this
tradeoff across multiple models and uses a micro-profiler to identify the
models that will benefit the most from retraining. Compared to a baseline
scheduler, Ekya achieves 29% higher accuracy, and the baseline requires 4x more
GPU resources to achieve the same accuracy as Ekya.
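To make the scheduling tradeoff concrete, below is a minimal, illustrative sketch of the idea described in the abstract: split a fixed GPU budget between inference and retraining jobs across models, using profiled accuracy estimates to choose the split. This is not Ekya's actual scheduler or API; the profile curves, the half-window weighting, and the brute-force search over coarse GPU shares are simplifying assumptions, and the micro-profiler's role is captured only as the hypothetical PROFILES table of estimated accuracy curves.
```python
# A minimal sketch (not Ekya's actual scheduler) of the core idea: divide a fixed
# GPU budget between inference and retraining jobs across models so that average
# accuracy over the retraining window is maximized. All names are hypothetical.
from itertools import product

# Hypothetical micro-profiler output: for each model, estimated inference
# accuracy at a given inference GPU share, and estimated accuracy gain after
# retraining at a given retraining GPU share (both in [0, 1]).
PROFILES = {
    "cam_A": {"infer": lambda g: min(0.90, 0.60 + 0.4 * g),
              "retrain_gain": lambda g: 0.15 * min(1.0, 2 * g)},
    "cam_B": {"infer": lambda g: min(0.85, 0.70 + 0.2 * g),
              "retrain_gain": lambda g: 0.05 * min(1.0, 2 * g)},
}

def window_accuracy(profile, infer_gpu, retrain_gpu):
    """Average accuracy over a retraining window: inference runs throughout,
    and the retrained model's gain only applies after retraining finishes.
    The 0.5 weighting assumes retraining occupies half of the window."""
    base = profile["infer"](infer_gpu)
    gain = profile["retrain_gain"](retrain_gpu)
    return 0.5 * base + 0.5 * min(1.0, base + gain)

def brute_force_allocate(total_gpu=2.0, step=0.25):
    """Exhaustively try coarse GPU splits across models and jobs and keep the
    best one. Ekya's real scheduler is smarter; this is only illustrative."""
    shares = [i * step for i in range(int(total_gpu / step) + 1)]
    models = list(PROFILES)
    best, best_score = None, -1.0
    for alloc in product(shares, repeat=2 * len(models)):
        if sum(alloc) > total_gpu + 1e-9:   # respect the total GPU budget
            continue
        score = sum(
            window_accuracy(PROFILES[m], alloc[2 * i], alloc[2 * i + 1])
            for i, m in enumerate(models))
        if score > best_score:
            best, best_score = alloc, score
    keys = [f"{m}/{job}" for m in models for job in ("infer", "retrain")]
    return dict(zip(keys, best)), best_score

if __name__ == "__main__":
    alloc, score = brute_force_allocate()
    print(alloc, round(score / len(PROFILES), 3))
```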
Related papers
- Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Large Language Model (LLM) pretraining traditionally relies on autoregressive language modeling on randomly sampled data blocks from web-scale datasets.
We take inspiration from human learning techniques like spaced repetition to hypothesize that random data sampling for LLMs leads to high training cost and low-quality models that tend to forget data.
In order to effectively commit web-scale information to long-term memory, we propose the LFR (Learn, Focus, and Review) pedagogy.
arXiv Detail & Related papers (2024-09-10T00:59:18Z)
- EdgeSync: Faster Edge-model Updating via Adaptive Continuous Learning for Video Data Drift [7.165359653719119]
Real-time video analytics systems typically place models with fewer weights on edge devices to reduce latency.
The distribution of video content features may change over time, leading to accuracy degradation of existing models.
Recent work proposes a framework that uses a remote server to continually train and adapt the lightweight model at the edge with the help of a complex model.
arXiv Detail & Related papers (2024-06-05T07:06:26Z)
- Online Resource Allocation for Edge Intelligence with Colocated Model Retraining and Inference [5.6679198251041765]
We introduce an online approximation algorithm, named ORRIC, designed to optimize resource allocation for adaptively balancing the accuracy of model training and inference.
The competitive ratio of ORRIC outperforms that of the traditional Inference-Only paradigm, especially when data persists for a sufficiently long time.
arXiv Detail & Related papers (2024-05-25T03:05:19Z)
- QCore: Data-Efficient, On-Device Continual Calibration for Quantized Models -- Extended Version [34.280197473547226]
Machine learning models can be deployed on edge devices with limited storage and computational capabilities.
We propose QCore to enable continual calibration on the edge.
arXiv Detail & Related papers (2024-04-22T08:57:46Z)
- Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting more and more attention to collaboratively train a machine learning model without transferring raw data.
FL generally exploits a parameter server and a large number of edge devices during the whole process of the model training.
We propose TEASQ-Fed to exploit edge devices to asynchronously participate in the training process by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the non-identically distributed data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Training Deep Surrogate Models with Large Scale Online Learning [48.7576911714538]
Deep learning algorithms have emerged as a viable alternative for obtaining fast solutions for PDEs.
Models are usually trained on synthetic data generated by solvers, stored on disk and read back for training.
The paper proposes an open-source online training framework for deep surrogate models.
arXiv Detail & Related papers (2023-06-28T12:02:27Z)
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low-churn training against a number of recent baselines (a minimal sketch of this distillation setup appears after this list).
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
- Lambda Learner: Fast Incremental Learning on Data Streams [5.543723668681475]
We propose a new framework for training models by incremental updates in response to mini-batches from data streams.
We show that the resulting model of our framework closely estimates a periodically updated model trained on offline data and outperforms it when model updates are time-sensitive.
We present a large-scale deployment on the sponsored content platform for a large social network.
arXiv Detail & Related papers (2020-10-11T04:00:34Z)
- Characterizing and Modeling Distributed Training with Transient Cloud GPU Servers [6.56704851092678]
We analyze distributed training performance under diverse cluster configurations using CM-DARE.
Our empirical datasets include measurements from three GPU types, six geographic regions, twenty convolutional neural networks, and thousands of Google Cloud servers.
We also demonstrate the feasibility of predicting training speed and overhead using regression-based models.
arXiv Detail & Related papers (2020-04-07T01:49:58Z)
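The "Churn Reduction via Distillation" entry above describes training a new model with the previously deployed (base) model as the distillation teacher, which implicitly constrains predictive churn. Below is a minimal sketch of that setup under common assumptions; the toy linear models, the temperature, and the lambda weight are illustrative choices, not the paper's.
```python
# A minimal sketch of churn-reducing distillation: the previously deployed
# (base) model serves as the teacher, and a KL term pulls the new model's
# predictions toward it, which limits prediction churn between releases.
import torch
import torch.nn.functional as F

def churn_distill_loss(new_logits, base_logits, labels, lam=0.5, temp=2.0):
    """Task loss plus distillation toward the base (previous) model's outputs."""
    task = F.cross_entropy(new_logits, labels)
    distill = F.kl_div(
        F.log_softmax(new_logits / temp, dim=-1),
        F.softmax(base_logits / temp, dim=-1),
        reduction="batchmean") * (temp ** 2)
    return task + lam * distill

if __name__ == "__main__":
    torch.manual_seed(0)
    base_model = torch.nn.Linear(16, 4)   # stand-in for the deployed model
    new_model = torch.nn.Linear(16, 4)    # model being retrained
    x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
    with torch.no_grad():                  # teacher predictions are fixed
        base_logits = base_model(x)
    loss = churn_distill_loss(new_model(x), base_logits, y)
    loss.backward()
    print(float(loss))
```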
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.