Serving and Optimizing Machine Learning Workflows on Heterogeneous
Infrastructures
- URL: http://arxiv.org/abs/2205.04713v1
- Date: Tue, 10 May 2022 07:32:32 GMT
- Title: Serving and Optimizing Machine Learning Workflows on Heterogeneous
Infrastructures
- Authors: Yongji Wu, Matthew Lentz, Danyang Zhuo, Yao Lu
- Abstract summary: JellyBean is a framework for serving and optimizing machine learning inference on heterogeneous infrastructures.
We show that JellyBean reduces the total serving cost of visual question answering by up to 58%, and vehicle tracking from the NVIDIA AI City Challenge by up to 36%.
- Score: 9.178035808110124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advent of ubiquitous deployment of smart devices and the Internet of
Things, data sources for machine learning inference have increasingly moved to
the edge of the network. Existing machine learning inference platforms
typically assume a homogeneous infrastructure and do not take into account the
more complex and tiered computing infrastructure that includes edge devices,
local hubs, edge datacenters, and cloud datacenters. On the other hand, recent
machine learning efforts have provided viable solutions for model compression,
pruning, and quantization for heterogeneous environments; for a given machine
learning model, one can now easily find or even generate a series of variants
with different tradeoffs between accuracy and efficiency.
We design and implement JellyBean, a framework for serving and optimizing
machine learning inference workflows on heterogeneous infrastructures. Given
service-level objectives (e.g., throughput, accuracy), JellyBean automatically
selects the most cost-efficient models that meet the accuracy target and decides
how to deploy them across different tiers of infrastructures. Evaluations show
that JellyBean reduces the total serving cost of visual question answering by
up to 58%, and vehicle tracking from the NVIDIA AI City Challenge by up to 36%
compared with state-of-the-art model selection and worker assignment solutions.
JellyBean also outperforms prior ML serving systems (e.g., Spark on the cloud)
by up to 5x in serving costs.
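To make the optimization concrete, below is a minimal, hypothetical sketch of the decision the abstract describes: given accuracy and throughput service-level objectives, pick the cheapest model variant and infrastructure tier that satisfy both. The variant names, numbers, and the exhaustive search are illustrative assumptions, not JellyBean's actual planner.

```python
# Hypothetical sketch of SLO-driven model/tier selection.
# All variants, tiers, and numbers are made up for illustration.
from dataclasses import dataclass

@dataclass
class ModelVariant:
    name: str
    accuracy: float          # e.g. VQA accuracy, in [0, 1]
    cost_per_query: dict     # tier name -> $ per 1k queries
    peak_qps: dict           # tier name -> sustainable throughput

VARIANTS = [
    ModelVariant("vqa-large", 0.72,
                 {"cloud": 0.90, "edge-dc": 1.40},
                 {"cloud": 800, "edge-dc": 300}),
    ModelVariant("vqa-small", 0.66,
                 {"cloud": 0.30, "edge-dc": 0.45, "edge": 0.60},
                 {"cloud": 2000, "edge-dc": 900, "edge": 150}),
]

def cheapest_plan(accuracy_slo: float, throughput_slo: float):
    """Return (variant, tier, cost) with minimal cost meeting both SLOs."""
    candidates = [
        (v.cost_per_query[tier], v, tier)
        for v in VARIANTS if v.accuracy >= accuracy_slo
        for tier in v.cost_per_query
        if v.peak_qps[tier] >= throughput_slo
    ]
    if not candidates:
        raise ValueError("no variant/tier combination satisfies the SLOs")
    cost, variant, tier = min(candidates, key=lambda c: c[0])
    return variant.name, tier, cost

print(cheapest_plan(accuracy_slo=0.65, throughput_slo=500))
# -> ('vqa-small', 'cloud', 0.3)
```

The real system, per the abstract, plans entire inference workflows and also decides worker assignment across tiers; this sketch captures only the single-model selection step.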
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the demands of real-time visual inference by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework that jointly optimizes neural network architectures and their edge deployment.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Efficient Data Distribution Estimation for Accelerated Federated Learning [5.085889377571319]
Federated Learning (FL) is a privacy-preserving machine learning paradigm where a global model is trained in-situ across a large number of distributed edge devices.
Devices are highly heterogeneous in both their system resources and training data.
Various client selection algorithms have been developed, showing promising improvements in model coverage and accuracy.
arXiv Detail & Related papers (2024-06-03T20:33:17Z)
- FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems [61.335229621081346]
Federated Learning (FL) has become a viable technique for realizing privacy-enhancing distributed deep learning on the network edge.
In this paper, we propose FLEdge, which complements existing FL benchmarks by enabling a systematic evaluation of client capabilities.
arXiv Detail & Related papers (2023-06-08T13:11:20Z)
- Online Data Selection for Federated Learning with Limited Storage [53.46789303416799]
Federated Learning (FL) has been proposed to achieve distributed machine learning among networked devices.
The impact of on-device storage on the performance of FL remains unexplored.
In this work, we take the first step toward online data selection for FL with limited on-device storage.
arXiv Detail & Related papers (2022-09-01T03:27:33Z)
- RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances [7.539635201319158]
RIBBON is a novel deep learning inference serving system.
It balances two competing objectives: meeting quality-of-service (QoS) targets and remaining cost-effective.
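A toy sketch of the trade-off this summary describes: serve an inference load from a mix of heterogeneous cloud instances so that a throughput requirement (a stand-in for the QoS target here) is met at minimal hourly cost. The instance types, prices, and brute-force search are illustrative assumptions, not RIBBON's actual policy.

```python
# Hypothetical instance pool; throughput and prices are made up.
from itertools import product

INSTANCES = {            # type -> (queries/sec per instance, $/hour)
    "gpu.large":  (450, 2.40),
    "gpu.small":  (140, 0.90),
    "cpu.xlarge": (60,  0.34),
}

def cheapest_mix(target_qps: float, max_per_type: int = 8):
    """Brute-force the cheapest instance mix meeting target_qps."""
    best = None
    types = list(INSTANCES)
    for counts in product(range(max_per_type + 1), repeat=len(types)):
        qps  = sum(n * INSTANCES[t][0] for n, t in zip(counts, types))
        cost = sum(n * INSTANCES[t][1] for n, t in zip(counts, types))
        if qps >= target_qps and (best is None or cost < best[0]):
            best = (cost, dict(zip(types, counts)))
    return best

print(cheapest_mix(1000))
# e.g. -> (5.48, {'gpu.large': 2, 'gpu.small': 0, 'cpu.xlarge': 2})
```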
arXiv Detail & Related papers (2022-07-23T06:45:14Z)
- Applied Federated Learning: Architectural Design for Robust and Efficient Learning in Privacy Aware Settings [0.8454446648908585]
The classical machine learning paradigm requires the aggregation of user data in a central location.
Centralizing data poses risks, including heightened exposure to internal and external security incidents.
Federated learning with differential privacy is designed to avoid the server-side centralization pitfall.
arXiv Detail & Related papers (2022-06-02T00:30:04Z)
- Multi-Edge Server-Assisted Dynamic Federated Learning with an Optimized Floating Aggregation Point [51.47520726446029]
Cooperative edge learning (CE-FL) is a distributed machine learning architecture.
We model the processes that take place during CE-FL and analyze its training.
We show the effectiveness of our framework with the data collected from a real-world testbed.
arXiv Detail & Related papers (2022-03-26T00:41:57Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing approaches, however, do not supply the procedures and pipelines needed to actually deploy machine learning capabilities in real, production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all such requirements while using basic cross-platform tensor frameworks and script-language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- ESAI: Efficient Split Artificial Intelligence via Early Exiting Using Neural Architecture Search [6.316693022958222]
Deep neural networks have been outperforming conventional machine learning algorithms in many computer vision tasks.
Most devices rely on cloud computing, in which large deep learning models analyze the data on the server.
In this paper, a new framework for deployment on IoT devices is proposed that can take advantage of both the cloud and on-device models.
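The sketch below illustrates the early-exit split-inference pattern this summary points to: run a compressed on-device model first and fall back to a larger cloud model only when the edge prediction is not confident. The stand-in models, the softmax-confidence criterion, and the 0.8 threshold are assumptions for illustration, not ESAI's searched architecture.

```python
# Hypothetical early-exit routing between an edge and a cloud model.
import math
import random

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def on_device_model(x):      # stand-in for a compressed edge model
    return [random.gauss(0, 1) for _ in range(10)]

def cloud_model(x):          # stand-in for the full server-side model
    return [random.gauss(0, 3) for _ in range(10)]

def classify(x, threshold=0.8):
    """Exit early on-device when confident; otherwise offload."""
    probs = softmax(on_device_model(x))
    if max(probs) >= threshold:          # confident: no upload needed
        return probs.index(max(probs)), "edge"
    probs = softmax(cloud_model(x))      # otherwise pay the offload cost
    return probs.index(max(probs)), "cloud"

print(classify(x=None))  # routing varies with the random stand-in logits
```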
arXiv Detail & Related papers (2021-06-21T04:47:53Z)
- Cost-effective Machine Learning Inference Offload for Edge Computing [0.3149883354098941]
This paper proposes a novel offloading mechanism that leverages installed-base, on-premises (edge) computational resources.
The proposed mechanism allows edge devices to offload heavy, compute-intensive workloads to edge nodes instead of the remote cloud.
arXiv Detail & Related papers (2020-12-07T21:11:02Z)
- A Privacy-Preserving Distributed Architecture for Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service.
It preserves sensitive user data while providing cloud-based machine learning and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z)