RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using
a Diverse Pool of Cloud Computing Instances
- URL: http://arxiv.org/abs/2207.11434v1
- Date: Sat, 23 Jul 2022 06:45:14 GMT
- Authors: Baolin Li, Rohan Basu Roy, Tirthak Patel, Vijay Gadepally, Karen
Gettings, Devesh Tiwari
- Abstract summary: RIBBON is a novel deep learning inference serving system.
It balances two competing objectives: meeting a quality-of-service (QoS) target and remaining cost-effective.
- Score: 7.539635201319158
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning model inference is a key service in many businesses and
scientific discovery processes. This paper introduces RIBBON, a novel deep
learning inference serving system that meets two competing objectives:
quality-of-service (QoS) target and cost-effectiveness. The key idea behind
RIBBON is to intelligently employ a diverse set of cloud computing instances
(heterogeneous instances) to meet the QoS target and maximize cost savings.
RIBBON devises a Bayesian Optimization-driven strategy that helps users build
the optimal set of heterogeneous instances for their model inference service
needs on cloud computing platforms, and it demonstrates its superiority over
existing inference serving systems that use homogeneous instance
pools. RIBBON saves up to 16% of the inference service cost for
different learning models including emerging deep learning recommender system
models and drug-discovery enabling models.
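The abstract describes a Bayesian-Optimization-driven search for the cheapest heterogeneous instance pool that still meets a QoS target. The sketch below illustrates only the underlying search problem: for brevity it uses exhaustive search over a tiny configuration space instead of Bayesian Optimization (the paper's actual method), and every instance type, cost, and throughput figure is a hypothetical placeholder, not data from the paper.

```python
import itertools

# Hypothetical instance types: (hourly cost in $, throughput in queries/sec).
# Real values would come from profiling the model on each cloud instance type.
INSTANCE_TYPES = {
    "cpu.small": (0.10, 40.0),
    "cpu.large": (0.38, 170.0),
    "gpu.infer": (0.90, 450.0),
}

def pool_cost(pool):
    """Total hourly cost of a pool given as {instance_type: count}."""
    return sum(INSTANCE_TYPES[t][0] * n for t, n in pool.items())

def pool_throughput(pool):
    """Aggregate throughput of a pool in queries/sec."""
    return sum(INSTANCE_TYPES[t][1] * n for t, n in pool.items())

def cheapest_pool(qos_qps, max_per_type=4):
    """Find the cheapest heterogeneous pool meeting a QoS throughput target.
    This brute-force loop stands in for RIBBON's Bayesian Optimization,
    which samples far fewer candidate pools to reach a near-optimal one."""
    best, best_cost = None, float("inf")
    counts = range(max_per_type + 1)
    for combo in itertools.product(counts, repeat=len(INSTANCE_TYPES)):
        pool = dict(zip(INSTANCE_TYPES, combo))
        if pool_throughput(pool) >= qos_qps:
            cost = pool_cost(pool)
            if cost < best_cost:
                best, best_cost = pool, cost
    return best, best_cost
```

For example, `cheapest_pool(500)` returns a mixed pool (GPU plus small CPU instances) that is cheaper than any homogeneous pool reaching 500 queries/sec, which is the heterogeneity advantage the paper quantifies.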
Related papers
- Machine Learning Insides OptVerse AI Solver: Design Principles and
Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Federated Learning While Providing Model as a Service: Joint Training
and Inference Optimization [30.305956110710266]
Federated learning is beneficial for enabling the training of models across distributed clients.
Existing work has overlooked the coexistence of model training and inference under clients' limited resources.
This paper focuses on the joint optimization of model training and inference to maximize inference performance at clients.
arXiv Detail & Related papers (2023-12-20T09:27:09Z)
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for distributed training of large models.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- KAIROS: Building Cost-Efficient Machine Learning Inference Systems with
Heterogeneous Cloud Resources [10.462798429064277]
KAIROS is a novel runtime framework that maximizes query throughput while meeting a QoS target and a cost budget.
Our evaluation using industry-grade deep learning (DL) models shows that KAIROS yields up to 2X the throughput of an optimal homogeneous solution.
arXiv Detail & Related papers (2022-10-12T03:06:51Z)
- Serving and Optimizing Machine Learning Workflows on Heterogeneous
Infrastructures [9.178035808110124]
JellyBean is a framework for serving and optimizing machine learning inference on heterogeneous infrastructures.
We show that JellyBean reduces the total serving cost of visual question answering by up to 58%, and vehicle tracking from the NVIDIA AI City Challenge by up to 36%.
arXiv Detail & Related papers (2022-05-10T07:32:32Z)
- Energy-Aware Edge Association for Cluster-based Personalized Federated
Learning [2.3262774900834606]
Federated learning over wireless networks enables data-conscious services by leveraging ubiquitous intelligence at the network edge for privacy-preserving model training.
We propose clustered federated learning to group user devices with similar preferences and provide each cluster with a personalized model.
We formulate an accuracy-cost trade-off optimization problem by jointly considering model accuracy, communication resource allocation and energy consumption.
arXiv Detail & Related papers (2022-02-06T07:58:41Z)
- Cost-Effective Federated Learning in Mobile Edge Networks [37.16466118235272]
Federated learning (FL) is a distributed learning paradigm that enables a large number of mobile devices to collaboratively learn a model without sharing their raw data.
We analyze how to design adaptive FL in mobile edge networks that optimally chooses essential control variables to minimize the total cost.
We develop a low-cost sampling-based algorithm to learn the convergence-related unknown parameters.
arXiv Detail & Related papers (2021-09-12T03:02:24Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn.
We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology in regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z)
- A Privacy-Preserving Distributed Architecture for
Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service.
It is able to preserve users' sensitive data while providing Cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.