Cocktail: Leveraging Ensemble Learning for Optimized Model Serving in
Public Cloud
- URL: http://arxiv.org/abs/2106.05345v1
- Date: Wed, 9 Jun 2021 19:23:58 GMT
- Title: Cocktail: Leveraging Ensemble Learning for Optimized Model Serving in
Public Cloud
- Authors: Jashwant Raj Gunasekaran, Cyan Subhra Mishra, Prashanth Thinakaran,
Mahmut Taylan Kandemir, Chita R. Das
- Abstract summary: We propose Cocktail, a cost-effective ensembling-based model serving framework.
A prototype implementation of Cocktail on the AWS EC2 platform and exhaustive evaluations using a variety of workloads demonstrate that Cocktail can reduce deployment cost by 1.45x.
- Score: 9.149566952446058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With a growing demand for adopting ML models for a variety of application services, it is vital that the frameworks serving these models are capable of delivering highly accurate predictions with minimal latency along with reduced deployment costs in a public cloud environment. Prior works in this domain, despite incurring high latency, are crucially limited by the accuracy offered by individual models. Intuitively, model ensembling can address the accuracy gap by intelligently combining different models in parallel. However, selecting the appropriate models dynamically at runtime to meet the desired accuracy with low latency at minimal deployment cost is a nontrivial problem. Towards this, we propose Cocktail, a cost-effective ensembling-based model serving framework. Cocktail comprises two key components: (i) a dynamic model selection framework, which reduces the number of models in the ensemble while satisfying the accuracy and latency requirements; (ii) an adaptive resource management (RM) framework that employs a distributed proactive autoscaling policy combined with importance sampling to efficiently allocate resources for the models. The RM framework leverages transient virtual machine (VM) instances to reduce the deployment cost in a public cloud. A prototype implementation of Cocktail on the AWS EC2 platform and exhaustive evaluations using a variety of workloads demonstrate that Cocktail can reduce deployment cost by 1.45x, while providing a 2x reduction in latency and satisfying the target accuracy for up to 96% of the requests, when compared to state-of-the-art model-serving frameworks.
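To make the dynamic model-selection idea concrete, here is a minimal sketch of how such a component might pick the smallest ensemble that meets an accuracy target under a latency SLO. The greedy heuristic, the independence assumption in the accuracy estimate, and all names are illustrative assumptions, not Cocktail's actual algorithm or API.

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    name: str
    accuracy: float    # validation accuracy in [0, 1]
    latency_ms: float  # measured inference latency

def select_ensemble(profiles, target_accuracy, latency_slo_ms):
    """Greedily grow the smallest ensemble whose estimated accuracy meets
    the target. Members run in parallel, so ensemble latency is the max
    over members; any model violating the SLO is excluded up front."""
    candidates = sorted(
        (p for p in profiles if p.latency_ms <= latency_slo_ms),
        key=lambda p: p.accuracy,
        reverse=True,
    )
    ensemble = []
    for p in candidates:
        ensemble.append(p)
        # Crude estimate assuming independent errors: the ensemble is wrong
        # only if every member is wrong. A real system would calibrate this.
        err = 1.0
        for m in ensemble:
            err *= 1.0 - m.accuracy
        if 1.0 - err >= target_accuracy:
            break
    return ensemble

zoo = [
    ModelProfile("resnet18", 0.70, 12.0),
    ModelProfile("resnet50", 0.76, 25.0),
    ModelProfile("inception_v4", 0.80, 48.0),
]
picked = select_ensemble(zoo, target_accuracy=0.95, latency_slo_ms=50.0)
print([m.name for m in picked])  # ['inception_v4', 'resnet50']
```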
Related papers
- Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution [1.8029479474051309]
We design a hybrid edge-cloud solution that leverages the efficiency of smaller models for local processing while deferring to larger, more accurate cloud-based models when necessary.
Specifically, we propose a novel unsupervised data generation method, Dual-Model Distillation (DMD), to train a lightweight switcher model that can predict when the edge model's output is uncertain.
Experimental results on the action classification task show that our framework not only incurs lower computational overhead, but also improves accuracy compared to using a large model alone.
arXiv Detail & Related papers (2024-10-16T02:06:27Z)
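The switcher-based deferral described in the entry above might reduce to a routing step like the following; `edge_model`, `cloud_model`, and `switcher` are hypothetical stand-ins, and DMD's actual training procedure and interfaces are not shown.

```python
def classify(frame, edge_model, cloud_model, switcher, threshold=0.5):
    """Answer locally when the switcher predicts the edge model is reliable;
    otherwise defer the input to the larger cloud model."""
    edge_scores = edge_model(frame)
    # The switcher estimates the probability that the edge output is wrong.
    if switcher(frame, edge_scores) < threshold:
        return max(range(len(edge_scores)), key=edge_scores.__getitem__)
    cloud_scores = cloud_model(frame)
    return max(range(len(cloud_scores)), key=cloud_scores.__getitem__)

# Toy usage with placeholder callables (three classes).
edge = lambda x: [0.2, 0.5, 0.3]
cloud = lambda x: [0.1, 0.1, 0.8]
uncertain = lambda x, scores: 1.0 - max(scores)  # high when edge is unsure
print(classify("frame.jpg", edge, cloud, uncertain))  # defers to cloud -> 2
```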
- Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation.
In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from being uploaded to the cloud.
arXiv Detail & Related papers (2024-02-27T08:47:19Z)
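One plausible reading of CEMA's two exclusion criteria is an entropy band: skip uploading samples the edge model is already confident about (little to learn) and samples whose predictions are so uncertain they are likely unreliable. The thresholds and this interpretation are assumptions suggested by "Selective Entropy Distillation", not the paper's exact rules.

```python
import math

def entropy(probs):
    """Shannon entropy of a predictive distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_upload(probs, low=0.2, high=1.0):
    """Upload only samples inside the entropy band: confident predictions
    (entropy <= low) carry little signal, and highly uncertain ones
    (entropy >= high) are likely unreliable for adaptation."""
    return low < entropy(probs) < high

print(should_upload([0.98, 0.01, 0.01]))  # False: edge already confident
print(should_upload([0.70, 0.20, 0.10]))  # True: informative, still reliable
print(should_upload([0.34, 0.33, 0.33]))  # False: too uncertain
```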
- ECLM: Efficient Edge-Cloud Collaborative Learning with Continuous Environment Adaptation [47.35179593006409]
We propose ECLM, an edge-cloud collaborative learning framework for rapid model adaptation in dynamic edge environments.
We show that ECLM significantly improves model performance (e.g., 18.89% accuracy increase) and resource efficiency (e.g., 7.12x communication cost reduction) when adapting models to dynamic edge environments.
arXiv Detail & Related papers (2023-11-18T14:10:09Z)
- On Optimal Caching and Model Multiplexing for Large Model Inference [66.50550915522551]
Large Language Models (LLMs) and other large foundation models have achieved noteworthy success, but their size exacerbates existing resource consumption and latency challenges.
We study two approaches for mitigating these challenges: employing a cache to store previous queries and learning a model multiplexer to choose from an ensemble of models for query processing.
arXiv Detail & Related papers (2023-06-03T05:01:51Z)
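Combining the two ideas in the entry above, a serving layer might put an LRU cache in front of a multiplexer that routes each query to a model. This toy sketch assumes exact-match caching and a user-supplied router; neither is claimed to match the paper's learned policies.

```python
from collections import OrderedDict

class CachedMultiplexer:
    """LRU response cache in front of a model multiplexer. `models` maps a
    name to a callable; `router` maps a query to one of those names."""

    def __init__(self, models, router, capacity=1024):
        self.models = models
        self.router = router
        self.capacity = capacity
        self._cache = OrderedDict()

    def query(self, q):
        if q in self._cache:
            self._cache.move_to_end(q)        # refresh LRU position on a hit
            return self._cache[q]
        answer = self.models[self.router(q)](q)
        self._cache[q] = answer
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)   # evict least-recently-used entry
        return answer

# Toy usage: route short queries to a cheap model, long ones to a big model.
mux = CachedMultiplexer(
    models={"small": lambda q: q.upper(), "large": lambda q: q[::-1]},
    router=lambda q: "small" if len(q) < 16 else "large",
)
print(mux.query("hello"))  # computed by the small model
print(mux.query("hello"))  # served from the cache
```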
- Scavenger: A Cloud Service for Optimizing Cost and Performance of ML Training [1.047192732651018]
We develop principled and practical techniques for optimizing the training time and cost of distributed ML model training on the cloud.
By combining conventional parallel-scaling concepts with new insights into SGD noise, our models accurately estimate the time and cost on different cluster configurations to within 5% error.
arXiv Detail & Related papers (2023-03-12T13:42:39Z)
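As a rough illustration of the kind of time/cost model the Scavenger entry describes, the sketch below combines a serial term, a perfectly parallel term, and a communication term per training step. The functional form and coefficients are assumptions (in practice they would be fit from profiling runs), and the paper's SGD-noise modeling is not captured here.

```python
def estimate_time_cost(n_workers, steps=10_000, t_serial=0.005,
                       t_parallel=0.8, t_comm=0.002, price_per_worker_hr=0.90):
    """Estimate wall-clock hours and dollar cost for a training run.
    Per-step time = serial part + parallel part / n + comm overhead * n."""
    step_time = t_serial + t_parallel / n_workers + t_comm * n_workers
    hours = steps * step_time / 3600.0
    return hours, hours * n_workers * price_per_worker_hr

# Sweep cluster sizes to expose the time/cost trade-off.
for n in (1, 2, 4, 8, 16):
    hours, cost = estimate_time_cost(n)
    print(f"{n:2d} workers: {hours:6.2f} h, ${cost:7.2f}")
```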
- Complement Sparsification: Low-Overhead Model Pruning for Federated Learning [2.0428960719376166]
Federated Learning (FL) is a privacy-preserving distributed deep learning paradigm that involves substantial communication and computation effort.
Existing model pruning/sparsification solutions cannot satisfy the requirements for low bidirectional communication overhead between the server and the clients.
We propose Complement Sparsification (CS), a pruning mechanism that satisfies all these requirements through complementary and collaborative pruning performed at the server and the clients.
arXiv Detail & Related papers (2023-03-10T23:07:02Z)
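A minimal way to picture "complementary pruning at the server and the clients" is via disjoint masks over the same weight vector, as sketched below; the magnitude heuristic and all names are illustrative, not the CS mechanism itself.

```python
def server_mask(weights, keep_ratio=0.4):
    """Server keeps a sparse backbone of the largest-magnitude weights."""
    k = max(1, int(len(weights) * keep_ratio))
    top = sorted(range(len(weights)),
                 key=lambda i: abs(weights[i]), reverse=True)[:k]
    mask = [0] * len(weights)
    for i in top:
        mask[i] = 1
    return mask

def client_mask(server_side):
    """Clients train only the complementary positions, so the two sparse
    updates never overlap and each side ships only its own delta."""
    return [1 - m for m in server_side]

w = [0.9, -0.05, 0.4, 0.01, -0.7]
s = server_mask(w)                 # [1, 0, 0, 0, 1] for keep_ratio=0.4
c = client_mask(s)                 # [0, 1, 1, 1, 0]
assert all(a + b == 1 for a, b in zip(s, c))  # masks are strictly disjoint
```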
- MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning [68.12870241637636]
We propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training.
Our empirical results indicate that MILO can train models $3\times$-$10\times$ faster and tune hyperparameters $20\times$-$75\times$ faster than full-dataset training or tuning, without loss of performance.
arXiv Detail & Related papers (2023-01-30T20:59:30Z)
- DualCF: Efficient Model Extraction Attack from Counterfactual Explanations [57.46134660974256]
Cloud service providers have launched Machine-Learning-as-a-Service platforms to allow users to access large-scale cloud-based models via APIs.
Such extra information (e.g., counterfactual explanations returned alongside predictions) inevitably makes the cloud models more vulnerable to extraction attacks.
We propose a novel simple yet efficient querying strategy to greatly enhance the querying efficiency to steal a classification model.
arXiv Detail & Related papers (2022-05-13T08:24:43Z)
- Data Summarization via Bilevel Optimization [48.89977988203108]
A simple yet powerful approach is to operate on small subsets of data.
In this work, we propose a generic coreset framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem.
arXiv Detail & Related papers (2021-09-26T09:08:38Z)
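For reference, a cardinality-constrained bilevel coreset formulation of the kind the last entry describes is typically written along these lines (notation assumed, not copied from the paper):

```latex
\min_{S \subseteq V,\ |S| \le k} \; \mathcal{L}_{\mathrm{outer}}\bigl(\theta^{*}(S)\bigr)
\quad \text{s.t.} \quad
\theta^{*}(S) \in \operatorname*{arg\,min}_{\theta} \; \mathcal{L}_{\mathrm{inner}}\bigl(\theta;\, S\bigr)
```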
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.