Performance Modeling of Metric-Based Serverless Computing Platforms
- URL: http://arxiv.org/abs/2202.11247v1
- Date: Wed, 23 Feb 2022 00:39:01 GMT
- Authors: Nima Mahmoudi, Hamzeh Khazaei
- Abstract summary: The proposed performance model can help developers and providers predict the performance and cost of deployments with different configurations.
We validate the applicability and accuracy of the proposed performance model by extensive real-world experimentation on Knative.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Analytical performance models are very effective in ensuring the quality of
service and cost of service deployment remain desirable under different
conditions and workloads. While various analytical performance models have been
proposed for previous paradigms in cloud computing, serverless computing lacks
such models that can provide developers with performance guarantees. Moreover,
most serverless computing platforms still require developers to specify a
deployment configuration that affects both the performance and the cost of
their deployment, without giving them any direct and immediate feedback. In
previous studies, we built such performance models for
steady-state and transient analysis of scale-per-request serverless computing
platforms (e.g., AWS Lambda, Azure Functions, Google Cloud Functions) that
could give developers immediate feedback about the quality of service and cost
of their deployments. In this work, we aim to develop analytical performance
models for the latest trend in serverless computing platforms that use
concurrency value and the rate of requests per second for autoscaling
decisions. Examples of such serverless computing platforms are Knative and
Google Cloud Run (a managed Knative service by Google). The proposed
performance model can help developers and providers predict the performance and
cost of deployments with different configurations, which could help them tune
the configuration toward the best outcome. We validate the applicability and
accuracy of the proposed performance model by extensive real-world
experimentation on Knative and show that our performance model is able to
accurately predict the steady-state characteristics of a given workload with
a minimal amount of data collection.
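The concurrency-based scaling rule used by platforms such as Knative can be sketched with Little's law: the average number of in-flight requests equals the arrival rate times the average service time, and the autoscaler provisions enough instances to keep per-instance concurrency at or below a target. The function below is an illustrative back-of-the-envelope estimate under these assumptions, not the paper's analytical model; the names and parameters are hypothetical.

```python
import math

def steady_state_instances(arrival_rate, service_time, target_concurrency):
    """Rough steady-state instance count for a concurrency-based autoscaler.

    arrival_rate: average requests per second
    service_time: average time to serve one request, in seconds
    target_concurrency: per-instance concurrency target (e.g. Knative's
    containerConcurrency / target annotation)
    """
    # Little's law: average in-flight requests L = lambda * W
    total_concurrency = arrival_rate * service_time
    # Provision enough instances so per-instance concurrency <= target
    return max(1, math.ceil(total_concurrency / target_concurrency))

# Example: 100 req/s at 0.5 s per request is 50 in-flight requests;
# with a target concurrency of 10, that calls for 5 instances.
print(steady_state_instances(100, 0.5, 10))
```

This ignores cold starts, queueing delay, and transient behavior, which is precisely where an analytical performance model like the one proposed here goes further than a simple capacity estimate.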
Related papers
- Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 80 publicly available models.
We show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models.
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
arXiv Detail & Related papers (2024-05-17T17:49:44Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves the absolute performance of the Llama 2 model by up to 15 percentage points, for example.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Predicting Resource Consumption of Kubernetes Container Systems using Resource Models [3.138731415322007]
This paper considers how to derive resource models for cloud systems empirically.
We do so based on models of deployed services in a formal language with explicit adherence to CPU and memory resources.
We report on leveraging data collected empirically from small deployments to simulate the execution of higher intensity scenarios on larger deployments.
arXiv Detail & Related papers (2023-05-12T17:59:01Z) - Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs [66.30706841821123]
Large language models (LLMs) power many state-of-the-art systems in natural language processing.
LLMs are extremely computationally expensive, even at inference time.
We propose a new metric for comparing inference efficiency across models.
arXiv Detail & Related papers (2023-05-03T21:51:42Z) - A Control-Centric Benchmark for Video Prediction [69.22614362800692]
We propose a benchmark for action-conditioned video prediction in the form of a control benchmark.
Our benchmark includes simulated environments with 11 task categories and 310 task instance definitions.
We then leverage our benchmark to study the effects of scaling model size, quantity of training data, and model ensembling.
arXiv Detail & Related papers (2023-04-26T17:59:45Z) - Measuring the Driving Forces of Predictive Performance: Application to Credit Scoring [0.0]
In credit scoring, machine learning models are known to outperform standard parametric models.
We introduce the XPER methodology to decompose a performance metric into contributions associated with a model.
We show that a small number of features can explain a surprisingly large part of the model performance.
arXiv Detail & Related papers (2022-12-12T13:09:46Z) - Plex: Towards Reliability using Pretrained Large Model Extensions [69.13326436826227]
We develop ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively.
Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol.
We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples.
arXiv Detail & Related papers (2022-07-15T11:39:37Z) - Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures [9.178035808110124]
JellyBean is a framework for serving and optimizing machine learning inference on heterogeneous infrastructures.
We show that JellyBean reduces the total serving cost of visual question answering by up to 58%, and vehicle tracking from the NVIDIA AI City Challenge by up to 36%.
arXiv Detail & Related papers (2022-05-10T07:32:32Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Current approaches, however, do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z) - Serverless Model Serving for Data Science [23.05534539170047]
We study the viability of serverless as a mainstream model serving platform for data science applications.
We find that serverless outperforms many cloud-based alternatives with respect to cost and performance.
We present several practical recommendations for data scientists on how to use serverless for scalable and cost-effective model serving.
arXiv Detail & Related papers (2021-03-04T11:23:01Z) - Benchmarking and Performance Modelling of MapReduce Communication Pattern [0.0]
Models can be used to infer the performance of unseen applications and approximate their performance when an arbitrary dataset is used as input.
Our approach is validated by running empirical experiments in two setups.
arXiv Detail & Related papers (2020-05-23T21:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.