A Survey of Serverless Machine Learning Model Inference
- URL: http://arxiv.org/abs/2311.13587v1
- Date: Wed, 22 Nov 2023 18:46:05 GMT
- Title: A Survey of Serverless Machine Learning Model Inference
- Authors: Kamil Kojs
- Abstract summary: Generative AI, Computer Vision, and Natural Language Processing have led to an increased integration of AI models into various products.
This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent developments in Generative AI, Computer Vision, and Natural Language
Processing have led to an increased integration of AI models into various
products. This widespread adoption of AI requires significant efforts in
deploying these models in production environments. When hosting machine
learning models for real-time predictions, it is important to meet defined
Service Level Objectives (SLOs), ensuring reliability, minimal downtime, and
optimizing operational costs of the underlying infrastructure. Large machine
learning models often demand GPU resources for efficient inference to meet
SLOs. In the context of these trends, there is growing interest in hosting AI
models in a serverless architecture while still providing GPU access for
inference tasks. This survey aims to summarize and categorize the emerging
challenges and optimization opportunities for large-scale deep learning serving
systems. By providing a novel taxonomy and summarizing recent trends, we hope
that this survey could shed light on new optimization perspectives and motivate
novel works in large-scale deep learning serving systems.
Related papers
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances using generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems [14.355768064425598]
Generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data.
However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency.
This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective.
arXiv Detail & Related papers (2023-12-23T11:57:53Z)
- Reinforcement Learning for Generative AI: A Survey [40.21640713844257]
This survey aims to shed light on a high-level review that spans a range of application areas.
We provide a rigorous taxonomy of this area and cover a wide range of models and applications.
We conclude this survey by showing the potential directions that might tackle the limit of current models and expand the frontiers for generative AI.
arXiv Detail & Related papers (2023-08-28T06:15:14Z)
- Entity Aware Modelling: A Survey [22.32009539611539]
Recent machine learning advances have led to new state-of-the-art response prediction models.
Models built at a population level often lead to sub-optimal performance in many personalized prediction settings.
In personalized prediction, the goal is to incorporate inherent characteristics of different entities to improve prediction performance.
arXiv Detail & Related papers (2023-02-16T16:33:33Z)
- On-device Training: A First Overview on Existing Systems [8.0653715405809]
Efforts have been made to deploy some models on resource-constrained devices as well.
This work aims to summarize and analyze state-of-the-art systems research that enables such on-device model training capabilities.
arXiv Detail & Related papers (2022-12-01T19:22:29Z)
- Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
The REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper, we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach, however, does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production-grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities [24.38071862662089]
This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems.
Deep Learning (DL) models have achieved superior performance in many application domains, including vision, language, medical, commercial ads, entertainment, etc.
arXiv Detail & Related papers (2021-11-28T22:14:10Z)
- INTERN: A New Learning Paradigm Towards General Vision [117.3343347061931]
We develop a new learning paradigm named INTERN.
By learning with supervisory signals from multiple sources in multiple stages, the model being trained will develop strong generalizability.
In most cases, our models, adapted with only 10% of the training data in the target domain, outperform the counterparts trained with the full set of data.
arXiv Detail & Related papers (2021-11-16T18:42:50Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.