A Survey of Serverless Machine Learning Model Inference
- URL: http://arxiv.org/abs/2311.13587v1
- Date: Wed, 22 Nov 2023 18:46:05 GMT
- Title: A Survey of Serverless Machine Learning Model Inference
- Authors: Kamil Kojs
- Abstract summary: Generative AI, Computer Vision, and Natural Language Processing have led to an increased integration of AI models into various products.
This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent developments in Generative AI, Computer Vision, and Natural Language
Processing have led to an increased integration of AI models into various
products. This widespread adoption of AI requires significant efforts in
deploying these models in production environments. When hosting machine
learning models for real-time predictions, it is important to meet defined
Service Level Objectives (SLOs), ensuring reliability, minimal downtime, and
optimizing operational costs of the underlying infrastructure. Large machine
learning models often demand GPU resources for efficient inference to meet
SLOs. In the context of these trends, there is growing interest in hosting AI
models in a serverless architecture while still providing GPU access for
inference tasks. This survey aims to summarize and categorize the emerging
challenges and optimization opportunities for large-scale deep learning serving
systems. By providing a novel taxonomy and summarizing recent trends, we hope
that this survey could shed light on new optimization perspectives and motivate
novel works in large-scale deep learning serving systems.
Related papers
- A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models [16.250856588632637]
The rapid development of large language models (LLMs) has significantly transformed the field of artificial intelligence.
These models are increasingly integrated into diverse applications, impacting both research and industry.
This paper surveys hardware and software co-design approaches specifically tailored to address the unique characteristics and constraints of large language models.
arXiv Detail & Related papers (2024-10-08T21:46:52Z) - AI Foundation Models in Remote Sensing: A Survey [6.036426846159163]
This paper provides a comprehensive survey of foundation models in the remote sensing domain.
We categorize these models based on their applications in computer vision and domain-specific tasks.
We highlight emerging trends and the significant advancements achieved by these foundation models.
arXiv Detail & Related papers (2024-08-06T22:39:34Z) - LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z) - Machine Learning Insides OptVerse AI Solver: Design Principles and
Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problem.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z) - Towards Efficient Generative Large Language Model Serving: A Survey from
Algorithms to Systems [14.355768064425598]
generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data.
However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency.
This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective.
arXiv Detail & Related papers (2023-12-23T11:57:53Z) - On-device Training: A First Overview on Existing Systems [6.551096686706628]
Efforts have been made to deploy some models on resource-constrained devices as well.
This work targets to summarize and analyze state-of-the-art systems research that allows such on-device model training capabilities.
arXiv Detail & Related papers (2022-12-01T19:22:29Z) - Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z) - A Survey of Large-Scale Deep Learning Serving System Optimization:
Challenges and Opportunities [24.38071862662089]
Survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems.
Deep Learning (DL) models have achieved superior performance in many application domains, including vision, language, medical, commercial ads, entertainment, etc.
arXiv Detail & Related papers (2021-11-28T22:14:10Z) - INTERN: A New Learning Paradigm Towards General Vision [117.3343347061931]
We develop a new learning paradigm named INTERN.
By learning with supervisory signals from multiple sources in multiple stages, the model being trained will develop strong generalizability.
In most cases, our models, adapted with only 10% of the training data in the target domain, outperform the counterparts trained with the full set of data.
arXiv Detail & Related papers (2021-11-16T18:42:50Z) - A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.