Related papers: A Survey of Serverless Machine Learning Model Inference

A Survey of Serverless Machine Learning Model Inference

URL: http://arxiv.org/abs/2311.13587v1
Date: Wed, 22 Nov 2023 18:46:05 GMT
Title: A Survey of Serverless Machine Learning Model Inference
Authors: Kamil Kojs
Abstract summary: Generative AI, Computer Vision, and Natural Language Processing have led to an increased integration of AI models into various products. This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent developments in Generative AI, Computer Vision, and Natural Language Processing have led to an increased integration of AI models into various products. This widespread adoption of AI requires significant efforts in deploying these models in production environments. When hosting machine learning models for real-time predictions, it is important to meet defined Service Level Objectives (SLOs), ensuring reliability, minimal downtime, and optimizing operational costs of the underlying infrastructure. Large machine learning models often demand GPU resources for efficient inference to meet SLOs. In the context of these trends, there is growing interest in hosting AI models in a serverless architecture while still providing GPU access for inference tasks. This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems. By providing a novel taxonomy and summarizing recent trends, we hope that this survey could shed light on new optimization perspectives and motivate novel works in large-scale deep learning serving systems.

Related papers

Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [59.52058740470727]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications.<n>Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems.<n>This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models generate target documents directly from a query.<n>We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance.
arXiv Detail & Related papers (2025-03-24T17:59:03Z)
Exploration of VLMs for Driver Monitoring Systems Applications [3.59361692183907]
In recent years, we have witnessed significant progress in emerging deep learning models, particularly Large Language Models (LLMs) and Vision-Language Models (VLMs) This paper presents our initial approach to implementing VLMs in Driver Monitoring Systems (DMS)
arXiv Detail & Related papers (2025-03-15T22:37:36Z)
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models [16.250856588632637]
The rapid development of large language models (LLMs) has significantly transformed the field of artificial intelligence. These models are increasingly integrated into diverse applications, impacting both research and industry. This paper surveys hardware and software co-design approaches specifically tailored to address the unique characteristics and constraints of large language models.
arXiv Detail & Related papers (2024-10-08T21:46:52Z)
AI Foundation Models in Remote Sensing: A Survey [6.036426846159163]
This paper provides a comprehensive survey of foundation models in the remote sensing domain. We categorize these models based on their applications in computer vision and domain-specific tasks. We highlight emerging trends and the significant advancements achieved by these foundation models.
arXiv Detail & Related papers (2024-08-06T22:39:34Z)
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals [58.83169560132308]
We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks.
arXiv Detail & Related papers (2024-07-18T17:59:01Z)
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models. Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer. We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver. We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problem. We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems [14.355768064425598]
generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency. This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective.
arXiv Detail & Related papers (2023-12-23T11:57:53Z)
On-device Training: A First Overview on Existing Systems [6.551096686706628]
Efforts have been made to deploy some models on resource-constrained devices as well. This work targets to summarize and analyze state-of-the-art systems research that allows such on-device model training capabilities.
arXiv Detail & Related papers (2022-12-01T19:22:29Z)
Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases. REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization. REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z)
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines. This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities [24.38071862662089]
Survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems. Deep Learning (DL) models have achieved superior performance in many application domains, including vision, language, medical, commercial ads, entertainment, etc.
arXiv Detail & Related papers (2021-11-28T22:14:10Z)
INTERN: A New Learning Paradigm Towards General Vision [117.3343347061931]
We develop a new learning paradigm named INTERN. By learning with supervisory signals from multiple sources in multiple stages, the model being trained will develop strong generalizability. In most cases, our models, adapted with only 10% of the training data in the target domain, outperform the counterparts trained with the full set of data.
arXiv Detail & Related papers (2021-11-16T18:42:50Z)
A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions. Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data. Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.