LLM Inference Serving: Survey of Recent Advances and Opportunities
- URL: http://arxiv.org/abs/2407.12391v1
- Date: Wed, 17 Jul 2024 08:11:47 GMT
- Title: LLM Inference Serving: Survey of Recent Advances and Opportunities
- Authors: Baolin Li, Yankai Jiang, Vijay Gadepally, Devesh Tiwari
- Abstract summary: This survey offers a comprehensive overview of recent advancements in Large Language Model (LLM) serving systems.
We specifically examine system-level enhancements that improve performance and efficiency without altering the core LLM decoding mechanisms.
This survey serves as a valuable resource for LLM practitioners seeking to stay abreast of the latest developments in this rapidly evolving field.
- Score: 8.567865555551911
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This survey offers a comprehensive overview of recent advancements in Large Language Model (LLM) serving systems, focusing on research since the year 2023. We specifically examine system-level enhancements that improve performance and efficiency without altering the core LLM decoding mechanisms. By selecting and reviewing high-quality papers from prestigious ML and system venues, we highlight key innovations and practical considerations for deploying and scaling LLMs in real-world production environments. This survey serves as a valuable resource for LLM practitioners seeking to stay abreast of the latest developments in this rapidly evolving field.
Related papers
- Generative Large Recommendation Models: Emerging Trends in LLMs for Recommendation [85.52251362906418]
This tutorial explores two primary approaches for integrating large language models (LLMs)
It provides a comprehensive overview of generative large recommendation models, including their recent advancements, challenges, and potential research directions.
Key topics include data quality, scaling laws, user behavior mining, and efficiency in training and inference.
arXiv Detail & Related papers (2025-02-19T14:48:25Z)
- From Selection to Generation: A Survey of LLM-based Active Learning [153.8110509961261]
Large Language Models (LLMs) have been employed for generating entirely new data instances and providing more cost-effective annotations.
This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques.
arXiv Detail & Related papers (2025-02-17T12:58:17Z)
- Large Language Model Enhanced Recommender Systems: Taxonomy, Trend, Application and Future [31.31030891846837]
This paper presents a survey of the latest research efforts aimed at leveraging Large Language Models (LLMs) to enhance recommender systems (RS).
We identify a critical shift in the field with the move towards incorporating LLM into the online system, notably by avoiding their use during inference.
arXiv Detail & Related papers (2024-12-18T02:07:21Z)
- A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law [65.87885628115946]
Large language models (LLMs) are revolutionizing the landscapes of finance, healthcare, and law.
We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies.
We critically examine the ethics for LLM applications in these fields, pointing out the existing ethical concerns and the need for transparent, fair, and robust AI systems.
arXiv Detail & Related papers (2024-05-02T22:43:02Z)
- LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also introducing a framework based on the roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
- Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward [29.81212051279456]
Recent advancements in model compression and system-level optimization methods aim to enhance LLM inference.
This survey offers an overview of these methods, emphasizing recent developments.
arXiv Detail & Related papers (2024-02-02T06:29:34Z)
- Large Language Models Meet Computer Vision: A Brief Survey [0.0]
Large Language Models (LLMs) and Computer Vision (CV) have emerged as a pivotal area of research, driving significant advancements in the field of Artificial Intelligence (AI).
This survey paper delves into the latest progressions in the domain of transformers, emphasizing their potential to revolutionize Vision Transformers (ViTs) and LLMs.
The survey is concluded by highlighting open directions in the field, suggesting potential venues for future research and development.
arXiv Detail & Related papers (2023-11-28T10:39:19Z)
- A Comprehensive Overview of Large Language Models [68.22178313875618]
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks.
This article provides an overview of the existing literature on a broad range of LLM-related concepts.
arXiv Detail & Related papers (2023-07-12T20:01:52Z)
- A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP).
This survey presents a taxonomy that categorizes these models into two major paradigms: Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec).
arXiv Detail & Related papers (2023-05-31T13:51:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.