LLM Inference Serving: Survey of Recent Advances and Opportunities
- URL: http://arxiv.org/abs/2407.12391v1
- Date: Wed, 17 Jul 2024 08:11:47 GMT
- Title: LLM Inference Serving: Survey of Recent Advances and Opportunities
- Authors: Baolin Li, Yankai Jiang, Vijay Gadepally, Devesh Tiwari,
- Abstract summary: This survey offers a comprehensive overview of recent advancements in Large Language Model (LLM) serving systems.
We specifically examine system-level enhancements that improve performance and efficiency without altering the core LLM decoding mechanisms.
This survey serves as a valuable resource for LLM practitioners seeking to stay abreast of the latest developments in this rapidly evolving field.
- Score: 8.567865555551911
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This survey offers a comprehensive overview of recent advancements in Large Language Model (LLM) serving systems, focusing on research since the year 2023. We specifically examine system-level enhancements that improve performance and efficiency without altering the core LLM decoding mechanisms. By selecting and reviewing high-quality papers from prestigious ML and system venues, we highlight key innovations and practical considerations for deploying and scaling LLMs in real-world production environments. This survey serves as a valuable resource for LLM practitioners seeking to stay abreast of the latest developments in this rapidly evolving field.
Related papers
- A Systematic Survey on Large Language Models for Algorithm Design [25.556342145274613]
Algorithm Design (AD) is crucial for effective problem-solving across various domains.
The advent of Large Language Models (LLMs) has notably enhanced the automation and innovation within this field.
arXiv Detail & Related papers (2024-10-11T13:17:19Z) - A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law [65.87885628115946]
Large language models (LLMs) are revolutionizing the landscapes of finance, healthcare, and law.
We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies.
We critically examine the ethics for LLM applications in these fields, pointing out the existing ethical concerns and the need for transparent, fair, and robust AI systems.
arXiv Detail & Related papers (2024-05-02T22:43:02Z) - LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z) - Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward [29.81212051279456]
Recent advancements in model compression and system-level optimization methods aim to enhance LLM inference.
This survey offers an overview of these methods, emphasizing recent developments.
arXiv Detail & Related papers (2024-02-02T06:29:34Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - Large Language Models Meet Computer Vision: A Brief Survey [0.0]
Large Language Models (LLMs) and Computer Vision (CV) have emerged as a pivotal area of research, driving significant advancements in the field of Artificial Intelligence (AI)
This survey paper delves into the latest progressions in the domain of transformers, emphasizing their potential to revolutionize Vision Transformers (ViTs) and LLMs.
The survey is concluded by highlighting open directions in the field, suggesting potential venues for future research and development.
arXiv Detail & Related papers (2023-11-28T10:39:19Z) - Online Advertisements with LLMs: Opportunities and Challenges [51.96140910798771]
This paper explores the potential for leveraging Large Language Models (LLM) in the realm of online advertising systems.
We introduce a general framework for LLM advertisement, consisting of modification, bidding, prediction, and auction modules.
arXiv Detail & Related papers (2023-11-11T02:13:32Z) - A Comprehensive Overview of Large Language Models [68.22178313875618]
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks.
This article provides an overview of the existing literature on a broad range of LLM-related concepts.
arXiv Detail & Related papers (2023-07-12T20:01:52Z) - PRISMA-DFLLM: An Extension of PRISMA for Systematic Literature Reviews
using Domain-specific Finetuned Large Language Models [0.0]
This paper proposes an AI-enabled methodological framework that combines the power of Large Language Models (LLMs) with the rigorous reporting guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)
By finetuning LLMs on domain-specific academic papers that have been selected as a result of a rigorous SLR process, the proposed PRISMA-DFLLM reporting guidelines offer the potential to achieve greater efficiency, reusability and scalability.
arXiv Detail & Related papers (2023-06-15T02:52:50Z) - A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP)
This survey presents a taxonomy that categorizes these models into two major paradigms, respectively Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec)
arXiv Detail & Related papers (2023-05-31T13:51:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.