A Survey on Model Compression for Large Language Models
- URL: http://arxiv.org/abs/2308.07633v3
- Date: Sun, 17 Sep 2023 16:38:18 GMT
- Title: A Survey on Model Compression for Large Language Models
- Authors: Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang
- Abstract summary: Large Language Models (LLMs) have revolutionized natural language processing tasks with remarkable success.
Their formidable size and computational demands present significant challenges for practical deployment.
The field of model compression has emerged as a pivotal research area to alleviate these limitations.
- Score: 23.354025348567077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have revolutionized natural language processing
tasks with remarkable success. However, their formidable size and computational
demands present significant challenges for practical deployment, especially in
resource-constrained environments. As these challenges become increasingly
pertinent, the field of model compression has emerged as a pivotal research
area to alleviate these limitations. This paper presents a comprehensive survey
that navigates the landscape of model compression techniques tailored
specifically for LLMs. Addressing the imperative need for efficient deployment,
we delve into various methodologies, encompassing quantization, pruning,
knowledge distillation, and more. Within each of these techniques, we highlight
recent advancements and innovative approaches that contribute to the evolving
landscape of LLM research. Furthermore, we explore benchmarking strategies and
evaluation metrics that are essential for assessing the effectiveness of
compressed LLMs. By providing insights into the latest developments and
practical implications, this survey serves as an invaluable resource for both
researchers and practitioners. As LLMs continue to evolve, this survey aims to
facilitate enhanced efficiency and real-world applicability, establishing a
foundation for future advancements in the field.
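The abstract names quantization, pruning, and knowledge distillation without showing what any of them looks like in practice. As one concrete illustration, below is a minimal sketch of symmetric per-channel int8 post-training weight quantization in PyTorch; it is a toy under stated assumptions (per-row scales, a hypothetical quantize_weight helper), not a method from the survey, and real LLM quantizers such as GPTQ or AWQ are considerably more involved.

```python
# Minimal sketch of symmetric per-channel int8 weight quantization.
# Illustrative only; real LLM quantizers (GPTQ, AWQ, ...) are more involved.
import torch

def quantize_weight(w: torch.Tensor, n_bits: int = 8):
    """Quantize a 2-D weight matrix with one scale per output row."""
    qmax = 2 ** (n_bits - 1) - 1                      # 127 for int8
    scale = w.abs().amax(dim=1, keepdim=True) / qmax  # per-row scale
    scale = scale.clamp(min=1e-8)                     # avoid division by zero
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize_weight(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096, 4096)      # stand-in for one linear layer's weights
q, scale = quantize_weight(w)
w_hat = dequantize_weight(q, scale)
print("mean abs reconstruction error:", (w - w_hat).abs().mean().item())
```

Storing q (one byte per weight) plus a small vector of scales in place of fp16 weights halves memory at the cost of the reconstruction error printed above; pruning and distillation trade accuracy for size in the same spirit (a distillation sketch follows the first related paper below).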
Related papers
- Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application [21.555902498178387]
Large Language Models (LLMs) have showcased exceptional capabilities in various domains, attracting significant interest from both academia and industry.
The endeavor to compress language models while maintaining their accuracy has become a focal point of research.
Knowledge distillation has emerged as an effective technique to enhance inference speed without greatly compromising performance.
arXiv Detail & Related papers (2024-07-02T02:14:42Z)
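The distillation objective behind the survey above is compact enough to sketch. Below is a minimal example of the classic temperature-scaled distillation loss of Hinton et al.; the temperature, mixing weight, and vocabulary size are illustrative assumptions, not settings from any paper listed here.

```python
# Minimal sketch of the classic knowledge-distillation loss:
# KL divergence between temperature-softened teacher and student
# distributions, blended with cross-entropy on the hard labels.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                          # T^2 rescaling, as in Hinton et al.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 32000)   # batch of 8, 32k vocab (assumed)
teacher_logits = torch.randn(8, 32000)
labels = torch.randint(0, 32000, (8,))
print(kd_loss(student_logits, teacher_logits, labels))
```

The student can be far smaller than the teacher, which is why distillation improves inference speed without training a compact model from scratch.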
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
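The roofline model the entry above builds on fits in one line: attainable throughput is the minimum of peak compute and memory bandwidth times arithmetic intensity (FLOPs per byte moved). A minimal sketch follows, assuming rough A100-class hardware figures that are illustrative, not taken from the paper.

```python
# Roofline model: attainable FLOP/s = min(peak_flops, bandwidth * intensity),
# where arithmetic intensity = FLOPs performed per byte moved from memory.

def attainable_flops(peak_flops: float, bandwidth: float, intensity: float) -> float:
    return min(peak_flops, bandwidth * intensity)

PEAK = 312e12   # ~312 TFLOP/s fp16 (rough A100-class figure, assumed)
BW = 2.0e12     # ~2 TB/s HBM bandwidth (assumed)

# At batch size 1, LLM decoding is matrix-vector dominated: roughly 2 FLOPs
# per fp16 weight (2 bytes), i.e., ~1 FLOP/byte. Prefill with large batches
# reuses each weight many times, pushing intensity far higher.
for name, intensity in [("decode (batch=1)", 1.0), ("prefill (large batch)", 300.0)]:
    f = attainable_flops(PEAK, BW, intensity)
    bound = "memory-bound" if f < PEAK else "compute-bound"
    print(f"{name}: {f / 1e12:.1f} TFLOP/s attainable ({bound})")
```

This is also why weight compression helps decoding latency: shrinking the bytes moved raises arithmetic intensity and lifts a memory-bound kernel toward the compute roof.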
- A Survey of Resource-efficient LLM and Multimodal Foundation Models [22.60868015887625]
Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion models, and multimodal models, are revolutionizing the entire machine learning lifecycle.
However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources.
This survey delves into the critical importance of research on resource-efficient foundation models, examining both algorithmic and systemic aspects.
arXiv Detail & Related papers (2024-01-16T03:35:26Z)
- Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models [34.327846901536425]
Large Language Models (LLMs) bring forth challenges in the high consumption of computational, memory, energy, and financial resources.
This survey aims to systematically address these challenges by reviewing a broad spectrum of techniques designed to enhance the resource efficiency of LLMs.
arXiv Detail & Related papers (2024-01-01T01:12:42Z)
- The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z)
- Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z)
- A Study on the Implementation of Generative AI Services Using an Enterprise Data-Based LLM Application Architecture [0.0]
This study presents a method for implementing generative AI services by utilizing a Large Language Model (LLM) application architecture.
The research delves into strategies for mitigating the issue of inadequate data, offering tailored solutions.
A significant contribution of this work is the development of a Retrieval-Augmented Generation (RAG) model.
arXiv Detail & Related papers (2023-09-03T07:03:17Z)
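Retrieval-Augmented Generation, developed in the study above, reduces to a short loop: embed the corpus, score passages against the query, and prepend the best matches to the prompt. The sketch below is a toy under stated assumptions; the hashing embed function is a stand-in for a learned encoder, and a production system would use a vector index.

```python
# Minimal RAG retrieval sketch with a toy bag-of-words hashing "embedding".
import numpy as np

DOCS = [
    "Quantization reduces weight precision to shrink LLMs.",
    "Pruning removes redundant parameters from a network.",
    "Distillation trains a small student to mimic a teacher.",
]

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy hashing embedding (stand-in for a learned encoder)."""
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

DOC_VECS = np.stack([embed(d) for d in DOCS])

def build_prompt(query: str, k: int = 2) -> str:
    scores = DOC_VECS @ embed(query)    # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]  # indices of the k best matches
    context = "\n".join(DOCS[i] for i in top)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How does pruning compress a model?"))
```

Grounding the generator in retrieved enterprise text is what mitigates the inadequate-data issue the study describes.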
- A Comprehensive Overview of Large Language Models [68.22178313875618]
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks.
This article provides an overview of the existing literature on a broad range of LLM-related concepts.
arXiv Detail & Related papers (2023-07-12T20:01:52Z)
- Information Extraction in Low-Resource Scenarios: Survey and Perspective [60.67550275379953]
Information Extraction seeks to derive structured information from unstructured texts.
This paper presents a review of neural approaches to low-resource IE from traditional and LLM-based perspectives.
arXiv Detail & Related papers (2022-02-16T13:44:00Z)