Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
- URL: http://arxiv.org/abs/2505.02309v2
- Date: Thu, 08 May 2025 05:55:48 GMT
- Title: Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
- Authors: Sanjay Surendranath Girija, Shashank Kapoor, Lakshit Arora, Dipen Pradhan, Aman Raj, Ankit Shetgaonkar,
- Abstract summary: Large Language Models (LLMs) have revolutionized many areas of artificial intelligence (AI), but their substantial resource requirements limit their deployment on mobile and edge devices.<n>This survey paper provides a comprehensive overview of techniques for compressing LLMs to enable efficient inference in resource-constrained environments.
- Score: 1.4513830934124627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have revolutionized many areas of artificial intelligence (AI), but their substantial resource requirements limit their deployment on mobile and edge devices. This survey paper provides a comprehensive overview of techniques for compressing LLMs to enable efficient inference in resource-constrained environments. We examine three primary approaches: Knowledge Distillation, Model Quantization, and Model Pruning. For each technique, we discuss the underlying principles, present different variants, and provide examples of successful applications. We also briefly discuss complementary techniques such as mixture-of-experts and early-exit strategies. Finally, we highlight promising future directions, aiming to provide a valuable resource for both researchers and practitioners seeking to optimize LLMs for edge deployment.
Related papers
- A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and performance to perform various language tasks with minimal computational resources.
We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z) - EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z) - The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities [0.35998666903987897]
This report examines the fine-tuning of Large Language Models (LLMs)
It outlines the historical evolution of LLMs from traditional Natural Language Processing (NLP) models to their pivotal role in AI.
The report introduces a structured seven-stage pipeline for fine-tuning LLMs.
arXiv Detail & Related papers (2024-08-23T14:48:02Z) - LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z) - Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models [32.774929826684854]
Large Language Models (LLMs) bring forth challenges in the high consumption of computational, memory, energy, and financial resources.<n>This survey aims to systematically address these challenges by reviewing a broad spectrum of techniques designed to enhance the resource efficiency of LLMs.
arXiv Detail & Related papers (2024-01-01T01:12:42Z) - The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z) - A Survey on Model Compression for Large Language Models [21.768293256849113]
Large Language Models (LLMs) have transformed natural language processing tasks successfully.
Yet, their large size and high computational needs pose challenges for practical use.
Model compression has emerged as a key research area to address these challenges.
arXiv Detail & Related papers (2023-08-15T08:31:05Z) - A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP)
This survey presents a taxonomy that categorizes these models into two major paradigms, respectively Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec)
arXiv Detail & Related papers (2023-05-31T13:51:26Z) - Information Extraction in Low-Resource Scenarios: Survey and Perspective [56.5556523013924]
Information Extraction seeks to derive structured information from unstructured texts.
This paper presents a review of neural approaches to low-resource IE from emphtraditional and emphLLM-based perspectives.
arXiv Detail & Related papers (2022-02-16T13:44:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.