Related papers: A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models

A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models

URL: http://arxiv.org/abs/2405.13019v2
Date: Fri, 24 May 2024 07:40:27 GMT
Title: A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
Authors: Mahsa Khoshnoodi, Vinija Jain, Mingye Gao, Malavika Srikanth, Aman Chadha,
Abstract summary: This paper presents a survey of accelerated generation techniques in autoregressive language models. We categorize these techniques into several key areas: speculative decoding, early exiting mechanisms, and non-autoregressive methods.
Score: 2.091322528026356
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the crucial importance of accelerating text generation in large language models (LLMs) for efficiently producing content, the sequential nature of this process often leads to high inference latency, posing challenges for real-time applications. Various techniques have been proposed and developed to address these challenges and improve efficiency. This paper presents a comprehensive survey of accelerated generation techniques in autoregressive language models, aiming to understand the state-of-the-art methods and their applications. We categorize these techniques into several key areas: speculative decoding, early exiting mechanisms, and non-autoregressive methods. We discuss each category's underlying principles, advantages, limitations, and recent advancements. Through this survey, we aim to offer insights into the current landscape of techniques in LLMs and provide guidance for future research directions in this critical area of natural language processing.

Related papers

The Prompt Engineering Report Distilled: Quick Start Guide for Life Sciences [0.016851255229980582]
This report focuses on 6 core prompt techniques: zero-shot, few-shot approaches, thought generation, ensembling, self-criticism, and decomposition.<n>We provide detailed recommendations for how prompts should and shouldn't be structured.<n>We examine the effectiveness of Deep Research tools across OpenAI, Google, Anthropic and Perplexity platforms.
arXiv Detail & Related papers (2025-09-14T14:39:35Z)
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models [51.817121227562964]
Large Language Models (LLMs) have delivered impressive results in language understanding, generation, reasoning, and pushes the ability boundary of multimodal models.<n> Transformer models, as the foundation of modern LLMs, offer a strong baseline with excellent scaling properties.<n>The traditional transformer architecture requires substantial computations and poses significant obstacles for large-scale training and practical deployment.
arXiv Detail & Related papers (2025-08-13T14:13:46Z)
A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models [71.66119575697458]
parallel text generation techniques aimed at breaking the token-by-token generation bottleneck and improving inference efficiency.<n>We categorize existing approaches into AR-based and Non-AR-based paradigms, and provide a detailed examination of the core techniques within each category.<n>We highlight recent advancements, identify open challenges, and outline promising directions for future research in parallel text generation.
arXiv Detail & Related papers (2025-08-12T07:56:04Z)
A Comprehensive Survey on Continual Learning in Generative Models [35.76314482046672]
We present a comprehensive survey of continual learning methods for mainstream generative models.<n>We categorize these approaches into three paradigms: architecture-based, regularization-based, and replay-based.<n>We analyze continual learning setups for different generative models, including training objectives, benchmarks, and core backbones.
arXiv Detail & Related papers (2025-06-16T02:27:25Z)
LLM Post-Training: A Deep Dive into Reasoning Large Language Models [131.10969986056]
Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications. Post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
arXiv Detail & Related papers (2025-02-28T18:59:54Z)
Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations [0.0]
Large language models are ubiquitous in natural language processing because they can adapt to new tasks without retraining. This literature review focuses on various techniques for reducing resource requirements and compressing large language models.
arXiv Detail & Related papers (2024-08-06T12:07:32Z)
Exploring the landscape of large language models: Foundations, techniques, and challenges [8.042562891309414]
The article sheds light on the mechanics of in-context learning and a spectrum of fine-tuning approaches. It explores how LLMs can be more closely aligned with human preferences through innovative reinforcement learning frameworks. The ethical dimensions of LLM deployment are discussed, underscoring the need for mindful and responsible application.
arXiv Detail & Related papers (2024-04-18T08:01:20Z)
Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models [3.0566617373924325]
Sentence embedding is a fundamental task within the realm of Natural Language Processing, finding extensive application in search engines, expert systems, and question-and-answer platforms. With the continuous evolution of large language models such as LLaMA and Mistral, research on sentence embedding has recently achieved notable breakthroughs. We propose two innovative prompt engineering techniques capable of further enhancing the expressive power of PLMs' raw embeddings.
arXiv Detail & Related papers (2024-04-05T07:07:15Z)
Efficient Prompting Methods for Large Language Models: A Survey [50.171011917404485]
Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks. This approach brings the additional computational burden of model inference and human effort to guide and control the behavior of LLMs. We present the basic concepts of prompting, review the advances for efficient prompting, and highlight future research directions.
arXiv Detail & Related papers (2024-04-01T12:19:08Z)
A Thorough Examination of Decoding Methods in the Era of LLMs [72.65956436513241]
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of large language models. Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization.
arXiv Detail & Related papers (2024-02-10T11:14:53Z)
A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications [11.568575664316143]
This paper provides a structured overview of recent advancements in prompt engineering, categorized by application area. We provide a summary detailing the prompting methodology, its applications, the models involved, and the datasets utilized. This systematic analysis enables a better understanding of this rapidly developing field and facilitates future research by illuminating open challenges and opportunities for prompt engineering.
arXiv Detail & Related papers (2024-02-05T19:49:13Z)
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains. This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z)
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output. This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z)
Improving Factuality and Reasoning in Language Models through Multiagent Debate [95.10641301155232]
We present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer. Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks. Our approach may be directly applied to existing black-box models and uses identical procedure and prompts for all tasks we investigate.
arXiv Detail & Related papers (2023-05-23T17:55:11Z)
Self-Supervised Representation Learning: Introduction, Advances and Challenges [125.38214493654534]
Self-supervised representation learning methods aim to provide powerful deep feature learning without the requirement of large annotated datasets. This article introduces this vibrant area including key concepts, the four main families of approach and associated state of the art, and how self-supervised methods are applied to diverse modalities of data.
arXiv Detail & Related papers (2021-10-18T13:51:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.