FastCoder: Accelerating Repository-level Code Generation via Efficient Retrieval and Verification
- URL: http://arxiv.org/abs/2502.17139v2
- Date: Tue, 30 Sep 2025 09:10:26 GMT
- Title: FastCoder: Accelerating Repository-level Code Generation via Efficient Retrieval and Verification
- Authors: Qianhui Zhao, Li Zhang, Fang Liu, Xiaoli Lian, Qiaoyuanhe Meng, Ziqian Jiao, Zetong Zhou, Jia Li, Lin Shi,
- Abstract summary: We propose FastCoder, an inference acceleration approach specifically designed for code generation.<n>FastCoder constructs a multi-source datastore, providing access to both general and project-specific knowledge.<n>It can reach up to 2.53x and 2.54x speedup compared to autoregressive decoding in repository-level and standalone code generation tasks.
- Score: 10.286072352686874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code generation is a latency-sensitive task that demands high timeliness. However, with the growing interest and inherent difficulty in repository-level code generation, most existing code generation studies focus on improving the correctness of generated code while overlooking the inference efficiency, which is substantially affected by the overhead during LLM generation. Although there has been work on accelerating LLM inference, these approaches are not tailored to the specific characteristics of code generation; instead, they treat code the same as natural language sequences and ignore its unique syntax and semantic characteristics, which are also crucial for improving efficiency. Consequently, these approaches exhibit limited effectiveness in code generation tasks, particularly for repository-level scenarios with considerable complexity and difficulty. To alleviate this issue, following draft-verification paradigm, we propose FastCoder, a simple yet highly efficient inference acceleration approach specifically designed for code generation, without compromising the quality of the output. FastCoder constructs a multi-source datastore, providing access to both general and project-specific knowledge, facilitating the retrieval of high-quality draft sequences. Moreover, FastCoder reduces the retrieval cost by controlling retrieval timing, and enhances efficiency through parallel retrieval and a context- and LLM preference-aware cache. Experimental results show that FastCoder can reach up to 2.53x and 2.54x speedup compared to autoregressive decoding in repository-level and standalone code generation tasks, respectively, outperforming state-of-the-art inference acceleration approaches by up to 88%. FastCoder can also be integrated with existing correctness-focused code generation approaches to accelerate the LLM generation process, and reach a speedup exceeding 2.6x.
Related papers
- AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion [55.21541958868449]
We propose AlignCoder, a repository-level code completion framework.<n>Our framework generates an enhanced query that bridges the semantic gap between the initial query and the target code.<n>We employ reinforcement learning to train an AlignRetriever that learns to leverage inference information in the enhanced query for more accurate retrieval.
arXiv Detail & Related papers (2026-01-27T15:23:14Z) - ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation [27.9837392531619]
We propose a knowledge-infused framework named ShortCoder to optimize code generation efficiency.<n>ShortCoder consistently outperforms state-of-the-art methods on HumanEval.
arXiv Detail & Related papers (2026-01-14T18:57:31Z) - Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model [98.35868970993232]
Diffusion language models (DLMs) are emerging as a powerful and promising alternative to the dominant autoregressive paradigm.<n>We introduce efficient Sampling with Adaptive acceleration and Backtracking Enhanced Remasking (i.e., Saber) to achieve better inference speed and output quality in code generation.
arXiv Detail & Related papers (2025-10-20T23:38:12Z) - Accelerating Diffusion LLM Inference via Local Determinism Propagation [27.751279909685604]
LocalLeap is a training-free adaptive parallel decoding strategy.<n>It achieves 6.94$times$ throughput improvements and reduces decoding steps to just 14.2% of the original requirement.
arXiv Detail & Related papers (2025-10-08T14:39:34Z) - Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs [57.69190972274813]
Diffusion Large Language Models (DLLMs) have emerged as a compelling alternative to Autoregressive models.<n>ExistingDLLMs are plagued by a severe quality-speed trade-off, where faster parallel decoding leads to significant performance degradation.<n>We introduce Wide-In, Narrow-Out (WINO), a training-free decoding algorithm that enables revokable decoding inDLLMs.
arXiv Detail & Related papers (2025-07-24T16:51:33Z) - AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism [17.858104076062897]
Large language models (LLMs) are increasingly used for long-content generation.<n>We propose AdaDecode, which accelerates decoding without requiring auxiliary models or changes to the original model parameters.<n>AdaDecode consistently achieves superior decoding throughput with up to 1.73x speedup.
arXiv Detail & Related papers (2025-06-04T08:32:30Z) - Reuse or Generate? Accelerating Code Editing via Edit-Oriented Speculative Decoding [13.813068175559456]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code editing, substantially enhancing software development productivity.<n>We propose EfficientEdit, a novel method that improves LLM-based code editing efficiency through two key mechanisms based on speculative decoding.<n> Experimental results show that EfficientEdit can achieve up to 10.38$times$ and 13.09$times$ speedup compared to standard autoregressive decoding.
arXiv Detail & Related papers (2025-06-03T12:01:20Z) - FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge [60.000984252907195]
Auto-regressive (AR) models have recently shown promise in visual generation tasks due to their superior sampling efficiency.<n>Video generation requires a substantially larger number of tokens to produce coherent temporal frames, resulting in significant overhead during the decoding phase.<n>We propose the textbfFastCar framework to accelerate the decode phase for the AR video generation by exploring the temporal redundancy.
arXiv Detail & Related papers (2025-05-17T05:00:39Z) - LoRACode: LoRA Adapters for Code Embeddings [1.5525560291268214]
We introduce a parameter-efficient fine-tuning method based on Low-Rank Adaptation (LoRA) to construct task-specific adapters for code retrieval.
Our approach reduces the number of trainable parameters to less than two percent of the base model, enabling rapid fine-tuning on extensive code corpora.
arXiv Detail & Related papers (2025-03-07T10:50:45Z) - LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification [42.54363549922909]
Speculative decoding has become a promising technique to mitigate the high inference latency of autoregressive decoding in Large Language Models.
Despite its promise, the effective application of speculative decoding in LLMs still confronts three key challenges.
We enhance the performance of speculative decoding in long-context settings by addressing these challenges.
arXiv Detail & Related papers (2025-02-24T18:53:31Z) - LLM4EFFI: Leveraging Large Language Models to Enhance Code Efficiency and Correctness [38.399282089600284]
Large Language Models (LLMs) have demonstrated impressive performance in code generation.
tool: ulineLarge ulineLanguage ulineModel for Code ulineEfficiency is a novel framework that enables LLMs to generate code that balances both efficiency and correctness.
arXiv Detail & Related papers (2025-02-17T07:01:18Z) - PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback [78.89596149768458]
Large Language Models (LLMs) are widely adopted for assisting in software development tasks.
We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code.
arXiv Detail & Related papers (2024-11-18T06:22:38Z) - Effi-Code: Unleashing Code Efficiency in Language Models [17.355845751737423]
Effi-Code is an approach to enhancing code generation in large language models.
Effi-Code offers a scalable and generalizable approach to improving code generation in AI systems.
arXiv Detail & Related papers (2024-10-14T07:05:51Z) - Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion [55.0194604505437]
Speculative decoding has emerged as a widely adopted method to accelerate large language model inference.<n>This paper proposes an adaptation of speculative decoding which uses discrete diffusion models to generate draft sequences.
arXiv Detail & Related papers (2024-08-10T21:24:25Z) - When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention [43.39584272739589]
We introduce CodeFast, an inference acceleration approach for Code LLMs on code generation.
Key idea of CodeFast is to terminate the inference process in time when unnecessary excess tokens are detected.
We conduct extensive experiments with CodeFast on five representative Code LLMs across four widely used code generation datasets.
arXiv Detail & Related papers (2024-07-29T14:27:08Z) - Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z) - Exploring Data-Efficient Adaptation of Large Language Models for Code Generation [64.5583894165813]
We propose a novel adaptation approach named DEED, which stands for Data-Efficient adaptation with Error-Driven learning for code generation.<n> Experimental results show that, compared to other mainstream fine-tuning approaches, DEED achieves superior performance with few training data.
arXiv Detail & Related papers (2024-02-29T16:09:02Z) - StepCoder: Improve Code Generation with Reinforcement Learning from
Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimize the model by masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z) - Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster [61.83949316226113]
FastCoT is a model-agnostic framework based on parallel decoding.
We show that FastCoT saves inference time by nearly 20% with only a negligible performance drop compared to the regular approach.
arXiv Detail & Related papers (2023-11-14T15:56:18Z) - Fixing Large Language Models' Specification Misunderstanding for Better Code Generation [13.494822086550604]
muFiX is a novel prompting technique to improve the code generation performance of large language models (LLMs)<n>It first exploits test case analysis to obtain specification understanding and enables a self-improvement process.<n>muFiX further fixes the specification understanding towards the direction reducing the gap between the provided understanding and the actual understanding.
arXiv Detail & Related papers (2023-09-28T02:58:07Z) - CodeT5+: Open Code Large Language Models for Code Understanding and
Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence.
CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.
We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.