REPOFUSE: Repository-Level Code Completion with Fused Dual Context
- URL: http://arxiv.org/abs/2402.14323v2
- Date: Fri, 23 Feb 2024 02:53:20 GMT
- Title: REPOFUSE: Repository-Level Code Completion with Fused Dual Context
- Authors: Ming Liang, Xiaoheng Xie, Gehao Zhang, Xunjin Zheng, Peng Di, Wei Jiang, Hongwei Chen, Chengpeng Wang, Gang Fan
- Abstract summary: This paper introduces REPOFUSE, a pioneering solution designed to enhance repository-level code completion without the latency trade-off.
We propose a novel rank truncated generation (RTG) technique that efficiently condenses two types of context into prompts with restricted size.
REPOFUSE has demonstrated a significant leap over existing models, achieving a 40.90% to 59.75% increase in exact match (EM) accuracy for code completions and a 26.8% enhancement in inference speed.
- Score: 11.531678717514724
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The success of language models in code assistance has spurred the proposal of
repository-level code completion as a means to enhance prediction accuracy,
utilizing the context from the entire codebase. However, this amplified context
can inadvertently increase inference latency, potentially undermining the
developer experience and deterring tool adoption - a challenge we termed the
Context-Latency Conundrum. This paper introduces REPOFUSE, a pioneering
solution designed to enhance repository-level code completion without the
latency trade-off. REPOFUSE uniquely fuses two types of context: the analogy
context, rooted in code analogies, and the rationale context, which encompasses
in-depth semantic relationships. We propose a novel rank truncated generation
(RTG) technique that efficiently condenses these contexts into prompts with
restricted size. This enables REPOFUSE to deliver precise code completions
while maintaining inference efficiency. Through testing with the CrossCodeEval
suite, REPOFUSE has demonstrated a significant leap over existing models,
achieving a 40.90% to 59.75% increase in exact match (EM) accuracy for code
completions and a 26.8% enhancement in inference speed. Beyond experimental
validation, REPOFUSE has been integrated into the workflow of a large
enterprise, where it actively supports various coding tasks.
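The rank truncated generation (RTG) idea described above can be sketched as a rank-then-pack step: score each candidate context snippet, then greedily keep the highest-ranked ones until a token budget is exhausted. The following is a minimal illustrative sketch; the names (`ContextSnippet`, `rank_truncate`, the whitespace token counter) are assumptions for exposition, not REPOFUSE's actual interfaces.

```python
# Illustrative sketch of rank-truncated context packing: keep the
# top-scoring snippets that fit within a fixed token budget.
from dataclasses import dataclass

@dataclass
class ContextSnippet:
    text: str
    score: float  # relevance score from some ranker (assumed given)

def rank_truncate(snippets, budget_tokens,
                  count_tokens=lambda text: len(text.split())):
    """Greedily pack the highest-scoring snippets into a token budget."""
    chosen, used = [], 0
    for snip in sorted(snippets, key=lambda s: s.score, reverse=True):
        cost = count_tokens(snip.text)
        if used + cost <= budget_tokens:
            chosen.append(snip)
            used += cost
    return "\n".join(s.text for s in chosen)

snippets = [
    ContextSnippet("def helper(): ...", 0.9),
    ContextSnippet("class Config: ...", 0.5),
    ContextSnippet("# unrelated utility", 0.1),
]
prompt_context = rank_truncate(snippets, budget_tokens=6)
```

With a budget of 6 whitespace tokens, only the two highest-scoring snippets fit; the lowest-ranked one is truncated away, which is the essence of keeping the prompt size restricted.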
Related papers
- COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - LLM Agents Improve Semantic Code Search [6.047454623201181]
We introduce the approach of using Retrieval Augmented Generation powered agents to inject information into user prompts.
By utilizing RAG, agents enhance user queries with relevant details from GitHub repositories, making them more informative and contextually aligned.
Experimental results on the CodeSearchNet dataset demonstrate that RepoRift significantly outperforms existing methods.
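The query-augmentation step described above can be pictured as prepending retrieved repository snippets to the user's query before it reaches the model. This is a toy sketch under that assumption; `augment_query` and its parameters are illustrative, not RepoRift's actual interface.

```python
# Toy sketch of RAG-style query augmentation: prepend retrieved
# repository snippets to the user query to make it more informative.
def augment_query(query, retrieved_snippets, max_snippets=3):
    header = "\n".join(retrieved_snippets[:max_snippets])
    return f"{header}\n\nQuery: {query}" if header else f"Query: {query}"

aug = augment_query("how is auth handled?",
                    ["# auth/middleware.py", "# docs: token flow"])
```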
arXiv Detail & Related papers (2024-08-05T00:43:56Z) - KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches [52.02764371205856]
Long context capability is a crucial competency for large language models (LLMs).
This work provides a taxonomy of current methods and evaluates 10+ state-of-the-art approaches across seven categories of long context tasks.
arXiv Detail & Related papers (2024-07-01T17:59:47Z) - On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present a novel benchmark designed to evaluate repository-level code generation.
We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
arXiv Detail & Related papers (2024-06-17T10:45:22Z) - On The Importance of Reasoning for Context Retrieval in Repository-Level Code Editing [82.96523584351314]
We decouple the task of context retrieval from the other components of the repository-level code editing pipelines.
We conclude that while the reasoning helps to improve the precision of the gathered context, it still lacks the ability to identify its sufficiency.
arXiv Detail & Related papers (2024-06-06T19:44:17Z) - Repoformer: Selective Retrieval for Repository-Level Code Completion [30.706277772743615]
Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion.
In this paper, we propose a selective RAG framework to avoid retrieval when unnecessary.
We show that our framework is able to accommodate different generation models, retrievers, and programming languages.
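A selective RAG framework of this kind can be sketched as a gate in front of the retriever: generate directly when some confidence signal is high, and pay the retrieval cost only otherwise. The threshold, the confidence function, and all names below are assumptions for illustration, not the Repoformer design.

```python
# Illustrative sketch of selective retrieval: skip retrieval when the
# generator is already confident about the completion.
def complete(prompt, generate, retrieve, confidence, threshold=0.8):
    if confidence(prompt) >= threshold:
        return generate(prompt)               # retrieval-free fast path
    context = retrieve(prompt)                # retrieval-augmented path
    return generate(context + "\n" + prompt)

# toy stand-ins to show the control flow
gen = lambda p: f"completion_for:{p[-10:]}"
ret = lambda p: "# retrieved snippet"
conf = lambda p: 0.9 if "easy" in p else 0.3

out_easy = complete("easy case", gen, ret, conf)
out_hard = complete("hard case", gen, ret, conf)
```

The point of the gate is that the easy case never touches the retriever, so its latency matches plain generation.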
arXiv Detail & Related papers (2024-03-15T06:59:43Z) - IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion [38.863871578280936]
We propose IRCoCo, a code completion-specific DRL-based fine-tuning framework.
We show that fine-tuning pretrained LMs with IRCoCo leads to significant improvements in the code completion task.
arXiv Detail & Related papers (2024-01-30T00:18:20Z) - RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
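The iterative retrieval-and-generation loop can be sketched as follows: each round retrieves with a query that includes the previous draft completion, so later retrievals are informed by what the model has produced so far. Function names are illustrative stand-ins, not RepoCoder's actual interfaces.

```python
# Minimal sketch of iterative retrieval-generation: refine the retrieval
# query with the latest draft completion on each round.
def iterative_complete(prompt, retrieve, generate, rounds=2):
    completion = ""
    for _ in range(rounds):
        context = retrieve(prompt + completion)  # query grows with the draft
        completion = generate(context, prompt)
    return completion

# toy stand-ins that only show the loop shape
toy_retrieve = lambda query: f"ctx({len(query)})"
toy_generate = lambda context, prompt: f"{prompt}|{context}"

result = iterative_complete("foo", toy_retrieve, toy_generate)
```

In the second round the retrieval query already contains the first draft, which is why the toy context length differs between rounds.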
arXiv Detail & Related papers (2023-03-22T13:54:46Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
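Combining lexical copying with semantic retrieval, as described above, can be sketched as a weighted blend of two scores per candidate document. The Jaccard lexical score, the `alpha` weight, and the toy semantic scorer below are all assumptions for illustration, not ReACC's design.

```python
# Hedged sketch of hybrid retrieval: blend a lexical overlap score with
# a semantic similarity score, then rank candidates by the mixture.
def lexical_score(query, doc):
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / max(len(q | d), 1)  # Jaccard overlap on tokens

def hybrid_rank(query, docs, semantic_score, alpha=0.5):
    scored = [(alpha * lexical_score(query, d)
               + (1 - alpha) * semantic_score(query, d), d) for d in docs]
    return [d for _, d in sorted(scored, reverse=True)]

docs = [
    "def sort_list(xs): return sorted(xs)",
    "def add(a, b): return a + b",
]
# toy semantic scorer: pretends to understand intent via a keyword
toy_semantic = lambda q, d: 1.0 if "sort" in d else 0.0

ranked = hybrid_rank("sort a list", docs, toy_semantic)
```

Here the semantic component rescues a document the lexical score alone would miss, which is the motivation for using both signals.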
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.