RepoCoder: Repository-Level Code Completion Through Iterative Retrieval
and Generation
- URL: http://arxiv.org/abs/2303.12570v3
- Date: Fri, 20 Oct 2023 15:21:51 GMT
- Title: RepoCoder: Repository-Level Code Completion Through Iterative Retrieval
and Generation
- Authors: Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan,
Yi Mao, Jian-Guang Lou, Weizhu Chen
- Abstract summary: RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
- Score: 96.75695811963242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of repository-level code completion is to continue writing the
unfinished code based on a broader context of the repository. While for
automated code completion tools, it is difficult to utilize the useful
information scattered in different files. We propose RepoCoder, a simple,
generic, and effective framework to address the challenge. It streamlines the
repository-level code completion process by incorporating a similarity-based
retriever and a pre-trained code language model in an iterative
retrieval-generation pipeline. RepoCoder makes effective utilization of
repository-level information for code completion and has the ability to
generate code at various levels of granularity. Moreover, we propose a new
benchmark RepoEval, which consists of the latest and high-quality real-world
repositories covering line, API invocation, and function body completion
scenarios. Experimental results indicate that RepoCoder significantly improves
the In-File completion baseline by over 10% in all settings and consistently
outperforms the vanilla retrieval-augmented code completion approach.
Furthermore, we validate the effectiveness of RepoCoder through comprehensive
analysis, providing valuable insights for future research. Our source code and
benchmark are publicly available:
https://github.com/microsoft/CodeT/tree/main/RepoCoder
Related papers
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation [7.167252304211614]
RepoGenReflex is a generic, dynamic, effective framework to optimize the retrieval and generation process dynamically.
By leveraging the Retrieval-Augmented Generation (RAG) enhanced with Verbal Reinforcement Learning (VRL), it can dynamically choose the optimal results for repository-level code completion.
RepoGenReflex consistently demonstrates superior performance and effectiveness across standard code completion tasks.
arXiv Detail & Related papers (2024-09-19T23:38:59Z) - Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs [24.00351065427465]
We propose a strategy named Hierarchical Context Pruning (HCP) to construct completion prompts with high informational code content.
The HCP models the code repository at the function level, maintaining the topological dependencies between code files while removing a large amount of irrelevant code content.
arXiv Detail & Related papers (2024-06-26T12:26:16Z) - CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z) - GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model [30.625128161499195]
GraphCoder is a retrieval-augmented code completion framework.
It uses general code knowledge and the repository-specific knowledge via a graph-based retrieval-generation process.
It achieves higher exact match (EM) on average, with increases of +6.06 in code match and +6.23 in identifier match, while using less time and space.
arXiv Detail & Related papers (2024-06-11T06:55:32Z) - RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion [12.173834895070827]
tool is a framework designed to address the complex challenges associated with repository-level code completion.
Central to tool is the em Repo-level Semantic Graph (RSG), a novel semantic graph structure that encapsulates the vast context of code repositories.
Our evaluations show that tool markedly outperforms existing techniques in repository-level code completion.
arXiv Detail & Related papers (2024-03-10T05:10:34Z) - LongCoder: A Long-Range Pre-trained Language Model for Code Completion [56.813974784131624]
LongCoder employs a sliding window mechanism for self-attention and introduces two types of globally accessible tokens.
Bridge tokens are inserted throughout the input sequence to aggregate local information and facilitate global interaction.
memory tokens are included to highlight important statements that may be invoked later and need to be memorized.
arXiv Detail & Related papers (2023-06-26T17:59:24Z) - Generation-Augmented Query Expansion For Code Retrieval [51.20943646688115]
We propose a generation-augmented query expansion framework.
Inspired by the human retrieval process - sketching an answer before searching.
We achieve new state-of-the-art results on the CodeSearchNet benchmark.
arXiv Detail & Related papers (2022-12-20T23:49:37Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.