RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
- URL: http://arxiv.org/abs/2306.03091v2
- Date: Wed, 4 Oct 2023 01:13:49 GMT
- Title: RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
- Authors: Tianyang Liu, Canwen Xu, Julian McAuley
- Abstract summary: RepoBench is a benchmark for evaluating code auto-completion systems.
It consists of three evaluation tasks: RepoBench-R (Retrieval), RepoBench-C (Code Completion), and RepoBench-P (Pipeline)
- Score: 43.797002322559834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have greatly advanced code auto-completion
systems, with a potential for substantial productivity enhancements for
developers. However, current benchmarks mainly focus on single-file tasks,
leaving an assessment gap for more complex, real-world, multi-file programming
scenarios. To fill this gap, we introduce RepoBench, a new benchmark
specifically designed for evaluating repository-level code auto-completion
systems. RepoBench supports both Python and Java and consists of three
interconnected evaluation tasks: RepoBench-R (Retrieval), RepoBench-C (Code
Completion), and RepoBench-P (Pipeline). These tasks respectively measure the
system's ability to retrieve the most relevant code snippets from other files
as cross-file context, predict the next line of code with cross-file and
in-file context, and handle complex tasks that require a combination of both
retrieval and next-line prediction. RepoBench aims to facilitate a more
complete comparison of performance and encourage continuous improvement in
auto-completion systems. RepoBench is publicly available at
https://github.com/Leolty/repobench.
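The three tasks compose naturally: retrieval selects cross-file context, completion consumes it alongside in-file context, and the pipeline chains the two. The sketch below illustrates that relationship in Python; the dataclass fields and the retriever/completer callables are illustrative assumptions, not the benchmark's actual data format or API (see the repository above for the real evaluation scripts).

```python
# Hypothetical sketch of how RepoBench's three tasks relate.
# Field names and callables here are assumptions for illustration only.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RepoBenchExample:
    in_file_context: str           # code preceding the target line in the current file
    candidate_snippets: List[str]  # snippets drawn from other files in the repository
    gold_snippet_index: int        # index of the truly relevant snippet (retrieval label)
    next_line: str                 # ground-truth next line of code (completion label)


def repobench_r(example: RepoBenchExample,
                retriever: Callable[[str, List[str]], int]) -> bool:
    """RepoBench-R: did the retriever pick the most relevant cross-file snippet?"""
    predicted = retriever(example.in_file_context, example.candidate_snippets)
    return predicted == example.gold_snippet_index


def repobench_c(example: RepoBenchExample,
                completer: Callable[[str], str],
                cross_file_context: str) -> bool:
    """RepoBench-C: predict the next line given cross-file plus in-file context."""
    prompt = cross_file_context + "\n" + example.in_file_context
    return completer(prompt).strip() == example.next_line.strip()


def repobench_p(example: RepoBenchExample,
                retriever: Callable[[str, List[str]], int],
                completer: Callable[[str], str]) -> bool:
    """RepoBench-P: retrieve a cross-file snippet first, then complete the next line."""
    chosen = retriever(example.in_file_context, example.candidate_snippets)
    retrieved_snippet = example.candidate_snippets[chosen]
    return repobench_c(example, completer, cross_file_context=retrieved_snippet)
```

In this reading, a pipeline system's end-to-end accuracy depends on both its retriever and its completer, which is why the paper evaluates the two components separately (RepoBench-R, RepoBench-C) as well as jointly (RepoBench-P).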