CodePlan: Repository-level Coding using LLMs and Planning
- URL: http://arxiv.org/abs/2309.12499v1
- Date: Thu, 21 Sep 2023 21:45:17 GMT
- Title: CodePlan: Repository-level Coding using LLMs and Planning
- Authors: Ramakrishna Bairi, Atharv Sonwane, Aditya Kanade, Vageesh D C, Arun
Iyer, Suresh Parthasarathy, Sriram Rajamani, B. Ashok, Shashank Shet
- Abstract summary: We frame repository-level coding as a planning problem and present a task-agnostic framework, called CodePlan.
We evaluate the effectiveness of CodePlan on two repository-level tasks: package migration (C#) and temporal code edits (Python)
Our results show that CodePlan has better match with the ground truth compared to baselines.
- Score: 5.987469779811903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software engineering activities such as package migration, fixing errors
reports from static analysis or testing, and adding type annotations or other
specifications to a codebase, involve pervasively editing the entire repository
of code. We formulate these activities as repository-level coding tasks.
Recent tools like GitHub Copilot, which are powered by Large Language Models
(LLMs), have succeeded in offering high-quality solutions to localized coding
problems. Repository-level coding tasks are more involved and cannot be solved
directly using LLMs, since code within a repository is inter-dependent and the
entire repository may be too large to fit into the prompt. We frame
repository-level coding as a planning problem and present a task-agnostic
framework, called CodePlan to solve it. CodePlan synthesizes a multi-step chain
of edits (plan), where each step results in a call to an LLM on a code location
with context derived from the entire repository, previous code changes and
task-specific instructions. CodePlan is based on a novel combination of an
incremental dependency analysis, a change may-impact analysis and an adaptive
planning algorithm.
We evaluate the effectiveness of CodePlan on two repository-level tasks:
package migration (C#) and temporal code edits (Python). Each task is evaluated
on multiple code repositories, each of which requires inter-dependent changes
to many files (between 2-97 files). Coding tasks of this level of complexity
have not been automated using LLMs before. Our results show that CodePlan has
better match with the ground truth compared to baselines. CodePlan is able to
get 5/6 repositories to pass the validity checks (e.g., to build without errors
and make correct code edits) whereas the baselines (without planning but with
the same type of contextual information as CodePlan) cannot get any of the
repositories to pass them.
Related papers
- CodeUpdateArena: Benchmarking Knowledge Editing on API Updates [77.81663273436375]
We present CodeUpdateArena, a benchmark for knowledge editing in the code domain.
An instance in our benchmark consists of a synthetic API function update paired with a program synthesis example.
Our benchmark covers updates of various types to 54 functions from seven diverse Python packages.
arXiv Detail & Related papers (2024-07-08T17:55:04Z) - Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs [24.00351065427465]
We propose a strategy named Hierarchical Context Pruning (HCP) to construct completion prompts with high informational code content.
The HCP models the code repository at the function level, maintaining the topological dependencies between code files while removing a large amount of irrelevant code content.
arXiv Detail & Related papers (2024-06-26T12:26:16Z) - Long Code Arena: a Set of Benchmarks for Long-Context Code Models [75.70507534322336]
Long Code Arena is a suite of six benchmarks for code processing tasks that require project-wide context.
These tasks cover different aspects of code processing: library-based code generation, CI builds repair, project-level code completion, commit message generation, bug localization, and module summarization.
For each task, we provide a manually verified dataset for testing, an evaluation suite, and open-source baseline solutions.
arXiv Detail & Related papers (2024-06-17T14:58:29Z) - REPOEXEC: Evaluate Code Generation with a Repository-Level Executable Benchmark [5.641402231731082]
We introduce RepoExec, a novel benchmark for evaluating code generation at the repository-level scale.
RepoExec focuses on three main aspects: executability, functional correctness through automated test case generation with high coverage rate, and carefully crafted cross-file contexts to accurately generate code.
arXiv Detail & Related papers (2024-06-17T10:45:22Z) - VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
We introduce VersiCode, the first comprehensive dataset designed to assess the ability of large language models to generate verifiable code for specific library versions.
We design two dedicated evaluation tasks: version-specific code completion (VSCC) and version-aware code editing (VACE)
Comprehensive experiments are conducted to benchmark the performance of LLMs, revealing the challenging nature of these tasks and VersiCode.
arXiv Detail & Related papers (2024-06-11T16:15:06Z) - How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE)
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z) - Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback [29.136378191436396]
We present CoCoGen, a new code generation approach that uses compiler feedback to improve the LLM-generated code.
CoCoGen first leverages static analysis to identify mismatches between the generated code and the project's context.
It then iteratively aligns and fixes the identified errors using information extracted from the code repository.
arXiv Detail & Related papers (2024-03-25T14:07:27Z) - ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z) - InstructCoder: Instruction Tuning Large Language Models for Code Editing [26.160498475809266]
We explore the use of Large Language Models (LLMs) to edit code based on user instructions.
InstructCoder is the first instruction-tuning dataset designed to adapt LLMs for general-purpose code editing.
Our findings reveal that open-source LLMs fine-tuned on InstructCoder can significantly enhance the accuracy of code edits.
arXiv Detail & Related papers (2023-10-31T10:15:35Z) - RepoCoder: Repository-Level Code Completion Through Iterative Retrieval
and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.