MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use
- URL: http://arxiv.org/abs/2502.15872v1
- Date: Fri, 21 Feb 2025 18:58:17 GMT
- Title: MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use
- Authors: Zaid Khan, Ali Farhadi, Ranjay Krishna, Luca Weihs, Mohit Bansal, Tanmay Gupta
- Abstract summary: MutaGReP is an approach to search for plans that decompose a user request into natural language steps grounded in a large code repository. Our plans use less than 5% of the 128K context window for GPT-4o but rival the coding performance of GPT-4o with a context window filled with the repo.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When a human requests an LLM to complete a coding task using functionality from a large code repository, how do we provide context from the repo to the LLM? One approach is to add the entire repo to the LLM's context window. However, most tasks involve only a fraction of the symbols in a repo, longer contexts are detrimental to the LLM's reasoning abilities, and context windows are not unlimited. Alternatively, we could emulate the human ability to navigate a large repo, pick out the right functionality, and form a plan to solve the task. We propose MutaGReP (Mutation-guided Grounded Repository Plan Search), an approach to search for plans that decompose a user request into natural language steps grounded in the codebase. MutaGReP performs neural tree search in plan space, exploring by mutating plans and using a symbol retriever for grounding. On the challenging LongCodeArena benchmark, our plans use less than 5% of the 128K context window for GPT-4o but rival the coding performance of GPT-4o with a context window filled with the repo. Plans produced by MutaGReP allow Qwen 2.5 Coder 32B and 72B to match the performance of GPT-4o with full repo context and enable progress on the hardest LongCodeArena tasks. Project page: zaidkhan.me/MutaGReP
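As a rough illustration of the search loop the abstract describes (best-first tree search over plans, an LLM-driven mutation operator, and a symbol retriever for grounding), the sketch below shows the control flow only. The mutation operator, plan scorer, and retriever here are hypothetical stubs, not the authors' implementation; see the project page for the real system.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Plan:
    score: float                           # lower is better (min-heap ordering)
    steps: list = field(compare=False)     # natural-language steps
    symbols: list = field(compare=False)   # repo symbols grounding each step

def retrieve_symbols(step, repo_index, k=3):
    """Hypothetical symbol retriever: rank symbols by naive token overlap.
    MutaGReP would use a learned retriever over the real repo here."""
    tokens = set(step.lower().replace(":", "").split())
    ranked = sorted(repo_index, key=lambda s: -len(tokens & set(s.split("_"))))
    return ranked[:k]

def mutate(plan, request):
    """Stand-in for an LLM mutation operator: extend the plan two ways."""
    return [
        plan.steps + [f"use repo functionality for: {request}"],
        plan.steps + [f"check result of: {plan.steps[-1]}"],
    ]

def score(steps, symbols):
    """Stand-in scorer: prefer plans grounding more distinct symbols in fewer steps."""
    distinct = {s for group in symbols for s in group}
    return len(steps) - 2 * len(distinct)

def plan_search(request, repo_index, budget=8):
    steps = [f"understand request: {request}"]
    syms = [retrieve_symbols(s, repo_index) for s in steps]
    root = Plan(score(steps, syms), steps, syms)
    frontier, best = [root], root
    for _ in range(budget):                # best-first search over plan space
        if not frontier:
            break
        parent = heapq.heappop(frontier)
        for child_steps in mutate(parent, request):
            child_syms = [retrieve_symbols(s, repo_index) for s in child_steps]
            child = Plan(score(child_steps, child_syms), child_steps, child_syms)
            best = min(best, child)        # keep the best-scoring plan seen so far
            heapq.heappush(frontier, child)
    return best

if __name__ == "__main__":
    index = ["load_dataset", "train_model", "save_checkpoint", "plot_metrics"]
    result = plan_search("train a model and plot metrics", index)
    print(result.steps, result.symbols)
```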
Related papers
- CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation
Large language models (LLMs) have shown promising performance in automated code generation.
In this paper, we propose CodeRAG, a retrieval-augmented code generation framework.
Experiments show that CodeRAG achieves significant improvements over no-RAG baselines.
arXiv Detail & Related papers (2025-04-14T09:51:23Z)
- Repository-level Code Search with Neural Retrieval Methods
We define the task of repository-level code search as retrieving the set of files from the current state of a code repository that are most relevant to addressing a user's question or bug.
The proposed approach combines BM25-based retrieval over commit messages with neural reranking using CodeBERT to identify the most pertinent files.
Experiments on a new dataset created from 7 popular open-source repositories demonstrate substantial improvements of up to 80% in MAP, MRR, and P@1 over the BM25 baseline.
arXiv Detail & Related papers (2025-02-10T21:59:01Z)
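As a concrete illustration of the two-stage pipeline in the entry above (BM25 retrieval over commit messages, then neural reranking), here is a minimal sketch. It requires `pip install rank-bm25`; the reranker is a toy stand-in for a CodeBERT cross-encoder, and the commit data is invented.

```python
from rank_bm25 import BM25Okapi

commit_messages = {                      # file -> concatenated commit messages
    "auth.py": "fix login token expiry bug in session refresh",
    "db.py": "add connection pooling and retry logic for postgres",
    "ui.py": "update button styles and dark mode colors",
}

def bm25_candidates(query, k=2):
    """Stage 1: BM25 over commit messages to get candidate files."""
    files = list(commit_messages)
    bm25 = BM25Okapi([commit_messages[f].split() for f in files])
    scores = bm25.get_scores(query.split())
    return [f for f, _ in sorted(zip(files, scores), key=lambda p: -p[1])[:k]]

def rerank(query, files):
    """Stage 2: stand-in for CodeBERT reranking, using token overlap instead."""
    q = set(query.split())
    return sorted(files, key=lambda f: -len(q & set(commit_messages[f].split())))

query = "login token bug"
print(rerank(query, bm25_candidates(query)))   # expected: ['auth.py', ...]
```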
- Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs
We propose a strategy named Hierarchical Context Pruning (HCP) to construct completion prompts containing highly informative code content.
The HCP models the code repository at the function level, maintaining the topological dependencies between code files while removing a large amount of irrelevant code content.
arXiv Detail & Related papers (2024-06-26T12:26:16Z)
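The core idea in the entry above, modeling the repository at the function level and keeping only code topologically related to the completion site, can be sketched as a bounded graph traversal. The call graph and depth cutoff below are hypothetical, not HCP's actual data structures.

```python
from collections import deque

functions = {                       # function -> functions it calls
    "complete_me": ["load_cfg", "run"],
    "load_cfg": ["parse_yaml"],
    "run": [],
    "parse_yaml": [],
    "unrelated_helper": ["run"],
}

def prune(target, depth=2):
    """Keep functions reachable from the completion target within `depth` hops;
    everything else is pruned from the prompt."""
    kept, frontier = {target}, deque([(target, 0)])
    while frontier:                 # BFS over the dependency graph
        fn, d = frontier.popleft()
        if d == depth:
            continue
        for callee in functions.get(fn, []):
            if callee not in kept:
                kept.add(callee)
                frontier.append((callee, d + 1))
    return kept

print(prune("complete_me"))  # {'complete_me', 'load_cfg', 'run', 'parse_yaml'}
```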
- RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion
RepoHyper is a framework designed to address the complex challenges associated with repository-level code completion.
Central to RepoHyper is the Repo-level Semantic Graph (RSG), a novel semantic graph structure that encapsulates the vast context of code repositories.
Our evaluations show that RepoHyper markedly outperforms existing techniques in repository-level code completion.
arXiv Detail & Related papers (2024-03-10T05:10:34Z)
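A toy sketch of a search-expand-refine pass over a repo-level semantic graph, per the entry above. The graph, matching, and reranking heuristics here are hypothetical stand-ins for RepoHyper's RSG and its learned components.

```python
graph = {                                   # symbol -> related symbols (edges)
    "Trainer.fit": ["DataLoader.__iter__", "Optimizer.step"],
    "DataLoader.__iter__": ["collate_batch"],
    "Optimizer.step": [],
    "collate_batch": [],
}

def search(query):
    """Stage 1: seed nodes whose name matches the query."""
    return [n for n in graph if query.lower() in n.lower()]

def expand(nodes):
    """Stage 2: pull in one-hop graph neighbors for broader context."""
    return set(nodes) | {m for n in nodes for m in graph[n]}

def refine(query, nodes, k=3):
    """Stage 3: rerank expanded nodes; a real system would use a model."""
    q = set(query.lower().split())
    return sorted(nodes, key=lambda n: -len(q & set(n.lower().split("."))))[:k]

print(refine("fit", expand(search("fit"))))
```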
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- Retrieval meets Long Context Large Language Models
Extending the context window of large language models (LLMs) has recently become popular.
Retrieval-augmentation versus long context window, which one is better for downstream tasks?
Can both methods be combined to get the best of both worlds?
Our best model, retrieval-augmented Llama2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on nine long context tasks.
arXiv Detail & Related papers (2023-10-04T17:59:41Z)
- CodePlan: Repository-level Coding using LLMs and Planning
We frame repository-level coding as a planning problem and present a task-agnostic framework called CodePlan.
We evaluate the effectiveness of CodePlan on two repository-level tasks: package migration (C#) and temporal code edits (Python).
Our results show that CodePlan matches the ground truth more closely than the baselines.
arXiv Detail & Related papers (2023-09-21T21:45:17Z)
- RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z)
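A toy sketch of the iterative retrieve-then-generate loop in the entry above: each round retrieves snippets similar to the latest draft, so generated code informs the next retrieval. The retriever, "generator", and snippets below are trivial stubs, not RepoCoder's components.

```python
repo_snippets = [
    "def save_checkpoint(model, path): torch.save(model.state_dict(), path)",
    "def load_checkpoint(model, path): model.load_state_dict(torch.load(path))",
    "def plot_loss(history): plt.plot(history)",
]

def retrieve(query, k=1):
    """Stand-in similarity retriever: rank snippets by token overlap."""
    q = set(query.split())
    return sorted(repo_snippets, key=lambda s: -len(q & set(s.split())))[:k]

def generate(prompt, context):
    """Stand-in for a pre-trained code LM conditioned on retrieved context."""
    return prompt + "\n# retrieved context: " + " | ".join(context)

def repocoder(prompt, rounds=2):
    draft = prompt
    for _ in range(rounds):           # retrieval uses the latest draft,
        context = retrieve(draft)     # not just the original prompt
        draft = generate(prompt, context)
    return draft

print(repocoder("def save_checkpoint(model, path):"))
```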
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks
We propose Decomposed Prompting to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks.
This modular structure allows each prompt to be optimized for its specific sub-task.
We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting.
arXiv Detail & Related papers (2022-10-05T17:28:20Z)
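The modular structure in the entry above can be sketched as a decomposer that emits sub-tasks, each routed to its own specialized handler. In the paper both the decomposer and the handlers are LLM prompts; here they are plain functions standing in for those prompts.

```python
def decompose(task):
    """Stand-in for an LLM decomposer prompt emitting (sub_task, arg) pairs."""
    return [("split", task), ("reverse", None), ("join", None)]

HANDLERS = {  # each sub-task would get its own optimized few-shot prompt
    "split": lambda arg, state: arg.split(),
    "reverse": lambda arg, state: list(reversed(state)),
    "join": lambda arg, state: " ".join(state),
}

def solve(task):
    state = None
    for name, arg in decompose(task):       # execute sub-tasks in order,
        state = HANDLERS[name](arg, state)  # threading intermediate state
    return state

print(solve("reverse the words in this sentence"))
# -> 'sentence this in words the reverse'
```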
- Repository-Level Prompt Generation for Large Language Models of Code
We propose a framework that learns to generate example-specific prompts using prompt proposals.
The prompt proposals take context from the entire repository.
We conduct experiments on the task of single-line code-autocompletion using code repositories taken from Google Code archives.
arXiv Detail & Related papers (2022-06-26T10:51:25Z)
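A toy sketch of choosing among repo-derived prompt proposals for one completion hole, per the entry above. The proposal sources and the scorer are hypothetical stand-ins; the paper learns which proposal to pick, whereas this sketch scores against a known target purely for illustration.

```python
def proposals(hole_context, repo):
    """Yield candidate prompts built from different parts of the repository."""
    yield hole_context                                   # default: local context
    for source in ("imports", "sibling_file", "parent_class"):
        extra = repo.get(source)
        if extra:                                        # context from elsewhere
            yield extra + "\n" + hole_context

def score(prompt, reference):
    """Stand-in proposal scorer: token overlap with a reference completion."""
    return len(set(prompt.split()) & set(reference.split()))

repo = {
    "imports": "from utils import tokenize",
    "sibling_file": "def detokenize(tokens): return ' '.join(tokens)",
}
hole = "tokens = "
best = max(proposals(hole, repo), key=lambda p: score(p, "tokens = tokenize"))
print(best)   # the import-augmented proposal wins under this toy scorer
```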