Related papers: RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion

RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion

URL: http://arxiv.org/abs/2403.06095v4
Date: Wed, 14 Aug 2024 16:15:31 GMT
Title: RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion
Authors: Huy N. Phan, Hoang N. Phan, Tien N. Nguyen, Nghi D. Q. Bui,
Abstract summary: tool is a framework designed to address the complex challenges associated with repository-level code completion. Central to tool is the em Repo-level Semantic Graph (RSG), a novel semantic graph structure that encapsulates the vast context of code repositories. Our evaluations show that tool markedly outperforms existing techniques in repository-level code completion.
Score: 12.173834895070827
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Code Large Language Models (CodeLLMs) have demonstrated impressive proficiency in code completion tasks. However, they often fall short of fully understanding the extensive context of a project repository, such as the intricacies of relevant files and class hierarchies, which can result in less precise completions. To overcome these limitations, we present \tool, a multifaceted framework designed to address the complex challenges associated with repository-level code completion. Central to RepoHYPER is the {\em Repo-level Semantic Graph} (RSG), a novel semantic graph structure that encapsulates the vast context of code repositories. Furthermore, RepoHyper leverages Expand and Refine retrieval method, including a graph expansion and a link prediction algorithm applied to the RSG, enabling the effective retrieval and prioritization of relevant code snippets. Our evaluations show that \tool markedly outperforms existing techniques in repository-level code completion, showcasing enhanced accuracy across various datasets when compared to several strong baselines. Our implementation of RepoHYPER can be found at https://github.com/FSoft-AI4Code/RepoHyper.

Related papers

Enhancing Repository-Level Code Generation with Call Chain-Aware Multi-View Context [19.604823896796404]
RepoScope is a call chain-aware multi-view context for repository-level code generation.<n>We present a novel call chain prediction method that utilizes the repository's structural semantics to improve the identification of callees in the target function.<n>RepoScope outperforms state-of-the-art methods, achieving up to a 36.35% relative improvement in pass@1 scores.
arXiv Detail & Related papers (2025-07-20T02:35:36Z)
Neuro-Symbolic Query Compiler [57.78201019000895]
This paper presents QCompiler, a neuro-symbolic framework inspired by linguistic grammar rules and compiler design, to bridge this gap.<n>It theoretically designs a minimal yet sufficient Backus-Naur Form (BNF) grammar $G[q]$ to formalize complex queries.<n>The atomicity of the sub-queries in the leaf ensures more precise document retrieval and response generation, significantly improving the RAG system's ability to address complex queries.
arXiv Detail & Related papers (2025-05-17T09:36:03Z)
EpiCoder: Encompassing Diversity and Complexity in Code Generation [49.170195362149386]
Existing methods for code generation use code snippets as seed data.<n>We introduce a novel feature tree-based synthesis framework, which revolves around hierarchical code features.<n>Our framework provides precise control over the complexity of the generated code, enabling functionalities that range from function-level operations to multi-file scenarios.
arXiv Detail & Related papers (2025-01-08T18:58:15Z)
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions. RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z)
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models [56.691926887209895]
We present textbfname (textbfInformation textbfRetrieval Benchmark), a robust and comprehensive benchmark specifically designed to assess code retrieval capabilities. name comprises textbften meticulously curated code datasets, spanning textbfeight distinctive retrieval tasks across textbfseven diverse domains. We evaluate nine widely used retrieval models using name, uncovering significant difficulties in performing code retrieval tasks even with state-of-the-art systems.
arXiv Detail & Related papers (2024-07-03T07:58:20Z)
Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs [24.00351065427465]
We propose a strategy named Hierarchical Context Pruning (HCP) to construct completion prompts with high informational code content. The HCP models the code repository at the function level, maintaining the topological dependencies between code files while removing a large amount of irrelevant code content.
arXiv Detail & Related papers (2024-06-26T12:26:16Z)
CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation. We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks. We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present textbfmethodnamews, a novel benchmark designed to evaluate repository-level code generation. We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
arXiv Detail & Related papers (2024-06-17T10:45:22Z)
GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model [30.625128161499195]
GraphCoder is a retrieval-augmented code completion framework. It uses general code knowledge and the repository-specific knowledge via a graph-based retrieval-generation process. It achieves higher exact match (EM) on average, with increases of +6.06 in code match and +6.23 in identifier match, while using less time and space.
arXiv Detail & Related papers (2024-06-11T06:55:32Z)
On The Importance of Reasoning for Context Retrieval in Repository-Level Code Editing [82.96523584351314]
We decouple the task of context retrieval from the other components of the repository-level code editing pipelines. We conclude that while the reasoning helps to improve the precision of the gathered context, it still lacks the ability to identify its sufficiency.
arXiv Detail & Related papers (2024-06-06T19:44:17Z)
How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE) We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories. To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion [17.4397495929138]
We propose a dataflow-guided retrieval augmentation approach, called DraCo, for repository-level code completion. Our experiments demonstrate the superior accuracy and applicable efficiency of DraCo, improving code exact match by 3.43% and identifier F1-score by 3.27% on average.
arXiv Detail & Related papers (2024-05-30T07:48:00Z)
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process. It incorporates a similarity-based retriever and a pre-trained code language model. It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z)
ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval. We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.