RepoScope: Leveraging Call Chain-Aware Multi-View Context for Repository-Level Code Generation
- URL: http://arxiv.org/abs/2507.14791v2
- Date: Sun, 02 Nov 2025 02:45:27 GMT
- Title: RepoScope: Leveraging Call Chain-Aware Multi-View Context for Repository-Level Code Generation
- Authors: Yang Liu, Li Zhang, Fang Liu, Zhuohang Wang, Donglin Wei, Zhishuo Yang, Kechi Zhang, Jia Li, Lin Shi,
- Abstract summary: RepoScope is a call chain-aware multi-view context for repository-level code generation.<n>We present a novel call chain prediction method that utilizes the repository's structural semantics to improve the identification of callees in the target function.<n>RepoScope outperforms state-of-the-art methods, achieving up to a 36.35% relative improvement in pass@1 scores.
- Score: 16.544483144957407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Repository-level code generation aims to generate code within the context of a specified repository. Existing approaches typically employ retrieval-augmented generation (RAG) techniques to provide LLMs with relevant contextual information extracted from the repository. However, these approaches often struggle with effectively identifying truly relevant contexts that capture the rich semantics of the repository, and their contextual perspectives remains narrow. Moreover, most approaches fail to account for the structural relationships in the retrieved code during prompt construction, hindering the LLM's ability to accurately interpret the context. To address these issues, we propose RepoScope, which leverages call chain-aware multi-view context for repository-level code generation. RepoScope constructs a Repository Structural Semantic Graph (RSSG) and retrieves a comprehensive four-view context, integrating both structural and similarity-based contexts. We propose a novel call chain prediction method that utilizes the repository's structural semantics to improve the identification of callees in the target function. Additionally, we present a structure-preserving serialization algorithm for prompt construction, ensuring the coherence of the context for the LLM. Notably, RepoScope relies solely on static analysis, eliminating the need for additional training or multiple LLM queries, thus ensuring both efficiency and generalizability. Evaluation on widely-used repository-level code generation benchmarks (CoderEval and DevEval) demonstrates that RepoScope outperforms state-of-the-art methods, achieving up to a 36.35% relative improvement in pass@1 scores. Further experiments emphasize RepoScope's potential to improve code generation across different tasks and its ability to integrate effectively with existing approaches.
Related papers
- Do Not Treat Code as Natural Language: Implications for Repository-Level Code Generation and Beyond [13.550121154853715]
We present Hydra, a repository-level code generation framework that treats code as structured code rather than natural language.<n>We show that Hydra achieves state-of-the-art performance across open- and closed-source CodeLLMs.
arXiv Detail & Related papers (2026-02-12T07:44:00Z) - AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion [55.21541958868449]
We propose AlignCoder, a repository-level code completion framework.<n>Our framework generates an enhanced query that bridges the semantic gap between the initial query and the target code.<n>We employ reinforcement learning to train an AlignRetriever that learns to leverage inference information in the enhanced query for more accurate retrieval.
arXiv Detail & Related papers (2026-01-27T15:23:14Z) - In Line with Context: Repository-Level Code Generation via Context Inlining [11.065371614078723]
In this paper, we introduce InlineCoder, a novel framework for repository-level code generation.<n>InlineCoder enhances the understanding of repository context by inlining the unfinished function into its call graph.
arXiv Detail & Related papers (2026-01-01T15:56:24Z) - Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding [37.78627994991325]
CoCo is a novel framework that enables code Completion by of multi-granularity context from large-scale code repositories.<n>Experiments on CrossCodeEval and RepoEval benchmarks demonstrate that CoCo consistently surpasses state-of-the-art baselines.
arXiv Detail & Related papers (2025-12-04T07:37:59Z) - LLM-guided Hierarchical Retrieval [54.73080745446999]
LATTICE is a hierarchical retrieval framework that enables an LLM to reason over and navigate large corpora with logarithmic search complexity.<n>A central challenge in such LLM-guided search is that the model's relevance judgments are noisy, context-dependent, and unaware of the hierarchy.<n>Our framework achieves state-of-the-art zero-shot performance on the reasoning-intensive BRIGHT benchmark.
arXiv Detail & Related papers (2025-10-15T07:05:17Z) - CoT Referring: Improving Referring Expression Tasks with Grounded Reasoning [67.18702329644526]
CoT Referring enhances model reasoning across modalities through a structured, chain-of-thought training data structure.<n>We restructure the training data to enforce a new output form, providing new annotations for existing datasets.<n>We also integrate detection and segmentation capabilities into a unified MLLM framework, training it with a novel adaptive weighted loss to optimize performance.
arXiv Detail & Related papers (2025-10-03T08:50:21Z) - RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation [78.01030342481246]
RecBase is a domain-agnostic foundational model pretrained with a recommendation-oriented objective.<n>We introduce a unified item tokenizer that encodes items into hierarchical concept identifiers.<n>Our model matches or surpasses the performance of LLM baselines up to 7B parameters in zero-shot and cross-domain recommendation tasks.
arXiv Detail & Related papers (2025-09-03T08:33:43Z) - Large Language Models are Good Relational Learners [55.40941576497973]
We introduce Rel-LLM, a novel architecture that utilizes a graph neural network (GNN)- based encoder to generate structured relational prompts for large language models (LLMs)<n>Unlike traditional text-based serialization approaches, our method preserves the inherent relational structure of databases while enabling LLMs to process and reason over complex entity relationships.
arXiv Detail & Related papers (2025-06-06T04:07:55Z) - SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs [70.79124435220695]
We propose a novel unified Semantic-enhanced generative Cross-mOdal REtrieval framework (SemCORE)<n>We first construct a Structured natural language IDentifier (SID) that effectively aligns target identifiers with generative models optimized for natural language comprehension and generation.<n>We then introduce a Generative Semantic Verification (GSV) strategy enabling fine-grained target discrimination.
arXiv Detail & Related papers (2025-04-17T17:59:27Z) - Hierarchical Repository-Level Code Summarization for Business Applications Using Local LLMs [1.4932549821542682]
Existing methods primarily focus on smaller code units, such as functions, and struggle with larger code artifacts like files and packages.<n>This paper proposes a two-step hierarchical approach for repository-level code summarization, tailored to business applications.
arXiv Detail & Related papers (2025-01-14T05:48:27Z) - Context-DPO: Aligning Language Models for Context-Faithfulness [80.62221491884353]
We propose the first alignment method specifically designed to enhance large language models' context-faithfulness.<n>By leveraging faithful and stubborn responses to questions with provided context from ConFiQA, our Context-DPO aligns LLMs through direct preference optimization.<n>Extensive experiments demonstrate that our Context-DPO significantly improves context-faithfulness, achieving 35% to 280% improvements on popular open-source models.
arXiv Detail & Related papers (2024-12-18T04:08:18Z) - CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.<n>We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.<n>We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z) - On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present RepoExec, a novel benchmark designed to evaluate repository-level code generation.<n>We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
arXiv Detail & Related papers (2024-06-17T10:45:22Z) - On The Importance of Reasoning for Context Retrieval in Repository-Level Code Editing [82.96523584351314]
We decouple the task of context retrieval from the other components of the repository-level code editing pipelines.
We conclude that while the reasoning helps to improve the precision of the gathered context, it still lacks the ability to identify its sufficiency.
arXiv Detail & Related papers (2024-06-06T19:44:17Z) - Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository [4.767858874370881]
We introduce RepoClassBench, a benchmark designed to rigorously evaluate LLMs in generating class-level code within real-world repositories.
RepoClassBench includes "Natural Language to Class generation" tasks across Java, Python & C# from a selection of repositories.
We introduce Retrieve-Repotools-Reflect (RRR), a novel approach that equips LLMs with static analysis tools to iteratively navigate & reason about repository-level context.
arXiv Detail & Related papers (2024-04-22T03:52:54Z) - Repoformer: Selective Retrieval for Repository-Level Code Completion [30.706277772743615]
Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion.
In this paper, we propose a selective RAG framework to avoid retrieval when unnecessary.
We show that our framework is able to accommodate different generation models, retrievers, and programming languages.
arXiv Detail & Related papers (2024-03-15T06:59:43Z) - RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion [12.173834895070827]
tool is a framework designed to address the complex challenges associated with repository-level code completion.
Central to tool is the em Repo-level Semantic Graph (RSG), a novel semantic graph structure that encapsulates the vast context of code repositories.
Our evaluations show that tool markedly outperforms existing techniques in repository-level code completion.
arXiv Detail & Related papers (2024-03-10T05:10:34Z) - RepoCoder: Repository-Level Code Completion Through Iterative Retrieval
and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.