RepoSummary: Feature-Oriented Summarization and Documentation Generation for Code Repositories
- URL: http://arxiv.org/abs/2510.11039v1
- Date: Mon, 13 Oct 2025 06:16:44 GMT
- Title: RepoSummary: Feature-Oriented Summarization and Documentation Generation for Code Repositories
- Authors: Yifeng Zhu, Xianlin Zhao, Xutian Li, Yanzhen Zou, Haizhuo Yuan, Yue Wang, Bing Xie
- Abstract summary: RepoSummary is a feature-oriented code repository summarization approach. It simultaneously generates repository documentation automatically. It establishes more accurate traceability links from functional features to the corresponding code elements.
- Score: 7.744086870383438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Repository summarization is a crucial research question in development and maintenance for software engineering. Existing repository summarization techniques primarily focus on summarizing code according to the directory tree, which is insufficient for tracing high-level features to the methods that collaboratively implement them. To address these limitations, we propose RepoSummary, a feature-oriented code repository summarization approach that simultaneously generates repository documentation automatically. Furthermore, it establishes more accurate traceability links from functional features to the corresponding code elements, enabling developers to rapidly locate relevant methods and files during code comprehension and maintenance. Comprehensive experiments against the state-of-the-art baseline (HGEN) demonstrate that RepoSummary achieves higher feature coverage and more accurate traceability. On average, it increases the rate of completely covered features in manual documentation from 61.2% to 71.1%, improves file-level traceability recall from 29.9% to 53.0%, and generates documentation that is more conceptually consistent, easier to understand, and better formatted than that produced by existing approaches.
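The abstract reports file-level traceability recall but does not say how it is computed. The following is a minimal sketch of one plausible definition, assuming that traceability links are stored as a mapping from each feature to a set of file paths and that recall is micro-averaged over all ground-truth links. The function name `file_level_recall`, the feature names, and the file paths are hypothetical illustrations, not taken from the paper or its evaluation scripts.

```python
from typing import Dict, Set


def file_level_recall(
    predicted: Dict[str, Set[str]],
    gold: Dict[str, Set[str]],
) -> float:
    """Micro-averaged recall of feature -> file links (assumed definition).

    Counts how many ground-truth (feature, file) links also appear in the
    predicted links, divided by the total number of ground-truth links.
    """
    total = sum(len(files) for files in gold.values())
    if total == 0:
        return 0.0
    hits = sum(
        len(files & predicted.get(feature, set()))
        for feature, files in gold.items()
    )
    return hits / total


# Hypothetical ground truth: files that implement each documented feature.
gold = {
    "user login": {"auth/login.py", "auth/session.py"},
    "export report": {"report/export.py", "report/pdf.py"},
}

# Hypothetical links produced by a summarization tool.
predicted = {
    "user login": {"auth/login.py", "auth/session.py", "util/log.py"},
    "export report": {"report/export.py"},
}

recall = file_level_recall(predicted, gold)
print(f"file-level recall: {recall:.2f}")
```

In this toy example three of the four ground-truth links are recovered, so the recall is 0.75. Extra predicted files such as `util/log.py` do not affect recall; they would only lower precision.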
Related papers
- What Papers Don't Tell You: Recovering Tacit Knowledge for Automated Paper Reproduction [57.86097956633207]
The proposed method is a graph-based agent framework for generating executable code from academic papers. On an extended ReproduceBench spanning 3 domains, 10 tasks, and 40 recent papers, it achieves an average performance gap of 10.04% against official implementations.
arXiv Detail & Related papers (2026-03-02T12:33:31Z)
- In Line with Context: Repository-Level Code Generation via Context Inlining [11.065371614078723]
In this paper, we introduce InlineCoder, a novel framework for repository-level code generation. InlineCoder enhances the understanding of repository context by inlining the unfinished function into its call graph.
arXiv Detail & Related papers (2026-01-01T15:56:24Z)
- CodeWiki: Evaluating AI's Ability to Generate Holistic Documentation for Large-Scale Codebases [7.75137961900221]
We present CodeWiki, a unified framework for automated repository-level documentation across seven programming languages. CodeWiki introduces three key innovations: (i) hierarchical decomposition that preserves architectural context across multiple levels of granularity, (ii) recursive multi-agent processing with dynamic task delegation for scalable generation, and (iii) multi-modal synthesis that integrates textual descriptions with visual artifacts such as architecture diagrams and data-flow representations. CodeWiki achieves a 68.79% quality score with proprietary models, outperforming the closed-source DeepWiki baseline (64.06%) by 4.73 percentage points.
arXiv Detail & Related papers (2025-10-28T13:52:46Z)
- Learning Refined Document Representations for Dense Retrieval via Deliberate Thinking [58.69615583599489]
Deliberate Thinking based Retriever (Debater) is a novel approach that enhances document representations by incorporating a step-by-step thinking process. Debater significantly outperforms existing methods across several retrieval benchmarks.
arXiv Detail & Related papers (2025-02-18T15:56:34Z)
- ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval [64.44265315244579]
We propose a tree-based method for organizing and representing reference documents at various granular levels. Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches. Our evaluations show that ReTreever generally preserves full representation accuracy.
arXiv Detail & Related papers (2025-02-11T21:35:13Z)
- EpiCoder: Encompassing Diversity and Complexity in Code Generation [66.43738008739555]
Existing methods for code generation use code snippets as seed data. We introduce a novel feature tree-based synthesis framework, which revolves around hierarchical code features. Our framework provides precise control over the complexity of the generated code, enabling functionalities that range from function-level operations to multi-file scenarios.
arXiv Detail & Related papers (2025-01-08T18:58:15Z)
- On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present RepoExec, a novel benchmark designed to evaluate repository-level code generation. We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
arXiv Detail & Related papers (2024-06-17T10:45:22Z)
- Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration [64.19431011897515]
This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution. Our approach introduces a top-down method to condense critical repository information into a knowledge graph, reducing complexity, and employs a Monte Carlo tree search based strategy. In production deployment and evaluation at Alibaba Cloud, LingmaAgent automatically resolved 16.9% of in-house issues faced by development engineers, and solved 43.3% of problems after manual intervention.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
- RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion [12.173834895070827]
RepoHyper is a framework designed to address the complex challenges associated with repository-level code completion.
Central to RepoHyper is the Repo-level Semantic Graph (RSG), a novel semantic graph structure that encapsulates the vast context of code repositories.
Our evaluations show that RepoHyper markedly outperforms existing techniques in repository-level code completion.
arXiv Detail & Related papers (2024-03-10T05:10:34Z)
- RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.