Linking Code and Documentation Churn: Preliminary Analysis
- URL: http://arxiv.org/abs/2410.05992v1
- Date: Tue, 8 Oct 2024 12:41:58 GMT
- Title: Linking Code and Documentation Churn: Preliminary Analysis
- Authors: Ani Hovhannisyan, Youmei Fan, Gema Rodriguez-Perez, Raula Gaikovina Kula
- Abstract summary: This study investigates the synchrony between code churn and documentation updates in three GitHub open-source projects.
Preliminary results indicate varying degrees of synchrony across projects, highlighting the importance of integrated concurrent documentation practices.
The novelty of this study lies in demonstrating how synchronizing code changes with documentation updates can improve the development lifecycle by enhancing diversity and efficiency.
- Score: 2.033674689332928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code churn measures the amount of code added, modified, or deleted in a project and is often used to assess codebase stability and maintainability. Program comprehension, or how understandable the changes are, is equally important for maintainability. Documentation is crucial for knowledge transfer, especially when new maintainers take over abandoned code. We emphasize the need for corresponding documentation updates, as this reflects project health and trustworthiness as a third-party library. Therefore, we argue that every code change should prompt a documentation update (defined as documentation churn). Linking code churn with documentation updates is important for project sustainability, as it facilitates knowledge transfer and reduces the effort required for program comprehension. This study investigates the synchrony between code churn and documentation updates in three GitHub open-source projects. We use qualitative analysis and repository mining to examine the alignment and correlation of code churn and documentation updates over time. We want to identify which code changes are likely to be synchronized with documentation and to what extent documentation can be auto-generated. Preliminary results indicate varying degrees of synchrony across projects, highlighting the importance of integrated, concurrent documentation practices and providing insights into how recent AI technologies, such as Large Language Models (LLMs), could be leveraged to keep code and documentation churn in sync. The novelty of this study lies in demonstrating how synchronizing code changes with documentation updates can improve the development lifecycle by enhancing diversity and efficiency.
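The abstract describes repository mining to contrast code churn with documentation churn but does not publish the mining pipeline. The following Python sketch shows one plausible way to tally both per commit from `git log --numstat`, under the assumption (mine, not the authors') that documentation files are identified by path and extension; the extension list, directory names, and "synchronized commit" rule are illustrative choices only.

```python
# Hedged sketch only: the paper does not publish its mining pipeline, so this
# illustrates one plausible way to contrast code churn with documentation churn.
# Assumptions (not from the paper): docs are identified by extension/path, and
# churn is the sum of added + deleted lines reported by `git log --numstat`.
import subprocess
from collections import defaultdict

DOC_EXTENSIONS = {".md", ".rst", ".adoc", ".txt"}   # assumed doc file types
DOC_DIRS = ("docs/", "doc/")                        # assumed doc directories

def is_doc_file(path: str) -> bool:
    return path.startswith(DOC_DIRS) or any(path.endswith(ext) for ext in DOC_EXTENSIONS)

def churn_per_commit(repo_path: str) -> dict:
    """Return {commit_sha: {"code": lines_changed, "docs": lines_changed}}."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--format=@%H"],
        capture_output=True, text=True, check=True,
    ).stdout
    churn = defaultdict(lambda: {"code": 0, "docs": 0})
    sha = None
    for line in log.splitlines():
        if line.startswith("@"):
            sha = line[1:]                # commit header emitted by --format=@%H
        elif line.strip():
            added, deleted, path = line.split("\t", 2)
            if added == "-":              # binary files report "-"; skip them
                continue
            kind = "docs" if is_doc_file(path) else "code"
            churn[sha][kind] += int(added) + int(deleted)
    return churn

if __name__ == "__main__":
    stats = churn_per_commit(".")
    synced = sum(1 for c in stats.values() if c["code"] > 0 and c["docs"] > 0)
    code_only = sum(1 for c in stats.values() if c["code"] > 0 and c["docs"] == 0)
    print(f"commits touching code and docs together: {synced}")
    print(f"commits touching code but not docs:      {code_only}")
```

Counting a commit as "synchronized" whenever it touches both kinds of files is a deliberately coarse proxy; a qualitative analysis like the one described in the abstract would likely apply finer-grained criteria.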
Related papers
- Supporting Software Maintenance with Dynamically Generated Document Hierarchies [41.407915858583344]
We present HGEN, a fully automated pipeline that transforms source code through a series of six stages into a well-organized hierarchy of formatted documents.
We evaluate HGEN both quantitatively and qualitatively.
Results show that HGEN produces artifact hierarchies similar in quality to manually constructed documentation, with much higher coverage of the core concepts than the baseline approach.
arXiv Detail & Related papers (2024-08-11T17:11:14Z) - Code Documentation and Analysis to Secure Software Development [0.0]
CoDAT is a tool designed to maintain consistency between the various levels of code documentation.
It is implemented in IntelliJ IDEA.
We use a large language model to check the semantic consistency between a fragment of code and the comments that describe it (an illustrative sketch of such a check appears after this list).
arXiv Detail & Related papers (2024-07-16T17:25:44Z) - CodeUpdateArena: Benchmarking Knowledge Editing on API Updates [77.81663273436375]
We present CodeUpdateArena, a benchmark for knowledge editing in the code domain.
An instance in our benchmark consists of a synthetic API function update paired with a program synthesis example.
Our benchmark covers updates of various types to 54 functions from seven diverse Python packages.
arXiv Detail & Related papers (2024-07-08T17:55:04Z) - CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z) - VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development.
We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM).
We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z) - A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware [13.27883339389175]
We propose a novel code generation framework, dubbed A3-CodGen, to harness information within the code repository to generate code with fewer potential logical errors.
Results demonstrate that by adopting the A3-CodGen framework, we successfully extract, fuse, and feed code repository information into the LLM, generating more accurate, efficient, and highly reusable code.
arXiv Detail & Related papers (2023-12-10T05:36:06Z) - CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion [86.01508183157613]
CrossCodeEval is built on a diverse set of real-world, open-sourced, permissively-licensed repositories in four popular programming languages.
We show that CrossCodeEval is extremely challenging when the relevant cross-file context is absent.
We also show that CrossCodeEval can be used to measure the capability of code retrievers.
arXiv Detail & Related papers (2023-10-17T13:18:01Z) - Wait, wasn't that code here before? Detecting Outdated Software Documentation [9.45052138795667]
We present a GitHub Actions tool that automatically scans for outdated code element references.
More than a quarter of the 1000 most popular projects on GitHub contained at least one outdated reference.
arXiv Detail & Related papers (2023-07-10T00:52:29Z) - RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and retrieval of code with similar semantics.
We evaluate our approach on the code completion task in Python and Java, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
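As noted in the CoDAT entry above, and echoing the main abstract's suggestion that LLMs could help keep code and documentation churn in sync, a semantic consistency check between code and its comments can be phrased as a single prompt. The sketch below is illustrative only: `ask_llm`, the prompt wording, and the one-word verdict format are assumptions standing in for whatever model and client a project actually uses, not an API from either paper.

```python
# Hedged sketch of an LLM-based code/comment consistency check in the spirit of
# the CoDAT summary above; `ask_llm` is a placeholder for any chat-style LLM
# client (it is NOT an API from the paper), and the prompt/verdict format is an
# illustrative assumption rather than the authors' design.
from typing import Callable

PROMPT_TEMPLATE = """You are reviewing documentation consistency.
Code:
{code}

Comment/doc describing it:
{doc}

Answer with exactly one word, CONSISTENT or INCONSISTENT, then one sentence of
justification."""

def check_consistency(code: str, doc: str, ask_llm: Callable[[str], str]) -> bool:
    """Return True if the model judges the comment to still match the code."""
    reply = ask_llm(PROMPT_TEMPLATE.format(code=code, doc=doc))
    return reply.strip().upper().startswith("CONSISTENT")

if __name__ == "__main__":
    # Stubbed model so the sketch runs without network access.
    fake_llm = lambda prompt: "INCONSISTENT - the comment mentions a removed flag."
    code = "def save(path):\n    return write_atomic(path)"
    doc = "Saves to `path`; pass overwrite=True to replace existing files."
    print(check_consistency(code, doc, fake_llm))   # -> False
```

Wired into CI, a check like this could flag commits whose code churn is not matched by a documentation update, which is close to the integrated, concurrent documentation practice the main abstract argues for.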
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.