Establishing Traceability Links between Release Notes & Software Artifacts: Practitioners' Perspectives
- URL: http://arxiv.org/abs/2511.18187v1
- Date: Sat, 22 Nov 2025 20:45:24 GMT
- Title: Establishing Traceability Links between Release Notes & Software Artifacts: Practitioners' Perspectives
- Authors: Sristy Sumana Nath, Banani Roy, Munima Jahan
- Abstract summary: In open-source environments where contributors work remotely and asynchronously, establishing and maintaining traceability links is often error-prone. Our empirical study of GitHub repositories revealed that 47% of release artifacts lacked traceability links, and 12% contained broken links. We implemented LLM-based approaches to automatically establish traceability links for three pairs: release note contents & PRs, release note contents & commits, and release note contents & issues.
- Score: 5.70062525101025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Maintaining traceability links between software release notes and corresponding development artifacts, e.g., pull requests (PRs), commits, and issues, is essential for managing technical debt and ensuring maintainability. However, in open-source environments where contributors work remotely and asynchronously, establishing and maintaining these links is often error-prone, time-consuming, and frequently overlooked. Our empirical study of GitHub repositories revealed that 47% of release artifacts lacked traceability links, and 12% contained broken links. To address this gap, we first analyzed release notes to identify their What, Why, and How information and assessed how these align with PRs, commits, and issues. We curated a benchmark dataset consisting of 3,500 filtered and validated traceability link instances. Then, we implemented LLM-based approaches to automatically establish traceability links for three pairs: release note contents & PRs, release note contents & commits, and release note contents & issues. When combined with a time-proximity feature, the LLM-based approach (e.g., Gemini 1.5 Pro) achieved a high Precision@1 of 0.73 for PR traceability recovery. To evaluate the usability and adoption potential of this approach, we conducted an online survey involving 33 open-source practitioners. 16% of respondents rated the approach as very important, and 68% as somewhat important, for traceability maintenance.
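The abstract reports Precision@1 for link recovery. As a minimal sketch of how that metric is commonly computed (not the authors' evaluation code), the function below counts a release-note entry as a hit when its top-ranked candidate artifact is a true link. The names `precision_at_1`, `ranked`, and `gold`, and the identifiers such as `note-1` and `PR#12`, are hypothetical and chosen only for illustration; how the paper combines the time-proximity feature with the LLM ranking is not shown here.

```python
def precision_at_1(ranked_candidates, gold_links):
    """Fraction of release-note entries whose top-ranked candidate is a true link.

    ranked_candidates: dict mapping a release-note entry ID to a list of
        candidate artifact IDs, best candidate first (hypothetical format).
    gold_links: dict mapping a release-note entry ID to the set of artifact
        IDs that are correct links for that entry.
    """
    if not ranked_candidates:
        return 0.0
    hits = 0
    for note_id, candidates in ranked_candidates.items():
        # Only the first (top-ranked) candidate matters for Precision@1.
        if candidates and candidates[0] in gold_links.get(note_id, set()):
            hits += 1
    return hits / len(ranked_candidates)


# Illustrative data only; these are not values from the paper's benchmark.
ranked = {
    "note-1": ["PR#12", "PR#7"],
    "note-2": ["PR#3"],
    "note-3": ["PR#9", "PR#4"],  # correct PR is ranked second, so this is a miss
    "note-4": ["PR#5"],
}
gold = {
    "note-1": {"PR#12"},
    "note-2": {"PR#3"},
    "note-3": {"PR#4"},
    "note-4": {"PR#5"},
}

score = precision_at_1(ranked, gold)
print(score)
```

In this toy example three of the four entries have a correct top-ranked candidate, so the score is 3/4.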
Related papers
- What Papers Don't Tell You: Recovering Tacit Knowledge for Automated Paper Reproduction [57.86097956633207]
method is a graph-based agent framework for generating executable code from academic papers. On an extended ReproduceBench spanning 3 domains, 10 tasks, and 40 recent papers, method achieves an average performance gap of 10.04% against official implementations.
arXiv Detail & Related papers (2026-03-02T12:33:31Z)
- Forecasting the Maintained Score from the OpenSSF Scorecard for GitHub Repositories linked to PyPI libraries [78.48200143057376]
We study to what extent future maintenance activity, as captured by the OpenSSF maintained score, can be forecasted. We analyze 3,220 GitHub repositories associated with the top 1% most central PyPI libraries by PageRank. Our results show that future maintenance activity can be predicted with meaningful accuracy.
arXiv Detail & Related papers (2026-01-26T10:32:54Z)
- Why Authors and Maintainers Link (or Don't Link) Their PyPI Libraries to Code Repositories and Donation Platforms [83.16077040470975]
Metadata of libraries on the Python Package Index (PyPI) plays a critical role in supporting the transparency, trust, and sustainability of open-source libraries. This paper presents a large-scale empirical study combining two targeted surveys sent to 50,000 PyPI authors and maintainers. We analyze more than 1,400 responses using large language model (LLM)-based topic modeling to uncover key motivations and barriers related to linking repositories and donation platforms.
arXiv Detail & Related papers (2026-01-21T16:13:57Z)
- TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework [62.66056331998838]
TeaRAG is a token-efficient agentic RAG framework capable of compressing both retrieval content and reasoning steps. Our reward function evaluates the knowledge sufficiency by a knowledge matching mechanism, while penalizing excessive reasoning steps.
arXiv Detail & Related papers (2025-11-07T16:08:34Z)
- TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them [58.04324690859212]
Using Large Language Models (LLMs) as automated evaluators (LLM-as-a-judge) has revealed critical inconsistencies in current evaluation frameworks. We identify two fundamental types of inconsistencies: Score-Comparison Inconsistency and Pairwise Transitivity Inconsistency. We propose TrustJudge, a probabilistic framework that addresses these limitations through two key innovations.
arXiv Detail & Related papers (2025-09-25T13:04:29Z)
- Extracting Conceptual Knowledge to Locate Software Issues [12.746044344302623]
RepoLens is a novel approach that abstracts and leverages conceptual knowledge from code repositories. It operates in two stages: an offline stage that extracts conceptual knowledge into a repository-wide knowledge base, and an online stage that retrieves issue-specific terms. RepoLens consistently improves three state-of-the-art tools, achieving average gains of over 22% in Hit@k and 46% in Recall@k for file- and function-level localization.
arXiv Detail & Related papers (2025-09-25T11:53:06Z)
- Back to the Basics: Rethinking Issue-Commit Linking with LLM-Assisted Retrieval [11.976686179876777]
Issue-commit linking, which connects issues with commits that fix them, is crucial for software maintenance. We propose EasyLink, which utilizes a vector database as a modern Information Retrieval technique. Under our evaluation, EasyLink achieves an average Precision@1 of 75.03%, improving over the state-of-the-art by over four times.
arXiv Detail & Related papers (2025-07-12T08:42:10Z)
- Evaluating the Use of LLMs for Documentation to Code Traceability [3.076436880934678]
Large Language Models can establish trace links between various software documentation and source code. We create two novel datasets from two open-source projects (Unity Catalog and Crawl4AI). Results show that the best-performing LLM achieves F1-scores of 79.4% and 80.4% across the two datasets.
arXiv Detail & Related papers (2025-06-19T16:18:53Z)
- Document Attribution: Examining Citation Relationships using Large Language Models [62.46146670035751]
We propose a zero-shot approach that frames attribution as a straightforward textual entailment task. We also explore the role of the attention mechanism in enhancing the attribution process.
arXiv Detail & Related papers (2025-05-09T04:40:11Z)
- TRIAD: Automated Traceability Recovery based on Biterm-enhanced Deduction of Transitive Links among Artifacts [53.92293118080274]
Traceability allows stakeholders to extract and comprehend the trace links among software artifacts introduced across the software life cycle.
Most traceability recovery approaches rely on textual similarities among software artifacts, such as those based on Information Retrieval (IR).
arXiv Detail & Related papers (2023-12-28T06:44:24Z)
- Improving Trace Link Recommendation by Using Non-Isotropic Distances and Combinations [0.799536002595393]
We study non-linear similarity measures for computing trace links.
We evaluated our observations on a dataset of four open source projects and two industrial projects.
arXiv Detail & Related papers (2023-07-15T11:35:02Z)
- Automated Recovery of Issue-Commit Links Leveraging Both Textual and Non-textual Data [2.578242050187029]
Current state-of-the-art approaches for automated commit-issue linking suffer from low precision, leading to unreliable results.
We propose Hybrid-Linker to overcome such limitations by exploiting two information channels.
We evaluate Hybrid-Linker against competing approaches, namely FRLink and DeepLink on a dataset of 12 projects.
arXiv Detail & Related papers (2021-07-05T09:38:44Z)