From Issues to Insights: RAG-based Explanation Generation from Software Engineering Artifacts
- URL: http://arxiv.org/abs/2601.05721v1
- Date: Fri, 09 Jan 2026 11:05:50 GMT
- Title: From Issues to Insights: RAG-based Explanation Generation from Software Engineering Artifacts
- Authors: Daniel Pöttgen, Mersedeh Sadeghi, Max Unterbusch, Andreas Vogelsang
- Abstract summary: We are the first to apply a Retrieval-Augmented Generation (RAG) approach for generating explanations from issue-tracking data. Our proof-of-concept system is implemented using open-source tools and language models, demonstrating the feasibility of leveraging structured issue data for explanation generation.
- Score: 1.18094111609063
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing complexity of modern software systems has made understanding their behavior increasingly challenging, driving the need for explainability to improve transparency and user trust. Traditional documentation is often outdated or incomplete, making it difficult to derive accurate, context-specific explanations. Meanwhile, issue-tracking systems capture rich and continuously updated development knowledge, but their potential for explainability remains untapped. With this work, we are the first to apply a Retrieval-Augmented Generation (RAG) approach for generating explanations from issue-tracking data. Our proof-of-concept system is implemented using open-source tools and language models, demonstrating the feasibility of leveraging structured issue data for explanation generation. Evaluating our approach on an exemplary project's set of GitHub issues, we achieve 90% alignment with human-written explanations. Additionally, our system exhibits strong faithfulness and instruction adherence, ensuring reliable and grounded explanations. These findings suggest that RAG-based methods can extend explainability beyond black-box ML models to a broader range of software systems, provided that issue-tracking data is available - making system behavior more accessible and interpretable.
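The retrieve-then-generate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the issue records, the field names, the bag-of-words cosine similarity, and the prompt wording are all assumptions made for illustration, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter

# Hypothetical issue records; the fields are illustrative, not the paper's schema.
ISSUES = [
    {"id": 1, "title": "Login fails after password reset",
     "body": "Session tokens are invalidated on password reset, so users must log in again."},
    {"id": 2, "title": "Export to CSV is slow",
     "body": "Large exports are streamed in chunks to avoid memory spikes."},
    {"id": 3, "title": "Dark mode ignores system setting",
     "body": "The theme is read from the user profile before the system preference."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, issues, k=2):
    """Return up to k issues with non-zero similarity to the question, best first."""
    query = Counter(tokenize(question))
    scored = [
        (cosine(query, Counter(tokenize(issue["title"] + " " + issue["body"]))), issue)
        for issue in issues
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [issue for score, issue in scored[:k] if score > 0]


def build_prompt(question, retrieved):
    """Assemble a grounded prompt; a language model would complete it."""
    context = "\n".join(
        f"[issue #{issue['id']}] {issue['title']}: {issue['body']}" for issue in retrieved
    )
    return (
        "Explain the system behavior using only the issues below.\n\n"
        f"{context}\n\nQuestion: {question}\nExplanation:"
    )


question = "Why do I have to log in again after a password reset?"
prompt = build_prompt(question, retrieve(question, ISSUES))
print(prompt)
```

A real system would likely replace the bag-of-words scoring with embedding-based retrieval and pass the prompt to a language model; the sketch only shows how retrieved issue text can ground the generated explanation.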
Related papers
- ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering [54.72902502486611]
ReAG is a Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages. ReAG significantly outperforms prior methods, improving answer accuracy and providing interpretable reasoning grounded in retrieved evidence.
arXiv Detail & Related papers (2025-11-27T19:01:02Z)
- Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward [33.56471468540189]
We introduce UniSandbox, a decoupled evaluation framework paired with controlled, synthetic datasets to avoid data leakage. Our findings reveal a significant understanding-generation gap, which is mainly reflected in two key dimensions: reasoning generation and knowledge transfer. UniSandbox provides preliminary insights for designing future unified architectures and training strategies that truly bridge the gap between understanding and generation.
arXiv Detail & Related papers (2025-11-25T17:58:48Z)
- Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking [54.43083499412643]
Test-time algorithms that combine the generative power of language models with process verifiers offer a promising lever for eliciting new reasoning capabilities. We introduce a new process-guided test-time sampling algorithm, VGB, which uses theoretically grounded backtracking to achieve provably better robustness to verifier errors.
arXiv Detail & Related papers (2025-10-03T16:21:14Z)
- DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router [57.28685457991806]
DeepSieve is an agentic RAG framework that incorporates information sieving via LLM-as-a-knowledge-router. Our design emphasizes modularity, transparency, and adaptability, leveraging recent advances in agentic system design.
arXiv Detail & Related papers (2025-07-29T17:55:23Z)
- PropMEND: Hypernetworks for Knowledge Propagation in LLMs [82.99849359892112]
We present a hypernetwork-based approach for knowledge propagation, named PropMEND. We show almost 2x accuracy on challenging multi-hop questions whose answers are not explicitly stated in the injected fact. We also introduce a new dataset, Controlled RippleEdit, to evaluate the generalization of our hypernetwork.
arXiv Detail & Related papers (2025-06-10T15:44:19Z)
- G-Refer: Graph Retrieval-Augmented Large Language Model for Explainable Recommendation [48.23263809469786]
We propose a framework using graph retrieval-augmented large language models (LLMs) for explainable recommendation. G-Refer achieves superior performance compared with existing methods in both explainability and stability.
arXiv Detail & Related papers (2025-02-18T06:42:38Z)
- Developing Retrieval Augmented Generation (RAG) based LLM Systems from PDFs: An Experience Report [3.4632900249241874]
This paper presents an experience report on the development of Retrieval Augmented Generation (RAG) systems using PDF documents as the primary data source.
The RAG architecture combines generative capabilities of Large Language Models (LLMs) with the precision of information retrieval.
The practical implications of this research lie in enhancing the reliability of generative AI systems in various sectors.
arXiv Detail & Related papers (2024-10-21T12:21:49Z)
- Retrieval-Augmented Generation for Large Language Models: A Survey [17.82361213043507]
Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination.
Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases.
arXiv Detail & Related papers (2023-12-18T07:47:33Z)
- IRJIT: A Simple, Online, Information Retrieval Approach for Just-In-Time Software Defect Prediction [10.084626547964389]
Just-in-Time software defect prediction (JIT-SDP) prevents the introduction of defects into the software by identifying them at commit check-in time.
Current software defect prediction approaches rely on manually crafted features such as change metrics and involve expensive to train machine learning or deep learning models.
We propose an approach called IRJIT that employs information retrieval on source code and labels new commits as buggy or clean based on their similarity to past buggy or clean commits.
arXiv Detail & Related papers (2022-10-05T17:54:53Z)
- KAT: A Knowledge Augmented Transformer for Vision-and-Language [56.716531169609915]
We propose a novel model - Knowledge Augmented Transformer (KAT) - which achieves a strong state-of-the-art result on the open-domain multimodal task of OK-VQA.
Our approach integrates implicit and explicit knowledge in an end to end encoder-decoder architecture, while still jointly reasoning over both knowledge sources during answer generation.
An additional benefit of explicit knowledge integration is seen in improved interpretability of model predictions in our analysis.
arXiv Detail & Related papers (2021-12-16T04:37:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.