Bug Fixing with Broader Context: Enhancing LLM-Based Program Repair via Layered Knowledge Injection
- URL: http://arxiv.org/abs/2506.24015v1
- Date: Mon, 30 Jun 2025 16:19:38 GMT
- Title: Bug Fixing with Broader Context: Enhancing LLM-Based Program Repair via Layered Knowledge Injection
- Authors: Ramtin Ehsani, Esteban Parra, Sonia Haiduc, Preetha Chatterjee
- Abstract summary: In real-world projects, developers often rely on broader repository and project-level context beyond the local code to resolve such bugs. We propose a layered knowledge injection framework that incrementally augments LLMs with structured context. We evaluate this framework on a dataset of 314 bugs from BugsInPy, and analyze fix rates across six bug types.
- Score: 5.287304201523224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompting LLMs with bug-related context (e.g., error messages, stack traces) improves automated program repair, but many bugs still remain unresolved. In real-world projects, developers often rely on broader repository and project-level context beyond the local code to resolve such bugs. In this paper, we investigate how automatically extracting and providing such knowledge can improve LLM-based program repair. We propose a layered knowledge injection framework that incrementally augments LLMs with structured context. It starts with the Bug Knowledge Layer, which includes information such as the buggy function and failing tests; expands to the Repository Knowledge Layer, which adds structural dependencies, related files, and commit history; and finally injects the Project Knowledge Layer, which incorporates relevant details from documentation and previously fixed bugs. We evaluate this framework on a dataset of 314 bugs from BugsInPy using two LLMs (Llama 3.3 and GPT-4o-mini), and analyze fix rates across six bug types. By progressively injecting knowledge across layers, our approach achieves a fix rate of 79% (250/314) using Llama 3.3, a significant improvement of 23% over previous work. All bug types show improvement with the addition of repository-level context, while only a subset benefit further from project-level knowledge, highlighting that different bug types require different levels of contextual information for effective repair. We also analyze the remaining unresolved bugs and find that more complex and structurally isolated bugs, such as Program Anomaly and GUI bugs, remain difficult even after injecting all available information. Our results show that layered context injection improves program repair and suggest the need for interactive and adaptive APR systems.
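To make the layering concrete, the following is a minimal sketch of how such incremental context injection could be wired up. It is an illustration, not the authors' implementation: `BugContext`, `build_prompt`, `layered_repair`, and the `llm` and `passes_tests` callables are hypothetical names, and the escalation policy (retry with one more layer until the failing tests pass) is an assumption inferred from the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class BugContext:
    """Context for one bug, accumulated across the three knowledge layers."""
    buggy_function: str
    failing_tests: list[str]
    repo_facts: list[str] = field(default_factory=list)     # dependencies, related files, commit history
    project_facts: list[str] = field(default_factory=list)  # documentation, previously fixed bugs

def build_prompt(bug: BugContext, layers: int) -> str:
    """Compose a repair prompt from the first `layers` knowledge layers (1-3)."""
    parts = [
        "Fix the bug in the following function.",
        "## Bug Knowledge",
        bug.buggy_function,
        "Failing tests:\n" + "\n".join(bug.failing_tests),
    ]
    if layers >= 2:
        parts += ["## Repository Knowledge"] + bug.repo_facts
    if layers >= 3:
        parts += ["## Project Knowledge"] + bug.project_facts
    return "\n\n".join(parts)

def layered_repair(
    bug: BugContext,
    llm: Callable[[str], str],
    passes_tests: Callable[[str], bool],
) -> Optional[str]:
    """Escalate through the layers until a candidate patch passes the failing tests."""
    for layer in (1, 2, 3):
        patch = llm(build_prompt(bug, layer))
        if passes_tests(patch):
            return patch
    return None  # unresolved even with all available context
```

Under this policy a bug that resolves at the Bug Knowledge Layer never pays the prompt cost of repository or project context, which matches the abstract's observation that different bug types require different levels of contextual information.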
Related papers
- An Empirical Study on the Capability of LLMs in Decomposing Bug Reports [9.544728752295269]
This study investigates whether large language models (LLMs) can assist developers in automatically decomposing complex bug reports into smaller, self-contained units. We conducted an empirical study on 127 resolved privacy-related bug reports collected from Apache Jira.
arXiv Detail & Related papers (2025-04-29T16:29:12Z)
- Enhancing Repository-Level Software Repair via Repository-Aware Knowledge Graphs [8.467850621024672]
Repository-level software repair faces challenges in bridging semantic gaps between issue descriptions and code patches. Existing approaches, which mostly depend on large language models (LLMs), suffer from semantic ambiguities, limited structural context understanding, and insufficient reasoning capability. We propose a novel repository-aware knowledge graph (KG) that accurately links repository artifacts (issues and pull requests) and entities.
arXiv Detail & Related papers (2025-03-27T17:21:47Z)
- PATCH: Empowering Large Language Model with Programmer-Intent Guidance and Collaborative-Behavior Simulation for Automatic Bug Fixing [34.768989900184636]
Bug fixing holds significant importance in software development and maintenance. Recent research has made substantial strides in exploring the potential of large language models (LLMs) for automatically resolving software bugs.
arXiv Detail & Related papers (2025-01-27T15:43:04Z)
- Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework. This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings. Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback (a minimal sketch of such a loop appears after this list).
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- A Unified Debugging Approach via LLM-Based Multi-Agent Synergy [39.11825182386288]
FixAgent is an end-to-end framework for unified debugging through multi-agent synergy.
It significantly outperforms state-of-the-art repair methods, fixing 1.25$\times$ to 2.56$\times$ as many bugs on the repo-level benchmark Defects4J.
arXiv Detail & Related papers (2024-04-26T04:55:35Z)
- When Large Language Models Confront Repository-Level Automatic Program Repair: How Well They Done? [13.693311241492827]
We introduce RepoBugs, a new benchmark comprising 124 typical repository-level bugs from open-source repositories.
Preliminary experiments using GPT-3.5, given only the function where the error is located, reveal that the repair rate on RepoBugs is only 22.58%.
We propose a simple and universal repository-level context extraction method (RLCE) designed to provide more precise context for repository-level code repair tasks.
arXiv Detail & Related papers (2024-03-01T11:07:41Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- The Earth is Flat? Unveiling Factual Errors in Large Language Models [89.94270049334479]
Large Language Models (LLMs) like ChatGPT are used in various applications due to their extensive knowledge from pre-training and fine-tuning.
Despite this, they are prone to generating factual and commonsense errors, raising concerns in critical areas like healthcare, journalism, and education.
We introduce a novel, automatic testing framework, FactChecker, aimed at uncovering factual inaccuracies in LLMs.
arXiv Detail & Related papers (2024-01-01T14:02:27Z)
- Retrieval-augmented Multilingual Knowledge Editing [81.6690436581947]
Knowledge represented in Large Language Models (LLMs) is quite often incorrect and can also become obsolete over time.
Knowledge editing (KE) has emerged as an effective and economical alternative for injecting new knowledge.
We propose Retrieval-augmented Multilingual Knowledge Editor (ReMaKE) to update LLMs with new knowledge.
arXiv Detail & Related papers (2023-12-20T14:08:58Z)
- On Using GUI Interaction Data to Improve Text Retrieval-based Bug Localization [10.717184444794505]
We investigate the hypothesis that, for end-user-facing applications, connecting information in a bug report with information from the GUI can improve upon existing techniques for bug localization.
We source the largest current dataset of fully localized and reproducible real bugs for Android apps, with corresponding bug reports.
arXiv Detail & Related papers (2023-10-12T07:14:22Z)
- Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
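Returning to the self-critique entry above ("What's Wrong with Your Code Generated by Large Language Models?"), the following is a minimal sketch of a training-free critique-and-revise loop driven by compiler feedback. It is an assumption-laden illustration rather than that paper's method: `llm` is a hypothetical text-completion callable, the prompts are invented, and Python's `py_compile` stands in for whichever compiler a given setup targets.

```python
import os
import subprocess
import tempfile
from typing import Callable, Optional

def compile_feedback(code: str) -> Optional[str]:
    """Return syntax diagnostics for `code`, or None if it compiles cleanly."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            ["python", "-m", "py_compile", path],
            capture_output=True, text=True,
        )
        return result.stderr or None
    finally:
        os.unlink(path)

def self_critique_repair(llm: Callable[[str], str], task: str, max_rounds: int = 3) -> str:
    """Generate code, then iteratively critique and revise it using compiler feedback."""
    code = llm(f"Write Python code for this task:\n{task}")
    for _ in range(max_rounds):
        feedback = compile_feedback(code)
        if feedback is None:
            break  # clean compile; stop revising
        critique = llm(
            f"This code fails to compile:\n{code}\n"
            f"Compiler feedback:\n{feedback}\n"
            "Name the bug type and explain the fix."
        )
        code = llm(
            "Revise the code to address the critique.\n"
            f"Code:\n{code}\nCritique:\n{critique}\n"
            "Return only the corrected code."
        )
    return code
```

The loop stops as soon as the code compiles cleanly or the round budget is exhausted; richer variants could feed test failures rather than syntax errors back into the critique step.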