Related papers: Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair

Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair

URL: http://arxiv.org/abs/2412.03905v1
Date: Thu, 05 Dec 2024 06:21:31 GMT
Title: Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair
Authors: Qiong Feng, Xiaotian Ma, Jiayi Sheng, Ziyuan Feng, Wei Song, Peng Liang,
Abstract summary: We propose DEVLoRe to use issue content (description and message) and stack error traces to localize buggy methods.<n>By incorporating different artifacts, DEVLoRe successfully locates 49.3% and 47.6% of single and non-single buggy methods.<n>This outperforms current state-of-the-art APR methods.
Score: 2.9176578730256733
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLMs have garnered considerable attention for their potential to streamline Automated Program Repair (APR). LLM-based approaches can either insert the correct code or directly generate patches when provided with buggy methods. However, most of LLM-based APR methods rely on a single type of software information, without fully leveraging different software artifacts. Despite this, many LLM-based approaches do not explore which specific types of information best assist in APR. Addressing this gap is crucial for advancing LLM-based APR techniques. We propose DEVLoRe to use issue content (description and message) and stack error traces to localize buggy methods, then rely on debug information in buggy methods and issue content and stack error to localize buggy lines and generate plausible patches which can pass all unit tests. The results show that while issue content is particularly effective in assisting LLMs with fault localization and program repair, different types of software artifacts complement each other. By incorporating different artifacts, DEVLoRe successfully locates 49.3% and 47.6% of single and non-single buggy methods and generates 56.0% and 14.5% plausible patches for the Defects4J v2.0 dataset, respectively. This outperforms current state-of-the-art APR methods. The source code and experimental results of this work for replication are available at https://github.com/XYZboom/DEVLoRe.

Related papers

EDIT-Bench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits [72.23150343093447]
We introduce EDIT-Bench, a benchmark for evaluatingstructed code editing capabilities grounded in real-world usage.<n>EDIT-Bench comprises of 545 problems, multiple natural and programming languages, and a diverse set of real-world use cases.<n>We find that model performance varies across different categories of user instructions.
arXiv Detail & Related papers (2025-11-06T16:05:28Z)
InspectCoder: Dynamic Analysis-Enabled Self Repair through interactive LLM-Debugger Collaboration [71.18377595277018]
Large Language Models (LLMs) frequently generate buggy code with complex logic errors that are challenging to diagnose.<n>We present InspectCoder, the first agentic program repair system that empowers LLMs to actively conduct dynamic analysis via interactive debugger control.
arXiv Detail & Related papers (2025-10-21T06:26:29Z)
Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs.<n>This paper investigates how programmers use Large Language Models to complement their own skills.<n>The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z)
SweRank: Software Issue Localization with Code Ranking [109.3289316191729]
SweRank is an efficient retrieve-and-rerank framework for software issue localization.<n>We construct SweLoc, a large-scale dataset curated from public GitHub repositories.<n>We show that SweRank achieves state-of-the-art performance, outperforming both prior ranking models and costly agent-based systems.
arXiv Detail & Related papers (2025-05-07T19:44:09Z)
Where's the Bug? Attention Probing for Scalable Fault Localization [18.699014321422023]
We present Bug Attention Probe (BAP), a method which learns state-of-the-art fault localization without any direct localization labels.<n>BAP is significantly more efficient than prompting, outperforming large open-weight models at a small fraction of the computational cost.
arXiv Detail & Related papers (2025-02-19T18:59:32Z)
PATCH: Empowering Large Language Model with Programmer-Intent Guidance and Collaborative-Behavior Simulation for Automatic Bug Fixing [34.768989900184636]
Bug fixing holds significant importance in software development and maintenance.<n>Recent research has made substantial strides in exploring the potential of large language models (LLMs) for automatically resolving software bugs.
arXiv Detail & Related papers (2025-01-27T15:43:04Z)
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [56.9361004704428]
Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks.<n>SWE-Fixer is a novel open-source framework designed to effectively and efficiently resolve GitHub issues.<n>We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving competitive performance among open-source models.
arXiv Detail & Related papers (2025-01-09T07:54:24Z)
SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs [77.79172008184415]
SpecTool is a new benchmark to identify error patterns in LLM output on tool-use tasks. We show that even the most prominent LLMs exhibit these error patterns in their outputs. Researchers can use the analysis and insights from SPECTOOL to guide their error mitigation strategies.
arXiv Detail & Related papers (2024-11-20T18:56:22Z)
Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection [8.22737389683156]
Large Language Models (LLMs) offer promising improvements in fault localization by enhancing code comprehension and reasoning. We introduce LLM4FL, a novel LLM-agent-based fault localization approach that integrates SBFL rankings with a divide-and-conquer strategy. Our results demonstrate that LLM4FL outperforms AutoFL by 19.27% in Top-1 accuracy and surpasses state-of-the-art supervised techniques such as DeepFL and Grace.
arXiv Detail & Related papers (2024-09-20T16:47:34Z)
Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis [12.7034916462208]
Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers. This paper introduces an innovative APR approach called GIANTREPAIR. Based on this insight, GIANTREPAIR first constructs patch skeletons from LLM-generated patches to confine the patch space, and then generates high-quality patches tailored to specific programs.
arXiv Detail & Related papers (2024-06-03T05:05:12Z)
MEIC: Re-thinking RTL Debug Automation using LLMs [18.964523115622928]
This work introduces a novel framework, Make Each Iteration Count (MEIC) MEIC is suitable for identifying and correcting both syntax and function errors. To evaluate our framework, we provide an open-source dataset comprising 178 common RTL programming errors.
arXiv Detail & Related papers (2024-05-10T22:32:39Z)
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [62.02920842630234]
We show how to build small fact-checking models that have GPT-4-level performance but for 400x lower cost. We do this by constructing synthetic training data with GPT-4, which involves creating realistic yet challenging instances of factual errors. For evaluation, we unify datasets from recent work on fact-checking and grounding LLM generations into a new benchmark, LLM-AggreFact.
arXiv Detail & Related papers (2024-04-16T17:59:10Z)
Aligning LLMs for FL-free Program Repair [14.935596175148586]
This paper investigates a new approach to adapt large language models (LLMs) to program repair. Our core insight is that LLM's APR capability can be greatly improved by simply aligning the output to their training objective. Based on this insight, we designed D4C, a straightforward prompting framework for APR.
arXiv Detail & Related papers (2024-04-13T02:36:40Z)
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back. Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair. This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
Leveraging Print Debugging to Improve Code Generation in Large Language Models [63.63160583432348]
Large language models (LLMs) have made significant progress in code generation tasks. But their performance in tackling programming problems with complex data structures and algorithms remains suboptimal. We propose an in-context learning approach that guides LLMs to debug by using a "print debug" method.
arXiv Detail & Related papers (2024-01-10T18:37:59Z)
DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs) It covers four major bug categories and 18 minor types in C++, Java, and Python. We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
ConDefects: A New Dataset to Address the Data Leakage Concern for LLM-based Fault Localization and Program Repair [22.342625625700908]
"ConDefects" is a novel dataset of real faults meticulously curated to eliminate such overlap. "ConDefects" contains 1,254 Java faulty programs and 1,625 Python faulty programs. We pair each fault with fault locations and the corresponding repaired code versions, making it tailored for fault localization and program repair related research.
arXiv Detail & Related papers (2023-10-25T00:06:02Z)
Compressing LLMs: The Truth is Rarely Pure and Never Simple [90.05366363633568]
Knowledge-Intensive Compressed LLM BenchmarK aims to redefine the evaluation protocol for compressed Large Language Models. LLM-KICK unveils many favorable merits and unfortunate plights of current SoTA compression methods. LLM-KICK is designed to holistically access compressed LLMs' ability for language understanding, reasoning, generation, in-context retrieval, in-context summarization, etc.
arXiv Detail & Related papers (2023-10-02T17:42:37Z)
Editing Large Language Models: Problems, Methods, and Opportunities [51.903537096207]
This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs. We provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal. Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.
arXiv Detail & Related papers (2023-05-22T16:00:00Z)
Practical Program Repair in the Era of Large Pre-trained Language Models [13.694803023685175]
Automated Program Repair (APR) aims to help developers automatically patch software bugs. PLMs, trained using billions of text/code tokens, can potentially help avoid this issue. We select 9 recent state-of-the-art PLMs, including both generative and infilling models, ranging from 125M to 20B in size.
arXiv Detail & Related papers (2022-10-25T17:18:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.