Related papers: RGFL: Reasoning Guided Fault Localization for Automated Program Repair Using Large Language Models

RGFL: Reasoning Guided Fault Localization for Automated Program Repair Using Large Language Models

URL: http://arxiv.org/abs/2601.18044v1
Date: Sun, 25 Jan 2026 23:41:42 GMT
Title: RGFL: Reasoning Guided Fault Localization for Automated Program Repair Using Large Language Models
Authors: Melika Sepidband, Hamed Taherkhani, Hung Viet Pham, Hadi Hemmati,
Abstract summary: We present a novel project-level FL approach that improves both file- and element-level localization.<n>We evaluate our approach on Python and Java projects from SWE-bench Verified, Lite, and Java.
Score: 1.9196411948992402
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fault Localization (FL) is a critical step in Automated Program Repair (APR), and its importance has increased with the rise of Large Language Model (LLM)-based repair agents. In realistic project-level repair scenarios, software repositories often span millions of tokens, far exceeding current LLM context limits. Consequently, models must first identify a small, relevant subset of code, making accurate FL essential for effective repair. We present a novel project-level FL approach that improves both file- and element-level localization. Our method introduces a hierarchical reasoning module that (i) generates structured, bug-specific explanations for candidate files and elements, and (ii) leverages these explanations in a two-stage ranking scheme combining LLM-based and embedding-based signals. We further propose a counterfactual upper-bound analysis to quantify the contribution of each localization stage to repair success. We evaluate our approach on Python and Java projects from SWE-bench Verified, Lite, and Java. Compared to state-of-the-art baselines, including Agentless and OpenHands, our method consistently improves localization accuracy. On SWE-bench Verified, file-level Hit@1 improves from 71.4% to 85%, and MRR from 81.8% to 88.8%. At the element level, Exact Match under top-3 files increases from 36% to 69%. Integrating our localization into Agentless yields a 12.8% end-to-end repair success improvement.

Related papers

DiffuRank: Effective Document Reranking with Diffusion Language Models [71.16830004674513]
We propose DiffuRank, a reranking framework built upon diffusion language models (dLLMs)<n>dLLMs support more flexible decoding and generation processes that are not constrained to a left-to-right order.<n>We show dLLMs achieve performance comparable to, and in some cases exceeding, that of autoregressive LLMs with similar model sizes.
arXiv Detail & Related papers (2026-02-13T02:18:14Z)
Structured Uncertainty guided Clarification for LLM Agents [126.26213027785813]
LLM agents extend large language models with tool-calling capabilities, but ambiguous user instructions often lead to incorrect invocations and task failures.<n>We introduce a principled formulation of structured uncertainty over tool-call parameters, modeling joint tool-argument clarification as a POMDP with Expected Value of Perfect Information (EVPI) objective for optimal question selection and aspect-based cost modeling to prevent redundancy.<n>Our SAGE-Agent leverages this structured uncertainty to achieve superior efficiency: increasing coverage on ambiguous tasks by 7-39% while reducing clarification questions by 1.5-2.7$times$ compared to strong prompting and uncertainty-based baselines.
arXiv Detail & Related papers (2025-11-11T21:50:44Z)
Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation [3.9189409002585567]
Large language models (LLMs) have demonstrated strong performance on function-level code generation benchmarks.<n>We introduce a benchmark derived from real-world open-source repositories to evaluate generalization under practical conditions.<n>We examine how input specification completeness and retrieval-augmented generation affect class-level correctness across multiple state-of-the-art LLMs.
arXiv Detail & Related papers (2025-10-30T04:30:23Z)
REFINE: Enhancing Program Repair Agents through Context-Aware Patch Refinement [12.995571513415905]
Large Language Models (LLMs) have recently shown strong potential in automatic program repair (APR)<n>LLMs often struggle to produce correct fixes due to limited understanding of code context and over-reliance on incomplete test suites.<n>We propose a novel patch refinement framework, Refine, that systematically transforms Draft Patches into correct ones.
arXiv Detail & Related papers (2025-10-04T00:34:32Z)
Enhancing LLM-based Fault Localization with a Functionality-Aware Retrieval-Augmented Generation Framework [14.287359838639608]
FaR-Loc is a framework that enhances method-level fault localization.<n> FaR-Loc consists of three key components: LLM Functionality Extraction, Semantic Retrieval, and LLM Re-ranking.<n>Our experiments on the widely used Defects4J benchmark show that FaR-Loc outperforms state-of-the-art LLM-based baselines.
arXiv Detail & Related papers (2025-09-24T20:37:11Z)
Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [79.74676890436174]
We present an APR tool for Dafny that uses formal specifications as oracles for fault localization and repair.<n>We localize faults through a series of steps, which include using Hoare logic to determine the state of each statement within the program.<n>Our tool achieves 89.6% fault localization coverage and GPT-4o mini yields the highest repair success rate of 74.18%.
arXiv Detail & Related papers (2025-07-04T15:36:12Z)
SweRank: Software Issue Localization with Code Ranking [109.3289316191729]
SweRank is an efficient retrieve-and-rerank framework for software issue localization.<n>We construct SweLoc, a large-scale dataset curated from public GitHub repositories.<n>We show that SweRank achieves state-of-the-art performance, outperforming both prior ranking models and costly agent-based systems.
arXiv Detail & Related papers (2025-05-07T19:44:09Z)
SOPBench: Evaluating Language Agents at Following Standard Operating Procedures and Constraints [59.645885492637845]
SOPBench is an evaluation pipeline that transforms each service-specific SOP code program into a directed graph of executable functions.<n>Our approach transforms each service-specific SOP code program into a directed graph of executable functions and requires agents to call these functions based on natural language SOP descriptions.<n>We evaluate 18 leading models, and results show the task is challenging even for top-tier models.
arXiv Detail & Related papers (2025-03-11T17:53:02Z)
Where's the Bug? Attention Probing for Scalable Fault Localization [18.699014321422023]
We present Bug Attention Probe (BAP), a method which learns state-of-the-art fault localization without any direct localization labels.<n>BAP is significantly more efficient than prompting, outperforming large open-weight models at a small fraction of the computational cost.
arXiv Detail & Related papers (2025-02-19T18:59:32Z)
Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.<n>LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.<n>We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
A Multi-Agent Approach to Fault Localization via Graph-Based Retrieval and Reflexion [8.22737389683156]
Traditional fault localization techniques require extensive training datasets and high computational resources.<n>Recent advances in Large Language Models (LLMs) offer new opportunities by enhancing code understanding and reasoning.<n>We propose LLM4FL, a multi-agent fault localization framework that utilizes three specialized LLM agents.<n> evaluated on the Defects4J benchmark, which includes 675 faults from 14 Java projects, LLM4FL achieves an 18.55% improvement in Top-1 accuracy over AutoFL and 4.82% over SoapFL.
arXiv Detail & Related papers (2024-09-20T16:47:34Z)
Large Language Models for Test-Free Fault Localization [11.080712737595174]
We propose a language model based fault localization approach that locates buggy lines of code without any test coverage information. We fine-tune language models with 350 million, 6 billion, and 16 billion parameters on small, manually curated corpora of buggy programs. Our empirical evaluation shows that LLMAO improves the Top-1 results over the state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%-54.4%, and Top-5 results by 14.4%-35.6%.
arXiv Detail & Related papers (2023-10-03T01:26:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.