REFINE: Enhancing Program Repair Agents through Context-Aware Patch Refinement
- URL: http://arxiv.org/abs/2510.03588v1
- Date: Sat, 04 Oct 2025 00:34:32 GMT
- Title: REFINE: Enhancing Program Repair Agents through Context-Aware Patch Refinement
- Authors: Anvith Pabba, Simin Chen, Alex Mathai, Anindya Chakraborty, Baishakhi Ray,
- Abstract summary: Large Language Models (LLMs) have recently shown strong potential in automatic program repair (APR)<n>LLMs often struggle to produce correct fixes due to limited understanding of code context and over-reliance on incomplete test suites.<n>We propose a novel patch refinement framework, Refine, that systematically transforms Draft Patches into correct ones.
- Score: 12.995571513415905
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) have recently shown strong potential in automatic program repair (APR), especially in repository-level settings where the goal is to generate patches based on natural language issue descriptions, large codebases, and regression tests. However, despite their promise, current LLM-based APR techniques often struggle to produce correct fixes due to limited understanding of code context and over-reliance on incomplete test suites. As a result, they frequently generate Draft Patches-partially correct patches that either incompletely address the bug or overfit to the test cases. In this work, we propose a novel patch refinement framework, Refine, that systematically transforms Draft Patches into correct ones. Refine addresses three key challenges: disambiguating vague issue and code context, diversifying patch candidates through test-time scaling, and aggregating partial fixes via an LLM-powered code review process. We implement Refine as a general refinement module that can be integrated into both open-agent-based and workflow-based APR systems. Our evaluation on the SWE-Bench Lite benchmark shows that Refine achieves state-of-the-art results among workflow-based approaches and approaches the best-known performance across all APR categories. Specifically, Refine boosts AutoCodeRover's performance by 14.67%, achieving a score of 51.67% and surpassing all prior baselines. On SWE-Bench Verified, Refine improves the resolution rate by 12.2%, and when integrated across multiple APR systems, it yields an average improvement of 14%-demonstrating its broad effectiveness and generalizability. These results highlight the effectiveness of refinement as a missing component in current APR pipelines and the potential of agentic collaboration in closing the gap between near-correct and correct patches. We also open source our code.
Related papers
- Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration.<n>On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy.<n>Results on SuperGPQA and TRQA confirm robustness across domains.
arXiv Detail & Related papers (2025-09-25T14:05:55Z) - CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward.<n>It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types.<n>We introduce VerifierBench benchmark comprising model outputs collected from multiple data sources, augmented through manual analysis of metaerror patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z) - RePaCA: Leveraging Reasoning Large Language Models for Static Automated Patch Correctness Assessment [0.0]
We introduce RePaCA, a novel static APCA technique that leverages Large Language Models (LLMs) specialized in thinking tasks.<n>Our approach achieves state-of-the-art performance, with 83.1% accuracy and an 84.8% F1-score.
arXiv Detail & Related papers (2025-07-30T11:21:09Z) - SemAgent: A Semantics Aware Program Repair Agent [14.80363334219173]
SemAgent is a novel workflow-based procedure that leverages issue, code, and execution semantics to generate patches that are complete.<n>We achieve this through a novel pipeline that (a) leverages execution semantics to retrieve relevant context, (b) comprehends issue-semantics via generalized abstraction, and (c) isolates code-semantics within the context of this abstraction.<n>Our evaluations show that our methodology achieves a solve rate of 44.66% on the SWEBench-Lite benchmark beating all other workflow-based approaches, and an absolute improvement of 7.66% compared to our baseline.
arXiv Detail & Related papers (2025-06-19T23:27:58Z) - The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models [48.073219761367184]
We investigate an APR pipeline that balances the generation of multiple outputs and multiple rounds of iteration.<n>We fine-tune each model on an APR dataset with three sizes (1K, 30K, 65K) and two techniques (Full Fine-Tuning and LoRA)<n>Our results show that by using only a fraction (1%) of the fine-tuning dataset, we can achieve improvements of up to 78% in the number of plausible patches generated.
arXiv Detail & Related papers (2025-05-05T18:06:51Z) - Teaching Your Models to Understand Code via Focal Preference Alignment [50.5876666815306]
In existing approaches, a set of n candidate solutions is evaluated based on test case success rates.<n>Because this approach aligns entire failing code blocks rather than pinpointing specific errors, it lacks the granularity necessary to capture meaningful error-correction relationships.<n>We propose Target-DPO, a new preference alignment framework that mimics human iterative debug to refine Code LLMs.
arXiv Detail & Related papers (2025-03-04T16:56:34Z) - Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis [12.7034916462208]
Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers.
This paper introduces an innovative APR approach called GIANTREPAIR.
Based on this insight, GIANTREPAIR first constructs patch skeletons from LLM-generated patches to confine the patch space, and then generates high-quality patches tailored to specific programs.
arXiv Detail & Related papers (2024-06-03T05:05:12Z) - REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.<n>In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.<n>We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z) - Boosting Redundancy-based Automated Program Repair by Fine-grained Pattern Mining [18.7107522872479]
We propose a new repair technique named Repatt, which incorporates a two-level pattern mining process for guiding effective patch generation.<n>We have conducted an experiment on the widely-used Defects4J benchmark and compared Repatt with ten state-of-the-art APR approaches.
arXiv Detail & Related papers (2023-12-26T08:42:32Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic
Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen)
RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.