Related papers: Enhancing Automated Program Repair with Solution Design

Enhancing Automated Program Repair with Solution Design

URL: http://arxiv.org/abs/2408.12056v2
Date: Sun, 22 Sep 2024 03:16:38 GMT
Title: Enhancing Automated Program Repair with Solution Design
Authors: Jiuang Zhao, Donghao Yang, Li Zhang, Xiaoli Lian, Zitian Yang, Fang Liu,
Abstract summary: We introduce DRCodePilot, an approach designed to augment GPT-4-Turbo's APR capabilities by incorporating DR into the prompt instruction. Our experimental results are impressive: DRCodePilot achieves a full-match ratio that is a remarkable 4.7x higher than when GPT-4 is utilized directly.
Score: 5.547148114448699
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automatic Program Repair (APR) endeavors to autonomously rectify issues within specific projects, which generally encompasses three categories of tasks: bug resolution, new feature development, and feature enhancement. Despite extensive research proposing various methodologies, their efficacy in addressing real issues remains unsatisfactory. It's worth noting that, typically, engineers have design rationales (DR) on solution-planed solutions and a set of underlying reasons-before they start patching code. In open-source projects, these DRs are frequently captured in issue logs through project management tools like Jira. This raises a compelling question: How can we leverage DR scattered across the issue logs to efficiently enhance APR? To investigate this premise, we introduce DRCodePilot, an approach designed to augment GPT-4-Turbo's APR capabilities by incorporating DR into the prompt instruction. Furthermore, given GPT-4's constraints in fully grasping the broader project context and occasional shortcomings in generating precise identifiers, we have devised a feedback-based self-reflective framework, in which we prompt GPT-4 to reconsider and refine its outputs by referencing a provided patch and suggested identifiers. We have established a benchmark comprising 938 issue-patch pairs sourced from two open-source repositories hosted on GitHub and Jira. Our experimental results are impressive: DRCodePilot achieves a full-match ratio that is a remarkable 4.7x higher than when GPT-4 is utilized directly. Additionally, the CodeBLEU scores also exhibit promising enhancements. Moreover, our findings reveal that the standalone application of DR can yield promising increase in the full-match ratio across CodeLlama, GPT-3.5, and GPT-4 within our benchmark suite. We believe that our DRCodePilot initiative heralds a novel human-in-the-loop avenue for advancing the field of APR.

Related papers

DeepSeek-V3, GPT-4, Phi-4, and LLaMA-3.3 generate correct code for LoRaWAN-related engineering tasks [0.8301471481260676]
This paper investigates the performance of 16 Large Language Models (LLMs) in automating LoRaWAN-related engineering tasks. To assess this, we compared locally run models against state-of-the-art alternatives, such as GPT-4 and DeepSeek-V3. Results show that while DeepSeek-V3 and GPT-4 consistently provided accurate solutions, certain smaller models -- particularly Phi-4 and LLaMA-3.3 -- also demonstrated strong performance.
arXiv Detail & Related papers (2025-02-19T23:16:29Z)
Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs [54.309127753635366]
We present the results of a replication study in which we investigate GPT-4 effectiveness in recommending and suggesting idiomatic actions. Our findings underscore the potential of LLMs to achieve tasks where, in the past, implementing recommenders based on complex code analyses was required.
arXiv Detail & Related papers (2025-01-28T15:41:54Z)
AIC CTU system at AVeriTeC: Re-framing automated fact-checking as a simple RAG task [0.0]
This paper describes our solution to the challenge of fact-checking with evidence retrieved in the wild using a simple scheme of Retrieval-Augmented Generation (RAG) We release our and explain its two modules - the Retriever and the Evidence & Label generator - in detail, justifying their features such as MMR-reranking and Likert-scale confidence estimation. We perform an empirical error analysis to see that faults in our predictions often coincide with noise in the data or ambiguous fact-checks, provoking further research and data augmentation.
arXiv Detail & Related papers (2024-10-15T09:50:19Z)
SpecRover: Code Intent Extraction via LLMs [7.742980618437681]
specification inference can be useful for producing high quality program patches. Our approach SpecRover (AutoCodeRover-v2) is built on the open-source LLM agent AutoCodeRover. In an evaluation on the full SWE-Bench consisting of 2294 GitHub issues, it shows more than 50% improvement in efficacy over AutoCodeRover.
arXiv Detail & Related papers (2024-08-05T04:53:01Z)
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting [68.90949377014742]
Speculative RAG is a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM. Our method accelerates RAG by delegating drafting to the smaller specialist LM, with the larger generalist LM performing a single verification pass over the drafts. It notably enhances accuracy by up to 12.97% while reducing latency by 51% compared to conventional RAG systems on PubHealth.
arXiv Detail & Related papers (2024-07-11T06:50:19Z)
QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds [51.05639500325598]
QuadrupedGPT is a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet. Our agent processes human command and environmental contexts using a large multimodal model (LMM) It is equipped with problem-solving capabilities that enable it to decompose long-term goals into a sequence of executable subgoals.
arXiv Detail & Related papers (2024-06-24T12:14:24Z)
How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE) We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories. To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis [12.7034916462208]
Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers. This paper introduces an innovative APR approach called GIANTREPAIR. Based on this insight, GIANTREPAIR first constructs patch skeletons from LLM-generated patches to confine the patch space, and then generates high-quality patches tailored to specific programs.
arXiv Detail & Related papers (2024-06-03T05:05:12Z)
A Novel Approach for Automated Design Information Mining from Issue Logs [3.5665328754813768]
DRMiner is a novel method to automatically mine latent design rationales from developers' live discussion in open-source community. We acquire issue logs from Cassandra, Flink, and Solr repositories in Jira, and then annotate and process them under a rigorous scheme. DRMiner achieves an F1 score of 65% for mining design rationales, outperforming all baselines with a 7% improvement over GPT-4.0.
arXiv Detail & Related papers (2024-05-30T02:20:04Z)
Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs [54.054334823194615]
We consider Reverse Image Retrieval (RIR) augmented generation, a simple yet effective strategy to augment MLLMs with web-scale reverse image search results. RIR robustly improves knowledge-intensive visual question answering (VQA) of GPT-4V by 37-43%, GPT-4 Turbo by 25-27%, and GPT-4o by 18-20% in terms of open-ended VQA evaluation metrics.
arXiv Detail & Related papers (2024-05-29T04:00:41Z)
User-Centric Deployment of Automated Program Repair at Bloomberg [13.994851524965016]
This paper presents a novel approach to optimally time, target, and present auto-generated patches to software engineers. We use GitHub's Suggested Changes interface to seamlessly integrate automated suggestions into pull requests. From our user study, B-Assist's efficacy is evident, with the acceptance rate of patch suggestions being as high as 74.56%.
arXiv Detail & Related papers (2023-11-17T13:39:48Z)
RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs. We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
Generalized Planning in PDDL Domains with Pretrained Large Language Models [82.24479434984426]
We consider PDDL domains and use GPT-4 to synthesize Python programs. We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines.
arXiv Detail & Related papers (2023-05-18T14:48:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.