User-Centric Deployment of Automated Program Repair at Bloomberg
- URL: http://arxiv.org/abs/2311.10516v1
- Date: Fri, 17 Nov 2023 13:39:48 GMT
- Title: User-Centric Deployment of Automated Program Repair at Bloomberg
- Authors: David Williams, James Callan, Serkan Kirbas, Sergey Mechtaev, Justyna
Petke, Thomas Prideaux-Ghee, Federica Sarro
- Abstract summary: This paper presents a novel approach to optimally time, target, and present auto-generated patches to software engineers.
We use GitHub's Suggested Changes interface to seamlessly integrate automated suggestions into pull requests.
From our user study, B-Assist's efficacy is evident, with the acceptance rate of patch suggestions being as high as 74.56%.
- Score: 13.994851524965016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated program repair (APR) tools have unlocked the potential for the
rapid rectification of codebase issues. However, to encourage wider adoption of
program repair in practice, it is necessary to address the usability concerns
related to generating irrelevant or out-of-context patches. When software
engineers are presented with patches they deem uninteresting or unhelpful, they
are burdened with more "noise" in their workflows and become less likely to
engage with APR tools in future. This paper presents a novel approach to
optimally time, target, and present auto-generated patches to software
engineers. To achieve this, we designed, developed, and deployed a new tool
dubbed B-Assist, which leverages GitHub's Suggested Changes interface to
seamlessly integrate automated suggestions into active pull requests (PRs), as
opposed to creating new, potentially distracting PRs. This strategy ensures
that suggestions are not only timely, but also contextually relevant and
delivered to engineers most familiar with the affected code. Evaluation among
Bloomberg software engineers demonstrated their preference for this approach.
From our user study, B-Assist's efficacy is evident, with the acceptance rate
of patch suggestions being as high as 74.56%; engineers also found the
suggestions valuable, giving usefulness ratings of at least 4 out of 5 in 78.2%
of cases. Further, this paper sheds light on persisting usability challenges in
APR and lays the groundwork for enhancing the user experience in future APR
tools.
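The paper does not include B-Assist's implementation, but the GitHub mechanism it relies on is public: a patch can be attached to an open pull request as a review comment whose body contains a suggestion block, which GitHub renders as a one-click "Suggested Change". The minimal sketch below shows that mechanism using the REST API via the `requests` library; the repository, PR number, commit SHA, file path, and patch text are placeholders, not values from the paper.

```python
import os
import requests

# Placeholder values -- repository, PR, file, and patch are illustrative only.
OWNER, REPO, PR_NUMBER = "example-org", "example-repo", 1234
COMMIT_SHA = "HEAD_COMMIT_SHA_OF_THE_PR"      # latest commit on the PR branch
FILE_PATH = "src/example.cpp"
TARGET_LINE = 42                              # line in the PR diff the patch applies to
FIXED_CODE = "    return value != nullptr;"   # auto-generated replacement line

# A review comment whose body contains a ```suggestion block is rendered by
# GitHub as a one-click "Suggested Change" that the PR author can apply directly.
fence = "`" * 3
body = f"Automated fix suggestion:\n{fence}suggestion\n{FIXED_CODE}\n{fence}\n"

response = requests.post(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/comments",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={
        "body": body,
        "commit_id": COMMIT_SHA,
        "path": FILE_PATH,
        "line": TARGET_LINE,
        "side": "RIGHT",
    },
    timeout=30,
)
response.raise_for_status()
print("Suggestion posted:", response.json()["html_url"])
```

Because the suggestion lands on an active pull request, it reaches the engineers already reviewing that code rather than opening a separate, potentially distracting PR.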
Related papers
- Enhancing Automated Program Repair with Solution Design [5.547148114448699]
We introduce DRCodePilot, an approach designed to augment GPT-4-Turbo's APR capabilities by incorporating design rationale (DR) into the prompt instruction.
Our experimental results are impressive: DRCodePilot achieves a full-match ratio that is a remarkable 4.7x higher than when GPT-4 is utilized directly.
arXiv Detail & Related papers (2024-08-22T01:13:02Z)
- SpecRover: Code Intent Extraction via LLMs [7.742980618437681]
Specification inference can be useful for producing high-quality program patches.
Our approach SpecRover (AutoCodeRover-v2) is built on the open-source LLM agent AutoCodeRover.
In an evaluation on the full SWE-Bench consisting of 2294 GitHub issues, it shows more than 50% improvement in efficacy over AutoCodeRover.
arXiv Detail & Related papers (2024-08-05T04:53:01Z)
- Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs).
The iterative nature of agents, which allows for continuous learning and adaptation, can help overcome common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, making them better aligned with the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
- A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
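As a rough illustration of the round-trip idea (not the paper's models or prompts), the repair step reduces to two translation calls. In the sketch below, `translate` is a placeholder for whichever pretrained translation model or LLM is used, and the language names are illustrative.

```python
from typing import Callable

# (text, source_language, target_language) -> translated text
Translator = Callable[[str, str, str], str]

def round_trip_repair(buggy_code: str, translate: Translator,
                      source: str = "java", pivot: str = "natural language") -> str:
    """Round-trip translation repair: code -> pivot representation -> code."""
    # Forward pass: re-express the buggy code in the pivot (another programming
    # language or plain English).
    pivot_text = translate(buggy_code, source, pivot)
    # Backward pass: regenerate the code from the pivot. The hypothesis is that
    # regeneration drifts toward the natural, and often correct, form of the code,
    # much as round-trip translating a sentence tends to remove grammatical mistakes.
    return translate(pivot_text, pivot, source)

# Usage (with some concrete model behind `translate`):
#   candidate = round_trip_repair(buggy_method, translate=my_model_call)
# A candidate patch would still be validated, e.g. by compiling it and running tests.
```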
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
- Practical Program Repair via Preference-based Ensemble Strategy [28.176710503313895]
We propose a Preference-based Ensemble Program Repair framework (P-EPR) to rank APR tools for repairing different bugs.
P-EPR is the first non-learning-based APR ensemble method, and its novelty lies in its exploitation of repair patterns.
Experimental results show that P-EPR outperforms existing strategies significantly both in flexibility and effectiveness.
arXiv Detail & Related papers (2023-09-15T07:23:04Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) that explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
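The summary above does not specify the retriever, so the following sketch only illustrates the general retrieval-augmented pattern: find the most similar past bug-fix pairs and prepend them to the generator's input. The TF-IDF retriever and the toy bug-fix corpus are illustrative stand-ins, not RAP-Gen's actual components.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy codebase of previous (buggy, fixed) pairs; in practice these are mined from history.
bug_fix_pairs = [
    ("if (x = null) return;", "if (x == null) return;"),
    ("for (int i = 0; i <= n; i++) a[i] = 0;", "for (int i = 0; i < n; i++) a[i] = 0;"),
]

def retrieve_fix_patterns(buggy_code: str, k: int = 1):
    """Return the k most similar past bug-fix pairs (lexical similarity stand-in)."""
    corpus = [bug for bug, _ in bug_fix_pairs]
    vec = TfidfVectorizer(token_pattern=r"\S+").fit(corpus + [buggy_code])
    sims = cosine_similarity(vec.transform([buggy_code]), vec.transform(corpus))[0]
    top = sims.argsort()[::-1][:k]
    return [bug_fix_pairs[i] for i in top]

def build_generator_input(buggy_code: str) -> str:
    """Prepend retrieved fix patterns to the buggy code before patch generation."""
    examples = retrieve_fix_patterns(buggy_code)
    context = "\n".join(f"buggy: {b}\nfixed: {f}" for b, f in examples)
    return f"{context}\nbuggy: {buggy_code}\nfixed:"

print(build_generator_input("if (y = null) y = new Y();"))
```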
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- Using Machine Learning To Identify Software Weaknesses From Software Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested.
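The summary does not give the exact feature extraction or data splits; the sketch below is a plausible scikit-learn pipeline in the same spirit: TF-IDF features, latent semantic analysis via truncated SVD, and one of the tested classifiers (a linear SVM). The requirement sentences and CWE labels are placeholders.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC

# Placeholder requirement sentences with CWE-style weakness labels.
requirements = [
    "The system shall accept user-supplied file paths for report export.",
    "Passwords shall be stored for later administrative review.",
    "The service shall parse XML configuration uploaded by users.",
]
labels = ["CWE-22", "CWE-256", "CWE-611"]

# TF-IDF -> truncated SVD (latent semantic analysis) -> linear SVM classifier.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2),   # toy dimensionality for the toy corpus
    LinearSVC(),
)
model.fit(requirements, labels)
print(model.predict(["Users shall upload XML files describing build settings."]))
```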
arXiv Detail & Related papers (2023-08-10T13:19:10Z)
- A LLM Assisted Exploitation of AI-Guardian [57.572998144258705]
We evaluate the robustness of AI-Guardian, a recent defense against adversarial examples published at IEEE S&P 2023.
We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance.
This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done.
arXiv Detail & Related papers (2023-07-20T17:33:25Z)
- PatchZero: Zero-Shot Automatic Patch Correctness Assessment [13.19425284402493]
We propose PatchZero, a patch correctness assessment approach that adopts a large language model for code.
PatchZero prioritizes labeled patches from existing APR tools that exhibit semantic similarity to those generated by new APR tools.
Our experimental results show that PatchZero achieves an accuracy of 84.4% and an F1-score of 86.5% on average.
arXiv Detail & Related papers (2023-03-01T03:12:11Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
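The study's models and editor integration are not reproduced here; the sketch below illustrates only the presentation step: given per-token estimates of how likely each generated token is to be edited (however those estimates are produced), highlight the tokens above a threshold so attention goes to the riskiest spans.

```python
def highlight_uncertain_tokens(tokens, edit_likelihoods, threshold=0.5):
    """Wrap tokens whose predicted edit likelihood exceeds the threshold.

    edit_likelihoods[i] is an estimate that token i will be edited by the
    programmer; how that estimate is computed is outside this sketch's scope.
    """
    rendered = []
    for token, p in zip(tokens, edit_likelihoods):
        rendered.append(f"[[{token}]]" if p >= threshold else token)
    return " ".join(rendered)

completion = ["return", "cache", ".", "get", "(", "key", ",", "None", ")"]
edit_scores = [0.05, 0.72, 0.03, 0.10, 0.02, 0.81, 0.02, 0.64, 0.02]
print(highlight_uncertain_tokens(completion, edit_scores))
# return [[cache]] . get ( [[key]] , [[None]] )
```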
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
- Improving Automated Program Repair with Domain Adaptation [0.0]
Automated Program Repair (APR) is defined as the process of fixing a bug/defect in the source code by an automated tool.
APR tools have recently achieved promising results by leveraging state-of-the-art Natural Language Processing (NLP) techniques.
arXiv Detail & Related papers (2022-12-21T23:52:09Z)