User-Centric Deployment of Automated Program Repair at Bloomberg
- URL: http://arxiv.org/abs/2311.10516v1
- Date: Fri, 17 Nov 2023 13:39:48 GMT
- Title: User-Centric Deployment of Automated Program Repair at Bloomberg
- Authors: David Williams, James Callan, Serkan Kirbas, Sergey Mechtaev, Justyna
Petke, Thomas Prideaux-Ghee, Federica Sarro
- Abstract summary: This paper presents a novel approach to optimally time, target, and present auto-generated patches to software engineers.
We use GitHub's Suggested Changes interface to seamlessly integrate automated suggestions into pull requests.
From our user study, B-Assist's efficacy is evident, with the acceptance rate of patch suggestions being as high as 74.56%.
- Score: 13.994851524965016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated program repair (APR) tools have unlocked the potential for the
rapid rectification of codebase issues. However, to encourage wider adoption of
program repair in practice, it is necessary to address the usability concerns
related to generating irrelevant or out-of-context patches. When software
engineers are presented with patches they deem uninteresting or unhelpful, they
are burdened with more "noise" in their workflows and become less likely to
engage with APR tools in future. This paper presents a novel approach to
optimally time, target, and present auto-generated patches to software
engineers. To achieve this, we designed, developed, and deployed a new tool
dubbed B-Assist, which leverages GitHub's Suggested Changes interface to
seamlessly integrate automated suggestions into active pull requests (PRs), as
opposed to creating new, potentially distracting PRs. This strategy ensures
that suggestions are not only timely, but also contextually relevant and
delivered to engineers most familiar with the affected code. Evaluation among
Bloomberg software engineers demonstrated their preference for this approach.
From our user study, B-Assist's efficacy is evident, with the acceptance rate
of patch suggestions being as high as 74.56%; engineers also found the
suggestions valuable, giving usefulness ratings of at least 4 out of 5 in 78.2%
of cases. Further, this paper sheds light on persisting usability challenges in
APR and lays the groundwork for enhancing the user experience in future APR
tools.
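The paper does not include B-Assist's implementation, but the GitHub mechanism it relies on is public: a patch can be attached to an open pull request as a review comment whose body contains a suggestion block, which GitHub renders as a one-click "Suggested Change". The minimal sketch below shows that mechanism using the REST API via the `requests` library; the repository, PR number, commit SHA, file path, and patch text are placeholders, not values from the paper.

```python
import os
import requests

# Placeholder values -- repository, PR, file, and patch are illustrative only.
OWNER, REPO, PR_NUMBER = "example-org", "example-repo", 1234
COMMIT_SHA = "HEAD_COMMIT_SHA_OF_THE_PR"      # latest commit on the PR branch
FILE_PATH = "src/example.cpp"
TARGET_LINE = 42                              # line in the PR diff the patch applies to
FIXED_CODE = "    return value != nullptr;"   # auto-generated replacement line

# A review comment whose body contains a ```suggestion block is rendered by
# GitHub as a one-click "Suggested Change" that the PR author can apply directly.
fence = "`" * 3
body = f"Automated fix suggestion:\n{fence}suggestion\n{FIXED_CODE}\n{fence}\n"

response = requests.post(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/comments",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={
        "body": body,
        "commit_id": COMMIT_SHA,
        "path": FILE_PATH,
        "line": TARGET_LINE,
        "side": "RIGHT",
    },
    timeout=30,
)
response.raise_for_status()
print("Suggestion posted:", response.json()["html_url"])
```

Because the suggestion lands on an active pull request, it reaches the engineers already reviewing that code rather than opening a separate, potentially distracting PR.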
Related papers
- Enhancing Automated Program Repair with Solution Design [5.547148114448699]
We introduce DRCodePilot, an approach designed to augment GPT-4-Turbo's APR capabilities by incorporating design rationale (DR) into the prompt instruction.
Our experimental results are impressive: DRCodePilot achieves a full-match ratio that is a remarkable 4.7x higher than when GPT-4 is utilized directly.
arXiv Detail & Related papers (2024-08-22T01:13:02Z)
- SpecRover: Code Intent Extraction via LLMs [7.742980618437681]
Specification inference can be useful for producing high-quality program patches.
Our approach SpecRover (AutoCodeRover-v2) is built on the open-source LLM agent AutoCodeRover.
In an evaluation on the full SWE-Bench consisting of 2294 GitHub issues, it shows more than 50% improvement in efficacy over AutoCodeRover.
arXiv Detail & Related papers (2024-08-05T04:53:01Z)
- Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs).
The iterative nature of agents, which allows for continuous learning and adaptation, can help overcome common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, making them better aligned with the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
- A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
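As a rough illustration of the round-trip idea (not the paper's models or prompts), the repair step reduces to two translation calls. In the sketch below, `translate` is a placeholder for whichever pretrained translation model or LLM is used, and the language names are illustrative.

```python
from typing import Callable

# (text, source_language, target_language) -> translated text
Translator = Callable[[str, str, str], str]

def round_trip_repair(buggy_code: str, translate: Translator,
                      source: str = "java", pivot: str = "natural language") -> str:
    """Round-trip translation repair: code -> pivot representation -> code."""
    # Forward pass: re-express the buggy code in the pivot (another programming
    # language or plain English).
    pivot_text = translate(buggy_code, source, pivot)
    # Backward pass: regenerate the code from the pivot. The hypothesis is that
    # regeneration drifts toward the natural, and often correct, form of the code,
    # much as round-trip translating a sentence tends to remove grammatical mistakes.
    return translate(pivot_text, pivot, source)

# Usage (with some concrete model behind `translate`):
#   candidate = round_trip_repair(buggy_method, translate=my_model_call)
# A candidate patch would still be validated, e.g. by compiling it and running tests.
```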
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
- Practical Program Repair via Preference-based Ensemble Strategy [28.176710503313895]
We propose a Preference-based Ensemble Program Repair framework (P-EPR) to rank APR tools for repairing different bugs.
P-EPR is the first non-learning-based APR ensemble method, and its novelty lies in its exploitation of repair patterns.
Experimental results show that P-EPR outperforms existing strategies significantly both in flexibility and effectiveness.
arXiv Detail & Related papers (2023-09-15T07:23:04Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) that explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
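The summary above does not specify the retriever, so the following sketch only illustrates the general retrieval-augmented pattern: find the most similar past bug-fix pairs and prepend them to the generator's input. The TF-IDF retriever and the toy bug-fix corpus are illustrative stand-ins, not RAP-Gen's actual components.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy codebase of previous (buggy, fixed) pairs; in practice these are mined from history.
bug_fix_pairs = [
    ("if (x = null) return;", "if (x == null) return;"),
    ("for (int i = 0; i <= n; i++) a[i] = 0;", "for (int i = 0; i < n; i++) a[i] = 0;"),
]

def retrieve_fix_patterns(buggy_code: str, k: int = 1):
    """Return the k most similar past bug-fix pairs (lexical similarity stand-in)."""
    corpus = [bug for bug, _ in bug_fix_pairs]
    vec = TfidfVectorizer(token_pattern=r"\S+").fit(corpus + [buggy_code])
    sims = cosine_similarity(vec.transform([buggy_code]), vec.transform(corpus))[0]
    top = sims.argsort()[::-1][:k]
    return [bug_fix_pairs[i] for i in top]

def build_generator_input(buggy_code: str) -> str:
    """Prepend retrieved fix patterns to the buggy code before patch generation."""
    examples = retrieve_fix_patterns(buggy_code)
    context = "\n".join(f"buggy: {b}\nfixed: {f}" for b, f in examples)
    return f"{context}\nbuggy: {buggy_code}\nfixed:"

print(build_generator_input("if (y = null) y = new Y();"))
```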
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- Using Machine Learning To Identify Software Weaknesses From Software Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested.
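The summary does not give the exact feature extraction or data splits; the sketch below is a plausible scikit-learn pipeline in the same spirit: TF-IDF features, latent semantic analysis via truncated SVD, and one of the tested classifiers (a linear SVM). The requirement sentences and CWE labels are placeholders.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC

# Placeholder requirement sentences with CWE-style weakness labels.
requirements = [
    "The system shall accept user-supplied file paths for report export.",
    "Passwords shall be stored for later administrative review.",
    "The service shall parse XML configuration uploaded by users.",
]
labels = ["CWE-22", "CWE-256", "CWE-611"]

# TF-IDF -> truncated SVD (latent semantic analysis) -> linear SVM classifier.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2),   # toy dimensionality for the toy corpus
    LinearSVC(),
)
model.fit(requirements, labels)
print(model.predict(["Users shall upload XML files describing build settings."]))
```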
arXiv Detail & Related papers (2023-08-10T13:19:10Z)
- A LLM Assisted Exploitation of AI-Guardian [57.572998144258705]
We evaluate the robustness of AI-Guardian, a recent defense against adversarial examples published at IEEE S&P 2023.
We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance.
This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done.
arXiv Detail & Related papers (2023-07-20T17:33:25Z)
- PatchZero: Zero-Shot Automatic Patch Correctness Assessment [13.19425284402493]
We propose PatchZero, a patch correctness assessment approach that adopts a large language model for code.
PatchZero prioritizes labeled patches from existing APR tools that exhibit semantic similarity to those generated by new APR tools.
Our experimental results show that PatchZero achieves an accuracy of 84.4% and an F1-score of 86.5% on average.
arXiv Detail & Related papers (2023-03-01T03:12:11Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
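The study's models and editor integration are not reproduced here; the sketch below illustrates only the presentation step: given per-token estimates of how likely each generated token is to be edited (however those estimates are produced), highlight the tokens above a threshold so attention goes to the riskiest spans.

```python
def highlight_uncertain_tokens(tokens, edit_likelihoods, threshold=0.5):
    """Wrap tokens whose predicted edit likelihood exceeds the threshold.

    edit_likelihoods[i] is an estimate that token i will be edited by the
    programmer; how that estimate is computed is outside this sketch's scope.
    """
    rendered = []
    for token, p in zip(tokens, edit_likelihoods):
        rendered.append(f"[[{token}]]" if p >= threshold else token)
    return " ".join(rendered)

completion = ["return", "cache", ".", "get", "(", "key", ",", "None", ")"]
edit_scores = [0.05, 0.72, 0.03, 0.10, 0.02, 0.81, 0.02, 0.64, 0.02]
print(highlight_uncertain_tokens(completion, edit_scores))
# return [[cache]] . get ( [[key]] , [[None]] )
```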
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
- Improving Automated Program Repair with Domain Adaptation [0.0]
Automated Program Repair (APR) is defined as the process of fixing a bug/defect in the source code by an automated tool.
APR tools have recently achieved promising results by leveraging state-of-the-art Natural Language Processing (NLP) techniques.
arXiv Detail & Related papers (2022-12-21T23:52:09Z)