Enhancing Genetic Improvement Mutations Using Large Language Models
- URL: http://arxiv.org/abs/2310.19813v1
- Date: Wed, 18 Oct 2023 10:24:14 GMT
- Title: Enhancing Genetic Improvement Mutations Using Large Language Models
- Authors: Alexander E.I. Brownlee, James Callan, Karine Even-Mendoza, Alina
Geiger, Carol Hanna, Justyna Petke, Federica Sarro, Dominik Sobania
- Abstract summary: Large language models (LLMs) have been successfully applied to software engineering tasks, including program repair.
We evaluate the use of LLMs as mutation operators for Genetic Improvement (GI) to improve the search process.
We find that the number of patches passing unit tests is up to 75% higher with LLM-based edits than with standard Insert edits.
- Score: 47.62003403631452
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have been successfully applied to software
engineering tasks, including program repair. However, their application in
search-based techniques such as Genetic Improvement (GI) is still largely
unexplored. In this paper, we evaluate the use of LLMs as mutation operators
for GI to improve the search process. We expand the Gin Java GI toolkit to call
OpenAI's API to generate edits for the JCodec tool. We randomly sample the
space of edits using 5 different edit types. We find that the number of patches
passing unit tests is up to 75% higher with LLM-based edits than with standard
Insert edits. Further, we observe that the patches found with LLMs are
generally less diverse compared to standard edits. We ran GI with local search
to find runtime improvements. Although many improving patches are found by
LLM-enhanced GI, the best improving patch was found by standard GI.
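To make the setup concrete, below is a minimal sketch of an LLM call standing in for a classic mutation operator inside a GI-style local search. Gin, the toolkit actually extended in the paper, is written in Java; this Python sketch, its prompt wording, the model name, and the `evaluate` helper are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: an LLM-backed mutation operator inside GI local search.
# Assumes the OpenAI Python SDK (>= 1.0) and OPENAI_API_KEY in the env.
import random
from openai import OpenAI

client = OpenAI()

def llm_mutate(statement: str) -> str:
    """Ask the LLM for an alternative version of one Java statement."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; the paper calls OpenAI's API
        messages=[{
            "role": "user",
            "content": "Rewrite this Java statement differently. "
                       "Reply with code only:\n\n" + statement,
        }],
        temperature=1.0,  # higher temperature -> more diverse edits
    )
    return resp.choices[0].message.content.strip()

def local_search(statements, evaluate, steps=100):
    """Hill climbing over single-statement edits.

    `evaluate` (hypothetical) runs the unit tests on a patched program
    and returns its runtime in ms, or None if any test fails.
    """
    best = list(statements)
    best_time = evaluate(best)
    assert best_time is not None, "unpatched program must pass its tests"
    for _ in range(steps):
        cand = list(best)
        i = random.randrange(len(cand))
        cand[i] = llm_mutate(cand[i])  # LLM replaces e.g. an Insert edit
        t = evaluate(cand)
        if t is not None and t < best_time:
            best, best_time = cand, t  # keep only improving patches
    return best
```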
Related papers
- AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models [65.93240009586351]
Large language models (LLMs) often exhibit hallucinations due to incorrect or outdated knowledge.
We introduce AlphaEdit, a novel solution that projects perturbation onto the null space of the preserved knowledge before applying it to the parameters.
We theoretically prove that this projection ensures the output of post-edited LLMs remains unchanged when queried about the preserved knowledge.
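The projection step can be pictured in a few lines of numpy. This is a simplified sketch of the general null-space idea, not AlphaEdit's exact construction: collect hidden "key" vectors for the knowledge to preserve as the columns of K, then project the raw update so it vanishes on all of them.

```python
# Sketch: make a weight update invisible to preserved "key" activations
# by projecting it onto the orthogonal complement of their span.
import numpy as np

d, n_keys = 64, 10
K = np.random.randn(d, n_keys)   # columns: keys of preserved knowledge
delta = np.random.randn(d, d)    # raw editing perturbation for weights W

P = np.eye(d) - K @ np.linalg.pinv(K)  # projector annihilating col(K)
delta_proj = delta @ P                 # constrained update

# (W + delta_proj) @ k == W @ k for every preserved key k:
assert np.allclose(delta_proj @ K, 0.0, atol=1e-8)
```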
arXiv Detail & Related papers (2024-10-03T10:06:27Z)
- Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models [9.454475517867817]
We propose a patch-naturalness measurement, entropy-delta, to improve the efficiency of template-based repair techniques.
Our proposed method can rank correct patches more effectively than state-of-the-art machine learning tools.
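One plausible reading of the measurement, sketched below: score each candidate patch by how much it lowers a code language model's per-token entropy relative to the buggy code, and rank patches by that delta. `token_logprob` is a hypothetical stand-in for any LM scorer; the paper's exact definition may differ.

```python
# Hedged sketch of entropy-delta ranking for candidate patches.
def lm_entropy(code: str, token_logprob) -> float:
    """Mean negative log-probability per token under a code LM.
    `token_logprob(code)` returns a list of token log-probabilities."""
    lps = token_logprob(code)
    return -sum(lps) / max(len(lps), 1)

def rank_patches(buggy: str, patches, token_logprob):
    """Most natural-looking patches (largest entropy drop) first."""
    base = lm_entropy(buggy, token_logprob)
    scored = [(base - lm_entropy(p, token_logprob), p) for p in patches]
    return [p for _, p in sorted(scored, key=lambda s: s[0], reverse=True)]
```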
arXiv Detail & Related papers (2024-04-23T17:12:45Z)
- CodeEditorBench: Evaluating Code Editing Capability of Large Language Models [49.387195629660994]
Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability.
We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing tasks.
We curate diverse coding challenges and scenarios from five sources, covering various programming languages, complexity levels, and editing tasks.
arXiv Detail & Related papers (2024-04-04T15:49:49Z)
- Knowledge Graph Enhanced Large Language Model Editing [37.6721061644483]
Large language models (LLMs) are pivotal in advancing natural language processing (NLP) tasks.
Existing editing methods struggle to track and incorporate changes in knowledge associated with edits.
We propose GLAME, a novel model editing method that leverages knowledge graphs to enhance LLM editing.
arXiv Detail & Related papers (2024-02-21T07:52:26Z)
- Improving Cross-Domain Low-Resource Text Generation through LLM Post-Editing: A Programmer-Interpreter Approach [50.400999859808984]
Post-editing has proven effective in improving the quality of text generated by large language models (LLMs).
We propose a neural programmer-interpreter approach that preserves the domain generalization ability of LLMs when editing their output.
Experiments demonstrate that the programmer-interpreter significantly enhances GPT-3.5's performance in logical form-to-text conversion and low-resource machine translation.
arXiv Detail & Related papers (2024-02-07T06:13:14Z)
- SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering [17.20346072074533]
Model editing is a promising technique for efficiently updating a small amount of knowledge in large language models (LLMs).
We propose a detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching.
We demonstrate the overall state-of-the-art (SOTA) performance of SWEA⊕OS on the CounterFact and zsRE datasets.
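A simplified sketch of the embedding-altering idea, assuming one learned offset per matched subject token; SWEA's actual token-level matching and its fusion with OS are more involved.

```python
# Sketch: alter subject word embeddings at lookup time, leaving the
# model's weights untouched ("detachable": drop an offset to undo it).
import numpy as np

class AlteredEmbedding:
    def __init__(self, emb_table: np.ndarray):
        self.emb = emb_table          # (vocab_size, dim), frozen
        self.offsets = {}             # token_id -> editing vector

    def add_edit(self, subject_token_ids, offset_vecs):
        for tid, vec in zip(subject_token_ids, offset_vecs):
            self.offsets[tid] = vec   # expandable: one entry per edit

    def __call__(self, token_ids):
        out = self.emb[np.asarray(token_ids)]   # fancy indexing copies
        for i, tid in enumerate(token_ids):     # token-level matching
            if tid in self.offsets:
                out[i] += self.offsets[tid]
        return out
```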
arXiv Detail & Related papers (2024-01-31T13:08:45Z)
- A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
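A sketch of the round trip with two hedged LLM calls; the prompt text, pivot language, and model name are illustrative assumptions.

```python
# Sketch: Round-Trip Translation as a repair candidate generator.
# Assumes the OpenAI Python SDK (>= 1.0) and OPENAI_API_KEY in the env.
from openai import OpenAI

client = OpenAI()

def translate(code: str, src: str, dst: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model choice
        messages=[{
            "role": "user",
            "content": f"Translate this {src} code to {dst}. "
                       f"Reply with code only:\n\n{code}",
        }],
    )
    return resp.choices[0].message.content.strip()

def round_trip_repair(buggy_java: str) -> str:
    # Java -> Python -> Java; regeneration may "translate away" the bug
    pivot = translate(buggy_java, "Java", "Python")
    return translate(pivot, "Python", "Java")
```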
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
- InstructCoder: Instruction Tuning Large Language Models for Code Editing [26.160498475809266]
We explore the use of Large Language Models (LLMs) to edit code based on user instructions.
InstructCoder is the first instruction-tuning dataset designed to adapt LLMs for general-purpose code editing.
Our findings reveal that open-source LLMs fine-tuned on InstructCoder can significantly enhance the accuracy of code edits.
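The general shape of one such instruction-tuning record might look like the following; the field names and content are assumptions, not InstructCoder's actual schema.

```python
# Hypothetical code-editing instruction record (schema assumed):
example = {
    "instruction": "Rename the loop variable to reflect what it holds.",
    "input": "for x in orders:\n    total += x.price",
    "output": "for order in orders:\n    total += order.price",
}
```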
arXiv Detail & Related papers (2023-10-31T10:15:35Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen).
RAP-Gen explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
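A toy sketch of the retrieval-augmented idea: fetch the most similar previous bug-fix pair and prepend it to the generator's input. RAP-Gen itself generates patches with CodeT5 (per the title); the token-overlap retriever and input format below are simplified assumptions.

```python
# Sketch: retrieval-augmented patch generation input construction.
def similarity(a: str, b: str) -> float:
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)   # Jaccard over tokens

def retrieve(bug: str, corpus):
    """corpus: list of (old_buggy, old_fix) pairs."""
    return max(corpus, key=lambda pair: similarity(bug, pair[0]))

def build_input(bug: str, corpus) -> str:
    old_bug, old_fix = retrieve(bug, corpus)
    # The generator sees the retrieved fix pattern plus the new bug.
    return f"buggy: {old_bug} fix: {old_fix} </s> buggy: {bug} fix:"
```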
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- Towards Generating Functionally Correct Code Edits from Natural Language Issue Descriptions [11.327913840111378]
We introduce Defects4J-NL2Fix, a dataset of 283 Java programs from the popular Defects4J dataset augmented with high-level descriptions of bug fixes.
We empirically evaluate the performance of several state-of-the-art LLMs for this task.
arXiv Detail & Related papers (2023-04-07T18:58:33Z)