Automated Code Editing with Search-Generate-Modify
- URL: http://arxiv.org/abs/2306.06490v2
- Date: Mon, 26 Feb 2024 16:03:24 GMT
- Title: Automated Code Editing with Search-Generate-Modify
- Authors: Changshu Liu, Pelin Cetin, Yogesh Patodia, Saikat Chakraborty,
Yangruibo Ding, Baishakhi Ray
- Abstract summary: This paper proposes a hybrid approach to better synthesize code edits by leveraging the power of code search, generation, and modification.
SARGAM is a novel tool designed to mimic a real developer's code editing behavior.
- Score: 24.96672652375192
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Code editing is essential in evolving software development. Many automated
code editing tools have been proposed that leverage both Information
Retrieval-based techniques and Machine Learning-based code generation and code
editing models. Each technique comes with its own promises and perils, and they
are often used together to complement their strengths and compensate for their
weaknesses. This paper proposes a hybrid approach to better synthesize code
edits by leveraging the power of code search, generation, and modification. Our
key observation is that a patch obtained by search and retrieval, even if
imperfect, can provide helpful guidance to a code generation model. However, a
retrieval-guided patch produced by a code generation model can still be a few
tokens off from the intended patch. Such generated patches can be slightly
modified to create the intended patches. SARGAM is a novel tool designed to
mimic a real developer's code editing behavior. Given an original code version,
the developer may search for related patches, generate or write the code, and
then modify the generated code to adapt it to the right context. Our evaluation
of SARGAM on edit generation shows superior performance with respect to current
state-of-the-art techniques. SARGAM also shows great effectiveness on automated
program repair tasks.
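To make the search-generate-modify loop the abstract describes concrete, here is a minimal Python sketch of the three-stage pipeline. The function names, the toy patch database, and the string-similarity retriever are all illustrative assumptions, not SARGAM's actual components or API.

```python
from difflib import SequenceMatcher

# Hypothetical patch database of (buggy, fixed) pairs; a real system
# would search actual project histories instead.
PATCH_DB = [
    ("if x == None:", "if x is None:"),
    ("f = open(path)", "with open(path) as f:"),
]

def search_patches(buggy_code, k=1):
    """Stage 1 (search): retrieve the k prior fixes whose buggy side is
    most lexically similar to the input."""
    return sorted(
        PATCH_DB,
        key=lambda pair: SequenceMatcher(None, buggy_code, pair[0]).ratio(),
        reverse=True,
    )[:k]

def generate_patch(buggy_code, retrieved):
    """Stage 2 (generate): a real system prompts a code generation model
    with the retrieved patch as guidance; here we simply echo the
    guidance as a draft that may still be a few tokens off."""
    _, fixed = retrieved[0]
    return fixed

def modify_patch(draft, context):
    """Stage 3 (modify): a small edit model repairs the near-miss draft;
    here, trivially adapt the variable name to the target context."""
    return draft.replace("x", context["var"])

buggy = "if y == None:"
draft = generate_patch(buggy, search_patches(buggy))
print(modify_patch(draft, {"var": "y"}))  # -> if y is None:
```

The division of labor mirrors the key observation above: retrieval supplies an imperfect but useful draft, generation adapts it, and a final modification step closes the last few tokens of gap.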
Related papers
- No Man is an Island: Towards Fully Automatic Programming by Code Search, Code Generation and Program Repair [9.562123938545522]
The proposed tool can integrate various code search, generation, and repair tools, combining these three research areas for the first time.
We conduct preliminary experiments to demonstrate the potential of our framework, e.g., helping CodeLlama solve 267 programming problems with an improvement of 62.53%.
arXiv Detail & Related papers (2024-09-05T06:24:29Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen)
RAP-Gen explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
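As a rough illustration of the retrieval-augmented idea, the sketch below prepends the most similar prior bug-fix pair to the generator's input. The toy codebank, similarity measure, and prompt format are assumptions for illustration, not RAP-Gen's actual retriever or interface.

```python
from difflib import SequenceMatcher

# Assumed codebank of prior (bug, fix) pairs; RAP-Gen retrieves from a
# much larger corpus of real bug fixes.
BUG_FIX_PAIRS = [
    ("return a / b", "return a / b if b else 0"),
    ("f = open(path)", "with open(path) as f:"),
]

def retrieve_fix_pattern(buggy_code):
    """Pick the prior pair whose buggy side looks most like the input."""
    return max(
        BUG_FIX_PAIRS,
        key=lambda p: SequenceMatcher(None, buggy_code, p[0]).ratio(),
    )

def build_prompt(buggy_code):
    """Augment the generator's input with the retrieved fix pattern;
    a CodeT5-style model would then decode the patch from this prompt."""
    bug, fix = retrieve_fix_pattern(buggy_code)
    return (f"# similar bug: {bug}\n"
            f"# its fix: {fix}\n"
            f"# buggy code: {buggy_code}\n"
            f"# fixed code:")

print(build_prompt("return x / y"))
```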
arXiv Detail & Related papers (2023-09-12T08:52:56Z) - Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing [57.776971051512234]
In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same codebase.
Our model, Coeditor, is a fine-tuned language model specifically designed for code editing tasks.
In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models.
arXiv Detail & Related papers (2023-05-29T19:57:36Z) - GrACE: Generation using Associated Code Edits [23.643567386291988]
We endow pre-trained large language models (LLMs) of code with the knowledge of prior, relevant edits.
The generative capability of the LLMs helps address the diversity of code changes, and conditioning code generation on prior edits helps capture the latent developer intent.
We evaluate two well-known LLMs, Codex and CodeT5, in zero-shot and fine-tuning settings respectively.
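One plausible way to condition generation on associated edits, as this summary describes, is to serialize prior diffs into the prompt; the diff format and function names below are assumptions for illustration, not GrACE's actual encoding.

```python
def edit_as_diff(before, after):
    """Serialize one prior edit in a unified-diff-like form."""
    return f"- {before}\n+ {after}"

def build_edit_prompt(prior_edits, target_region):
    """Prefix the code region to be edited with associated prior edits
    so an LLM can infer the ongoing change pattern."""
    context = "\n".join(edit_as_diff(b, a) for b, a in prior_edits)
    return f"Prior edits:\n{context}\n\nEdit next:\n{target_region}\n"

prior = [("MAX_RETRIES = 3", "MAX_RETRIES = 5"),
         ("timeout = 3", "timeout = 5")]
print(build_edit_prompt(prior, "poll_interval = 3"))
```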
arXiv Detail & Related papers (2023-05-23T14:55:44Z) - Code Execution with Pre-trained Language Models [88.04688617516827]
Most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures.
We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution.
We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension.
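To give a flavor of mutation-based augmentation, the snippet below applies a simple mutation to Python source via the `ast` module; the specific operator is an illustrative assumption, not one CodeExecutor necessarily used.

```python
import ast

class FlipComparison(ast.NodeTransformer):
    """Illustrative mutation operator: rewrite `<` as `<=`, producing a
    program variant whose execution diverges on boundary inputs."""
    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.LtE() if isinstance(op, ast.Lt) else op
                    for op in node.ops]
        return node

src = "def f(x):\n    return 1 if x < 0 else 0"
tree = ast.fix_missing_locations(FlipComparison().visit(ast.parse(src)))
mutated = ast.unparse(tree)  # requires Python 3.9+
print(mutated)

# Running the original and the mutant on the same input yields
# contrasting execution traces, the kind of signal that execution
# pre-training can exploit.
ns = {}
exec(mutated, ns)
print(ns["f"](0))  # 1 under `x <= 0`, whereas the original returns 0
```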
arXiv Detail & Related papers (2023-05-08T10:00:05Z) - CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back [8.721077261941236]
This work proposes a novel Code Change Representation learning approach named CCRep.
CCRep learns to encode code changes as feature vectors for diverse downstream tasks.
We apply CCRep to three tasks: commit message generation, patch correctness assessment, and just-in-time defect prediction.
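In spirit, a code-change representation can be built by encoding the pre-change and post-change code and combining the results; the toy encoder below stands in for the pre-trained code model, and CCRep's "query back" mechanism is deliberately not reproduced here.

```python
import math

def toy_encode(code, dim=32):
    """Stand-in for a pre-trained code encoder: hashed bag of tokens,
    L2-normalized into a fixed-size vector."""
    v = [0.0] * dim
    for tok in code.split():
        v[hash(tok) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def change_vector(before, after):
    """Represent a code change as [enc(before); enc(after)]; downstream
    heads (commit messages, defect prediction, ...) consume this vector."""
    return toy_encode(before) + toy_encode(after)

vec = change_vector("if x == None:", "if x is None:")
print(len(vec))  # 64-dimensional change representation
```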
arXiv Detail & Related papers (2023-02-08T07:43:55Z) - Chatbots As Fluent Polyglots: Revisiting Breakthrough Code Snippets [0.0]
The research applies AI-driven code assistants to analyze a selection of influential computer code that has shaped modern technology.
The original contribution of this study was to examine half of the most significant code advances in the last 50 years.
arXiv Detail & Related papers (2023-01-05T23:17:17Z) - ReCode: Robustness Evaluation of Code Generation Models [90.10436771217243]
We propose ReCode, a comprehensive robustness evaluation benchmark for code generation models.
We customize over 30 transformations specifically for code on docstrings, function and variable names, code syntax, and code format.
With human annotators, we verified that over 90% of the perturbed prompts do not alter the semantic meaning of the original prompt.
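One of the simplest transformation families mentioned, variable renaming, can be sketched with the `ast` module; the renaming map below is an assumed example, not one of ReCode's actual 30+ transformations.

```python
import ast

class RenameVars(ast.NodeTransformer):
    """Semantics-preserving perturbation: consistently rename variables.
    A robust model should produce equivalent completions either way."""
    MAPPING = {"total": "acc", "items": "xs"}

    def visit_Name(self, node):
        node.id = self.MAPPING.get(node.id, node.id)
        return node

    def visit_arg(self, node):  # function parameters are `arg` nodes
        node.arg = self.MAPPING.get(node.arg, node.arg)
        return node

prompt = ("def sum_list(items):\n"
          "    total = 0\n"
          "    for i in items:\n"
          "        total += i\n"
          "    return total")
print(ast.unparse(RenameVars().visit(ast.parse(prompt))))
```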
arXiv Detail & Related papers (2022-12-20T14:11:31Z) - InCoder: A Generative Model for Code Infilling and Synthesis [88.46061996766348]
We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) and editing (via infilling).
InCoder is trained to generate code files from a large corpus of permissively licensed code.
Our model is the first generative model that is able to directly perform zero-shot code infilling.
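Infilling is typically implemented with sentinel tokens: the code around a masked span is serialized, and the model generates the span after a separator. The sketch below shows the prompt shape with made-up sentinel names; InCoder's real special tokens and API differ.

```python
def make_infill_prompt(prefix, suffix, mask="<MASK:0>", sep="<INFILL>"):
    """Causal-masking style prompt: the model sees prefix + mask +
    suffix, then generates the masked span after the separator."""
    return f"{prefix}{mask}{suffix}{sep}"

prefix = "def is_even(n):\n    return "
suffix = "\n"
print(make_infill_prompt(prefix, suffix))
# A model trained for infilling would complete: "n % 2 == 0"
```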
arXiv Detail & Related papers (2022-04-12T16:25:26Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and reference to semantically similar code obtained by retrieval.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
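The hybrid retriever can be approximated by fusing a lexical score with a dense-embedding score; both scorers below are toy stand-ins (token overlap and hashed bag-of-words instead of BM25 and a trained encoder), illustrating only the fusion idea.

```python
from collections import Counter
import math

def lexical_score(query, doc):
    """Token-overlap proxy for BM25-style lexical retrieval."""
    q, d = Counter(query.split()), Counter(doc.split())
    return sum((q & d).values())

def dense_score(query, doc):
    """Toy 'semantic' score: cosine over hashed bag-of-words buckets
    (a trained code encoder would go here)."""
    def vec(text, dim=64):
        v = [0.0] * dim
        for tok in text.split():
            v[hash(tok) % dim] += 1.0
        return v
    a, b = vec(query), vec(doc)
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def retrieve(query, corpus, alpha=0.5):
    """Fuse lexical and dense scores; the best hit would be prepended
    to the completion model's context."""
    return max(corpus, key=lambda d: alpha * lexical_score(query, d)
                                     + (1 - alpha) * dense_score(query, d))

corpus = ["json . load ( open ( path ) )",
          "csv . writer ( open ( path , 'w' ) )"]
print(retrieve("load json from path", corpus))
```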
arXiv Detail & Related papers (2022-03-15T08:25:08Z) - Unsupervised Learning of General-Purpose Embeddings for Code Changes [6.652641137999891]
We propose an approach for obtaining embeddings of code changes during pre-training.
We evaluate them on two downstream tasks: applying changes to code and commit message generation.
Our model outperforms the model that uses full edit sequences by 5.9 percentage points in accuracy.
arXiv Detail & Related papers (2021-06-03T19:08:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.