CodeEditor: Learning to Edit Source Code with Pre-trained Models
- URL: http://arxiv.org/abs/2210.17040v3
- Date: Thu, 7 Sep 2023 11:35:51 GMT
- Title: CodeEditor: Learning to Edit Source Code with Pre-trained Models
- Authors: Jia Li, Ge Li, Zhuo Li, Zhi Jin, Xing Hu, Kechi Zhang, Zhiyi Fu
- Abstract summary: This paper presents an effective pre-trained code editing model named CodeEditor.
We collect many real-world code snippets as the ground truth and use a powerful generator to rewrite them into mutated versions.
We conduct experiments on four code editing datasets and evaluate the pre-trained CodeEditor in three settings.
- Score: 47.736781998792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developers often perform repetitive code editing activities for various
reasons (e.g., code refactoring) during software development. Pre-trained code
editing models have achieved state-of-the-art (SOTA) results. These models are
first pre-trained on general pre-training tasks and then fine-tuned on the
code editing task. Existing pre-training tasks are mainly code infilling tasks
(e.g., masked language modeling), which are derived from natural language
processing and are not designed for automatic code editing.
This paper proposes a novel pre-training task specialized in code editing and
presents an effective pre-trained code editing model named CodeEditor. Our
pre-training task further improves the performance and generalization ability
of code editing models. Specifically, we collect many real-world code
snippets as the ground truth and use a powerful generator to rewrite them into
mutated versions. Then, we pre-train our CodeEditor to edit mutated versions
into the corresponding ground truth, to learn edit patterns. We conduct
experiments on four code editing datasets and evaluate the pre-trained
CodeEditor in three settings. (1) In the fine-tuning setting, we train the
pre-trained CodeEditor with four datasets and evaluate it on the test data.
CodeEditor outperforms the SOTA baselines by 15%, 25.5%, 9.4%, and 26.6% on the
four datasets. (2) In the few-shot setting, we train the pre-trained CodeEditor
with limited data and evaluate it on the test data. CodeEditor substantially
outperforms all baselines. (3) In the zero-shot setting, CodeEditor correctly
edits 1,113 programs, while the SOTA baselines fail to do so.
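As a concrete illustration of this pre-training task, the sketch below is a reconstruction under stated assumptions, not the paper's released code: it uses CodeT5-style encoder-decoders for both the generator and the editor, and a simple span-masking heuristic to produce mutated versions (the checkpoint name and the heuristic are placeholders).

```python
# Illustrative sketch of the CodeEditor-style pre-training objective:
# a generator rewrites a real snippet into a plausible mutated version,
# and the editor is trained to map the mutated version back to the
# ground truth. Checkpoint and masking heuristic are assumptions.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-small")
generator = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-small")
editor = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-small")

def mutate(snippet: str) -> str:
    """Rewrite a snippet into a mutated version: mask a middle span and
    let the generator sample a (possibly wrong) replacement for it."""
    tokens = snippet.split()
    lo, hi = len(tokens) // 3, 2 * len(tokens) // 3
    masked = " ".join(tokens[:lo] + ["<extra_id_0>"] + tokens[hi:])
    ids = tokenizer(masked, return_tensors="pt").input_ids
    out = generator.generate(ids, max_new_tokens=32, do_sample=True, top_p=0.95)
    filler = tokenizer.decode(out[0], skip_special_tokens=True)
    return " ".join(tokens[:lo] + [filler] + tokens[hi:])

def pretraining_loss(snippet: str):
    """One (mutated -> ground truth) pair: standard seq2seq cross-entropy."""
    inputs = tokenizer(mutate(snippet), return_tensors="pt")
    labels = tokenizer(snippet, return_tensors="pt").input_ids
    return editor(**inputs, labels=labels).loss
```

The generator's rewrites only need to be plausible, not correct; the editor learns edit patterns by mapping each mutated version back to its ground truth.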
Related papers
- CodeEditorBench: Evaluating Code Editing Capability of Large Language Models [49.387195629660994]
Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability.
We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing tasks.
We curate diverse coding challenges and scenarios from five sources, covering various programming languages, complexity levels, and editing tasks.
arXiv Detail & Related papers (2024-04-04T15:49:49Z)
- InstructCoder: Instruction Tuning Large Language Models for Code Editing [26.160498475809266]
We explore the use of Large Language Models (LLMs) to edit code based on user instructions.
InstructCoder is the first instruction-tuning dataset designed to adapt LLMs for general-purpose code editing.
Our findings reveal that fine-tuning open-source LLMs on InstructCoder significantly enhances the accuracy of their code edits.
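For illustration, an instruction-tuning record for code editing pairs a natural-language instruction with the code before and after the edit. The field names and example below are hypothetical, not drawn from InstructCoder itself:

```python
# Hypothetical shape of one instruction-tuning record for code editing
# (the field names are illustrative, not InstructCoder's actual schema).
example = {
    "instruction": "Rename the variable `tmp` to `total` for clarity.",
    "input": (
        "def add(xs):\n"
        "    tmp = 0\n"
        "    for x in xs:\n"
        "        tmp += x\n"
        "    return tmp"
    ),
    "output": (
        "def add(xs):\n"
        "    total = 0\n"
        "    for x in xs:\n"
        "        total += x\n"
        "    return total"
    ),
}
```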
arXiv Detail & Related papers (2023-10-31T10:15:35Z)
- Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing [57.776971051512234]
In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same codebase.
Our model, Coeditor, is a fine-tuned language model specifically designed for code editing tasks.
In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models.
arXiv Detail & Related papers (2023-05-29T19:57:36Z)
- GrACE: Generation using Associated Code Edits [23.643567386291988]
We endow pre-trained large language models (LLMs) of code with knowledge of prior, relevant edits.
The generative capability of the LLMs helps address the diversity of code changes, and code generation is conditioned on prior edits.
We evaluate two well-known LLMs, Codex and CodeT5, in zero-shot and fine-tuning settings respectively.
arXiv Detail & Related papers (2023-05-23T14:55:44Z)
- Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors [53.819805242367345]
We propose GRACE, a lifelong model editing method, which implements spot-fixes on streaming errors of a deployed model.
GRACE writes new mappings into a pre-trained model's latent space, creating a discrete, local codebook of edits without altering model weights.
Our experiments on T5, BERT, and GPT models show GRACE's state-of-the-art performance in making and retaining edits, while generalizing to unseen inputs.
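For concreteness, a minimal sketch of the key-value mechanism that summary describes might look as follows; the deferral radius and the choice of wrapped layer are assumptions for illustration, not GRACE's exact algorithm:

```python
# Minimal, hypothetical sketch of a GRACE-style key-value adaptor: keys are
# cached hidden states where errors occurred; an input whose hidden state
# falls within a key's radius gets the stored value instead of the frozen
# layer's output. The radius and layer choice are illustrative assumptions.
import torch

class GraceAdaptor(torch.nn.Module):
    def __init__(self, layer: torch.nn.Module, init_radius: float = 1.0):
        super().__init__()
        self.layer = layer                       # the wrapped, frozen layer
        self.keys, self.values, self.radii = [], [], []
        self.init_radius = init_radius

    def add_edit(self, key: torch.Tensor, value: torch.Tensor) -> None:
        """Store a spot-fix mapping hidden state `key` to output `value`."""
        self.keys.append(key.detach())
        self.values.append(value.detach())
        self.radii.append(self.init_radius)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        for k, v, r in zip(self.keys, self.values, self.radii):
            if torch.dist(h, k) < r:   # inside an edit's region: use codebook
                return v
        return self.layer(h)           # otherwise defer to the frozen layer
```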
arXiv Detail & Related papers (2022-11-20T17:18:22Z)
- CoditT5: Pretraining for Source Code and Natural Language Editing [34.77621217370665]
CoditT5 is a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments.
We fine-tune it on various downstream editing tasks, including comment updating, bug fixing, and automated code review.
arXiv Detail & Related papers (2022-08-10T16:59:40Z)
- Memory-Based Model Editing at Scale [102.28475739907498]
Existing model editors struggle to accurately model an edit's intended scope.
We propose Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC).
SERAC stores edits in an explicit memory and learns to reason over them to modulate the base model's predictions as needed.
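A minimal sketch of that semi-parametric design, assuming callables for the base model, counterfactual model, and scope classifier (the 0.5 threshold is an illustrative stand-in for the learned scope decision):

```python
# Hypothetical sketch of a SERAC-style wrapper: edits live in an explicit
# memory, a scope classifier checks whether a query is covered by any
# stored edit, and a small counterfactual model answers covered queries
# while the frozen base model handles everything else.

class SeracWrapper:
    def __init__(self, base_model, counterfactual_model, scope_classifier):
        self.base = base_model
        self.cf = counterfactual_model
        self.in_scope = scope_classifier         # (query, edit) -> probability
        self.memory: list[tuple[str, str]] = []  # (edit input, edit target)

    def add_edit(self, edit_input: str, edit_target: str) -> None:
        self.memory.append((edit_input, edit_target))

    def __call__(self, query: str) -> str:
        # Retrieve the stored edit most relevant to the query, if any.
        scored = [(self.in_scope(query, e), e) for e in self.memory]
        if scored:
            best_score, best_edit = max(scored, key=lambda t: t[0])
            if best_score > 0.5:   # query falls within an edit's scope
                return self.cf(query, best_edit)  # condition on the edit
        return self.base(query)   # untouched behavior for out-of-scope inputs
```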
arXiv Detail & Related papers (2022-06-13T23:40:34Z)
- Unsupervised Learning of General-Purpose Embeddings for Code Changes [6.652641137999891]
We propose an approach for obtaining embeddings of code changes during pre-training.
We evaluate them on two different downstream tasks - applying changes to code and commit message generation.
Our model outperforms the model that uses full edit sequences by 5.9 percentage points in accuracy.
arXiv Detail & Related papers (2021-06-03T19:08:53Z)
- Learning Structural Edits via Incremental Tree Transformations [102.64394890816178]
We present a generic model for incremental editing of structured data (i.e., "structural edits").
Our editor learns to iteratively generate tree edits (e.g., deleting or adding a subtree) and applies them to the partially edited data.
We evaluate our proposed editor on two source code edit datasets, where results show that, with the proposed edit encoder, our editor significantly improves accuracy over previous approaches.
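A simplified sketch of that iterative editing loop, with hypothetical Node and edit representations standing in for the paper's tree encodings:

```python
# Illustrative sketch of the iterative tree-editing loop: at each step the
# editor predicts one structural edit (delete or add a subtree), which is
# applied to the partially edited tree until the editor signals it is done.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)

def apply_edit(tree: Node, edit: tuple[str, Node, int, "Node | None"]) -> None:
    """Apply a single structural edit in place."""
    op, parent, index, subtree = edit
    if op == "delete":
        del parent.children[index]
    elif op == "add":
        parent.children.insert(index, subtree)

def edit_until_done(tree: Node, editor, max_steps: int = 50) -> Node:
    """Iteratively query the editor for the next edit and apply it."""
    for _ in range(max_steps):
        edit = editor(tree)   # returns None when no further edits are needed
        if edit is None:
            break
        apply_edit(tree, edit)
    return tree
```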
arXiv Detail & Related papers (2021-01-28T16:11:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.