On Multi-Modal Learning of Editing Source Code
- URL: http://arxiv.org/abs/2108.06645v1
- Date: Sun, 15 Aug 2021 02:06:49 GMT
- Title: On Multi-Modal Learning of Editing Source Code
- Authors: Saikat Chakraborty, Baishakhi Ray
- Abstract summary: In recent years, Neural Machine Translation (NMT) has shown promise in automatically editing source code.
In this research, we leverage three modalities of information: edit location, edit code context, and commit messages (as a proxy for developers' hints in natural language) to automatically generate edits with NMT models.
We show that developers' hints as an input modality can narrow the search space for patches and help outperform state-of-the-art models in generating correctly patched code in the top-1 position.
- Score: 17.28158089963557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, Neural Machine Translation (NMT) has shown promise in
automatically editing source code. A typical NMT-based code editor considers
only the code that needs to be changed as input and presents developers with a
ranked list of patched code to choose from, where the correct one may not
always be at the top of the list. While NMT-based code editing systems generate
a broad spectrum of plausible patches, the correct one depends on the
developers' requirements and often on the context where the patch is applied.
Thus, if developers provide some hints, using natural language or patch
context, NMT models can benefit from them. As a proof of concept, in this
research we leverage three modalities of information: edit location, edit code
context, and commit messages (as a proxy for developers' hints in natural
language) to automatically generate edits with NMT models. To that end, we
build MODIT, a multi-modal NMT-based code editing engine. Through in-depth
investigation and analysis, we show that developers' hints as an input modality
can narrow the search space for patches and help outperform state-of-the-art
models in generating correctly patched code in the top-1 position.
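The three modalities described above are typically serialized into a single encoder input for a sequence-to-sequence model. The sketch below illustrates one plausible way to do that, concatenating the code to edit, its surrounding context, and the commit message with a separator token; the separator, function names, and example strings are illustrative assumptions, not MODIT's actual implementation.

```python
# Hypothetical sketch of multi-modal input construction for an NMT-based
# code editor: the code marked for editing, its enclosing context, and the
# developer's commit message are joined into one token sequence with a
# separator, as is common for encoder-decoder code models.
SEP = "</s>"

def build_multimodal_input(edit_code: str, context: str, commit_msg: str) -> str:
    """Concatenate the three modalities into a single encoder input string."""
    return f" {SEP} ".join([edit_code, context, commit_msg])

example = build_multimodal_input(
    "return a - b;",                      # code fragment marked for editing
    "int add(int a, int b) { ... }",      # enclosing method as edit context
    "fix: add should sum its arguments",  # developer hint (commit message)
)
print(example)
```

A model trained on such inputs can condition its generated patch on the natural-language hint, which is what lets the hint narrow the search space of plausible edits.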
Related papers
- Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs? [61.68363765350178]
This paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research.
We first describe 12 open problems with model editing, based on challenges with (1) defining the problem, (2) developing benchmarks, and (3) assuming LLMs have editable beliefs in the first place.
Next, we introduce a semi-synthetic dataset for model editing based on Wikidata, where we can evaluate edits against labels given by an idealized Bayesian agent.
arXiv Detail & Related papers (2024-06-27T17:33:03Z) - CodeEditorBench: Evaluating Code Editing Capability of Large Language Models [49.387195629660994]
Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability.
We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing tasks.
We curate diverse coding challenges and scenarios from five sources, covering various programming languages, complexity levels, and editing tasks.
arXiv Detail & Related papers (2024-04-04T15:49:49Z) - Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions [6.367360745627828]
We introduce a benchmark of code editing tasks and use it to evaluate several cutting edge LLMs.
Our evaluation exposes a significant gap between the capabilities of state-of-the-art open and closed models.
We introduce a new, carefully curated, permissively licensed training dataset of code editing tasks coupled with natural language instructions.
arXiv Detail & Related papers (2023-12-11T02:27:45Z) - InstructCoder: Instruction Tuning Large Language Models for Code Editing [26.160498475809266]
We explore the use of Large Language Models (LLMs) to edit code based on user instructions.
InstructCoder is the first instruction-tuning dataset designed to adapt LLMs for general-purpose code editing.
Our findings reveal that open-source LLMs fine-tuned on InstructCoder can significantly enhance the accuracy of code edits.
arXiv Detail & Related papers (2023-10-31T10:15:35Z) - Automated Code Editing with Search-Generate-Modify [24.96672652375192]
This paper proposes a hybrid approach to better synthesize code edits by leveraging the power of code search, generation, and modification.
SARGAM is a novel tool designed to mimic a real developer's code editing behavior.
arXiv Detail & Related papers (2023-06-10T17:11:21Z) - On Search Strategies for Document-Level Neural Machine Translation [51.359400776242786]
Document-level neural machine translation (NMT) models produce a more consistent output across a document.
In this work, we aim to answer the question of how best to utilize a context-aware translation model in decoding.
arXiv Detail & Related papers (2023-06-08T11:30:43Z) - Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing [57.776971051512234]
In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same codebase.
Our model, Coeditor, is a fine-tuned language model specifically designed for code editing tasks.
In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models.
arXiv Detail & Related papers (2023-05-29T19:57:36Z) - CodeEditor: Learning to Edit Source Code with Pre-trained Models [47.736781998792]
This paper presents an effective pre-trained code editing model named CodeEditor.
We collect many real-world code snippets as the ground truth and use a powerful generator to rewrite them into mutated versions.
We conduct experiments on four code editing datasets and evaluate the pre-trained CodeEditor in three settings.
arXiv Detail & Related papers (2022-10-31T03:26:33Z) - InCoder: A Generative Model for Code Infilling and Synthesis [88.46061996766348]
We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) and editing (via infilling).
InCoder is trained to generate code files from a large corpus of permissively licensed code.
Our model is the first generative model that is able to directly perform zero-shot code infilling.
arXiv Detail & Related papers (2022-04-12T16:25:26Z) - A Sketch-Based Neural Model for Generating Commit Messages from Diffs [0.5239589676872304]
Commit messages have an important impact on software development, especially when working in large teams.
We apply neural machine translation (NMT) techniques to convert code diffs into commit messages.
We present an improved sketch-based encoder for this task.
arXiv Detail & Related papers (2021-04-08T21:21:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.