Neural Machine Translation for Code Generation
- URL: http://arxiv.org/abs/2305.13504v1
- Date: Mon, 22 May 2023 21:43:12 GMT
- Title: Neural Machine Translation for Code Generation
- Authors: Dharma KC, Clayton T. Morrison
- Abstract summary: In NMT for code generation, the task is to generate source code that satisfies constraints expressed in the input.
In this paper we survey the NMT for code generation literature, cataloging the variety of methods that have been explored.
We discuss the limitations of existing methods and future research directions.
- Score: 0.7607163273993514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural machine translation (NMT) methods developed for natural language
processing have been shown to be highly successful in automating translation
from one natural language to another. Recently, these NMT methods have been
adapted to the generation of program code. In NMT for code generation, the task
is to generate output source code that satisfies constraints expressed in the
input. In the literature, a variety of different input scenarios have been
explored, including generating code based on natural language description,
lower-level representations such as binary or assembly (neural decompilation),
partial representations of source code (code completion and repair), and source
code in another language (code translation). In this paper we survey the NMT
for code generation literature, cataloging the variety of methods that have
been explored according to input and output representations, model
architectures, optimization techniques used, data sets, and evaluation methods.
We discuss the limitations of existing methods and future research directions.
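To make the NL-to-code setting concrete, the sketch below runs a pretrained encoder-decoder through the standard Hugging Face interface; the checkpoint name and prompt format are illustrative assumptions, not a specific setup from the survey.

```python
# Minimal sketch of NMT-style code generation: an encoder-decoder model maps a
# natural-language specification to source code. The checkpoint name is
# illustrative; any seq2seq model fine-tuned on NL-to-code pairs would fit.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Salesforce/codet5-base"  # assumption: a suitable NL-to-code checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

spec = "Generate code: return the sum of squares of a list of integers"
inputs = tokenizer(spec, return_tensors="pt")
# Beam search is the usual MAP-style decoding baseline discussed in the survey.
outputs = model.generate(**inputs, max_length=64, num_beams=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```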
Related papers
- Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
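The summary above points at the sequential-decoding bottleneck. The sketch below contrasts plain greedy decoding with an idealized multi-token step; the model interface (most_likely_next, propose_unit) is hypothetical, and this is only a loose illustration of parallel decoding, not the paper's Lexical Unit Decoding procedure.

```python
# Illustrative sketch of the decoding bottleneck the paper targets. The model
# interface here is hypothetical; real Lexical Unit Decoding is learned in a
# data-driven manner as described in the paper.
def greedy_decode(model, prompt_ids, max_new_tokens=32, eos_id=2):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):            # one forward pass per token: sequential
        next_id = model.most_likely_next(ids)  # hypothetical helper
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

def multi_token_decode(model, prompt_ids, max_new_tokens=32, eos_id=2, tau=0.9):
    # Idea only: when the model is confident about a whole multi-token unit,
    # emit it in one step instead of token by token (fewer sequential passes).
    ids = list(prompt_ids)
    while len(ids) - len(prompt_ids) < max_new_tokens:
        unit, confidence = model.propose_unit(ids)   # hypothetical helper
        ids.extend(unit if confidence >= tau else unit[:1])
        if ids[-1] == eos_id:
            break
    return ids
```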
arXiv Detail & Related papers (2024-05-24T04:35:13Z)
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks from their control flow and data flow to bridge the gap between programming languages and natural language.
Experiments and ablations on four datasets covering both C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
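A minimal sketch of the "graphical view of code" idea, using only Python's standard ast module to extract rough definition-to-use edges; CodeGRAG's actual graphs also encode control flow and are richer, so this is an assumption-laden simplification.

```python
# Extract a rough data-flow graph (variable definition -> use edges) from a
# Python snippet. Only top-level statements are handled; this is illustrative,
# not CodeGRAG's graph construction.
import ast

def dataflow_edges(source: str):
    """Return (def_line, use_line, variable) edges for top-level statements."""
    tree = ast.parse(source)
    last_def = {}   # variable name -> line of its most recent assignment
    edges = []
    for stmt in tree.body:
        # Uses in this statement refer to earlier definitions.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load) and node.id in last_def:
                edges.append((last_def[node.id], node.lineno, node.id))
        # Then record (re)definitions made by this statement.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                last_def[node.id] = node.lineno
    return edges

snippet = "x = 1\ny = x + 2\nz = y * x\n"
print(dataflow_edges(snippet))  # [(1, 2, 'x'), (2, 3, 'y'), (1, 3, 'x')]
```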
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- Code-Mixed Probes Show How Pre-Trained Models Generalise On Code-Switched Text [1.9185059111021852]
We investigate how pre-trained language models handle code-switched text along three dimensions.
Our findings reveal that pre-trained language models are effective in generalising to code-switched text.
arXiv Detail & Related papers (2024-03-07T19:46:03Z)
- Neural Models for Source Code Synthesis and Completion [0.0]
Natural language (NL) to code suggestion systems assist developers in Integrated Development Environments (IDEs) by translating NL utterances into compilable code snippets.
Current approaches mainly involve hard-coded, rule-based systems based on semantic parsing.
We present sequence-to-sequence deep learning models and training paradigms to map NL to general-purpose programming languages.
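A hedged sketch of the usual sequence-to-sequence training paradigm (teacher forcing) such systems rely on, written against the Hugging Face API; the t5-small checkpoint and the single toy example are placeholders, not the paper's setup.

```python
# Teacher-forced seq2seq training step for NL -> code: the decoder learns to
# predict each code token given the NL input and the gold code prefix.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")            # placeholder checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

nl = ["sort a list of integers in descending order"]
code = ["def sort_desc(xs):\n    return sorted(xs, reverse=True)"]

batch = tok(nl, return_tensors="pt", padding=True)
labels = tok(code, return_tensors="pt", padding=True).input_ids
outputs = model(**batch, labels=labels)   # cross-entropy over code tokens
outputs.loss.backward()
optimizer.step()
```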
arXiv Detail & Related papers (2024-02-08T17:10:12Z)
- Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages [86.08359401867577]
Back-translation is widely known for its effectiveness in neural machine translation when little to no parallel data is available.
We propose performing back-translation via code summarization and generation.
We show that our proposed approach performs competitively with state-of-the-art methods.
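A minimal sketch of back-translation through natural language, assuming hypothetical summarize() and generate() wrappers around pretrained code-summarization and code-generation models; the concrete language pair is only for illustration.

```python
# Monolingual code in the target language is summarized to natural language,
# a source-language program is generated from that summary, and the synthetic
# pair supervises the source->target translator (back-translation via NL pivot).
def build_synthetic_pairs(target_snippets, summarize, generate):
    pairs = []
    for java_code in target_snippets:            # unlabeled code in the target language
        nl_summary = summarize(java_code)        # code -> natural language
        python_code = generate(nl_summary)       # natural language -> source language
        pairs.append((python_code, java_code))   # (synthetic source, real target)
    return pairs
```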
arXiv Detail & Related papers (2022-05-23T08:20:41Z)
- Quality-Aware Decoding for Neural Machine Translation [64.24934199944875]
We propose quality-aware decoding for neural machine translation (NMT).
We leverage recent breakthroughs in reference-free and reference-based MT evaluation through various inference methods.
We find that quality-aware decoding consistently outperforms MAP-based decoding according to both state-of-the-art automatic metrics and human assessments.
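A hedged sketch of quality-aware decoding as candidate sampling plus reranking with a reference-free quality scorer; quality_score() stands in for a learned QE metric, and the generation call follows Hugging Face conventions rather than the paper's exact inference methods.

```python
# Sample several candidate translations and rerank them by estimated quality
# instead of returning the single MAP/beam-search output.
import torch

def quality_aware_decode(model, tokenizer, source, quality_score, num_candidates=8):
    inputs = tokenizer(source, return_tensors="pt")
    with torch.no_grad():
        samples = model.generate(
            **inputs,
            do_sample=True,
            num_return_sequences=num_candidates,
            max_length=128,
        )
    candidates = [tokenizer.decode(s, skip_special_tokens=True) for s in samples]
    # Rerank by estimated quality rather than by model log-probability.
    return max(candidates, key=lambda hyp: quality_score(source, hyp))
```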
arXiv Detail & Related papers (2022-05-02T15:26:28Z)
- Using Document Similarity Methods to create Parallel Datasets for Code Translation [60.36392618065203]
Translating source code from one programming language to another is a critical, time-consuming task.
We propose to use document similarity methods to create noisy parallel datasets of code.
We show that these models perform comparably to models trained on ground truth for reasonable levels of noise.
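An illustrative sketch of mining noisy parallel code with document similarity, pairing each file in one language with its nearest neighbour in the other under TF-IDF cosine similarity; the vectorizer settings and the similarity threshold are assumptions, not the paper's configuration.

```python
# Pair files across two languages by TF-IDF cosine similarity to build a
# noisy parallel dataset for code translation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mine_noisy_pairs(python_files, java_files, threshold=0.5):
    vectorizer = TfidfVectorizer(token_pattern=r"\w+")
    vectors = vectorizer.fit_transform(python_files + java_files)
    py_vecs, java_vecs = vectors[: len(python_files)], vectors[len(python_files):]
    sims = cosine_similarity(py_vecs, java_vecs)
    pairs = []
    for i, row in enumerate(sims):
        j = row.argmax()
        if row[j] >= threshold:               # keep only sufficiently similar pairs
            pairs.append((python_files[i], java_files[j], float(row[j])))
    return pairs
```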
arXiv Detail & Related papers (2021-10-11T17:07:58Z)
- Retrieve and Refine: Exemplar-based Neural Comment Generation [27.90756259321855]
Comments of similar code snippets are helpful for comment generation.
We design a novel seq2seq neural network that takes the given code, its AST, its similar code, and its exemplar as input.
We evaluate our approach on a large-scale Java corpus, which contains about 2M samples.
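A hedged sketch of the retrieve-and-refine idea: fetch the most similar snippet from a corpus, reuse its comment as an exemplar, and pass both to a generator. The difflib-based retrieval and the flattened input format are stand-ins for the paper's retriever and seq2seq model.

```python
# Retrieve the comment of the most similar code snippet and feed it alongside
# the query code to a comment generator.
import difflib

def retrieve_exemplar(query_code, corpus):
    """corpus: list of (code, comment) pairs; returns the comment of the closest code."""
    best = max(corpus, key=lambda item: difflib.SequenceMatcher(None, query_code, item[0]).ratio())
    return best[1]

def generate_comment(query_code, corpus, seq2seq_generate):
    exemplar = retrieve_exemplar(query_code, corpus)
    # The original model also consumes the AST and the similar code itself;
    # here the input is flattened into a single tagged sequence for brevity.
    model_input = f"<code> {query_code} <exemplar> {exemplar}"
    return seq2seq_generate(model_input)
```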
arXiv Detail & Related papers (2020-10-09T09:33:10Z)
- Incorporating External Knowledge through Pre-training for Natural Language to Code Generation [97.97049697457425]
Open-domain code generation aims to generate code in a general-purpose programming language from natural language (NL) intents.
We explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation.
Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa.
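A small sketch of what retrieval-based re-sampling of mined NL-code pairs could look like: external pairs that resemble the in-domain intents are drawn more often. The similarity() function and the weighting scheme are assumptions, not the paper's exact procedure.

```python
# Weight mined (NL, code) pairs by their similarity to in-domain intents and
# sample the pre-training mix accordingly.
import random

def resample_mined_pairs(mined_pairs, in_domain_intents, similarity, k=10000):
    # mined_pairs: list of (nl, code) mined from Stack Overflow / API docs
    weights = [
        max(similarity(nl, intent) for intent in in_domain_intents)
        for nl, _ in mined_pairs
    ]
    return random.choices(mined_pairs, weights=weights, k=k)
```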
arXiv Detail & Related papers (2020-04-20T01:45:27Z)
- DeepSumm -- Deep Code Summaries using Neural Transformer Architecture [8.566457170664927]
We employ neural techniques for the task of source code summarization.
With more than 2.1M supervised samples of comments and code, we reduce training time by more than 50%.
arXiv Detail & Related papers (2020-03-31T22:43:29Z)