A Comprehensive Review of State-of-The-Art Methods for Java Code
Generation from Natural Language Text
- URL: http://arxiv.org/abs/2306.06371v1
- Date: Sat, 10 Jun 2023 07:27:51 GMT
- Title: A Comprehensive Review of State-of-The-Art Methods for Java Code
Generation from Natural Language Text
- Authors: Jessica López Espejel, Mahaman Sanoussi Yahaya Alassan, El Mehdi
Chouham, Walid Dahhane, El Hassane Ettifouri
- Abstract summary: This paper provides a comprehensive review of the evolution and progress of deep learning models on the Java code generation task.
We focus on the most important methods and present their merits and limitations, as well as the objective functions used by the community.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Java code generation consists of automatically generating Java code
from natural language text. This NLP task helps increase programmers'
productivity by providing them with immediate solutions to the simplest and
most repetitive tasks. Code generation is challenging because of the strict
syntactic rules of programming languages and the deep understanding of their
semantics that it requires. Many works have tackled this task using either
RNN-based or Transformer-based models. The latter have achieved remarkable
progress in the domain and can be divided into three groups: (1) encoder-only
models, (2) decoder-only models, and (3) encoder-decoder models. In this paper,
we provide a comprehensive review of the evolution and progress of deep
learning models on the Java code generation task. We focus on the most
important methods and present their merits and limitations, as well as the
objective functions used by the community. In addition, we provide a detailed
description of the datasets and evaluation metrics used in the literature.
Finally, we discuss the results of different models on the CONCODE dataset and
propose some future directions.
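As an illustration of the kind of evaluation metrics reported on CONCODE-style benchmarks, the sketch below computes exact match and a simplified unigram-precision stand-in for BLEU. The data and helper names are hypothetical; published results use the full BLEU with n-gram clipping and a brevity penalty.

```python
from collections import Counter

def exact_match(pred: str, ref: str) -> bool:
    # Compare after whitespace normalization.
    return " ".join(pred.split()) == " ".join(ref.split())

def unigram_precision(pred: str, ref: str) -> float:
    # Simplified stand-in for BLEU: fraction of predicted tokens
    # that also appear in the reference, with clipped counts.
    p, r = Counter(pred.split()), Counter(ref.split())
    overlap = sum(min(count, r[tok]) for tok, count in p.items())
    return overlap / max(sum(p.values()), 1)

pred = "public int add ( int a , int b ) { return a + b ; }"
ref  = "public int add ( int a , int b ) { return a + b ; }"
print(exact_match(pred, ref))        # True
print(unigram_precision(pred, ref))  # 1.0
```

This only conveys the idea of token-overlap scoring; it ignores n-grams longer than one and treats code purely as a token sequence.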
Related papers
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
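The retrieval-augmented setup described above can be sketched minimally as follows; the toy corpus, Jaccard similarity, and prompt format are illustrative assumptions, not the benchmark's actual pipeline:

```python
def jaccard(a: set, b: set) -> float:
    # Token-set overlap as a crude similarity score.
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(query: str, corpus: list, k: int = 2) -> list:
    # Rank stored (description, code) pairs by overlap with the query.
    q = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda ex: jaccard(q, set(ex[0].lower().split())),
                    reverse=True)
    return ranked[:k]

corpus = [
    ("return the sum of two integers", "int add(int a, int b) { return a + b; }"),
    ("reverse a string", "new StringBuilder(s).reverse().toString()"),
]
hits = retrieve("sum two numbers and return the result", corpus, k=1)
# Prepend retrieved examples to the generation prompt.
prompt = "\n".join(f"// {d}\n{c}" for d, c in hits) + "\n// sum two numbers\n"
print(hits[0][1])  # int add(int a, int b) { return a + b; }
```

Real systems replace the token-overlap retriever with dense or sparse retrieval over large code corpora and feed the augmented prompt to an LLM.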
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks based on their control flow and data flow to bridge the gap between programming languages and natural language.
Experiments and ablations on four datasets, covering both C++ and Python, validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- A Syntax-Guided Multi-Task Learning Approach for Turducken-Style Code Generation [19.489202790935902]
We propose TurduckenGen, a syntax-guided multi-task learning approach.
Specifically, we first explicitly append the type information to the code tokens to capture the representation of syntactic constraints.
Then we formalize code generation with syntactic constraint representation as an auxiliary task to enable the model to learn the syntactic constraints of the code.
arXiv Detail & Related papers (2023-03-09T06:22:07Z)
- Deep Learning Based Code Generation Methods: Literature Review [30.17038624027751]
This paper focuses on the code generation task, which aims at generating relevant code fragments from given natural language descriptions.
In this paper, we systematically review the current work on deep learning-based code generation methods.
arXiv Detail & Related papers (2023-03-02T08:25:42Z)
- Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset, CodeClarQA, containing pairs of natural language descriptions and code, together with synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- NatGen: Generative pre-training by "Naturalizing" source code [18.410818213965918]
We propose a new pre-training objective, "Naturalizing" of source code.
Unlike natural language, code's bimodal, dual-channel nature allows us to generate semantically equivalent code at scale.
We fine-tune our model in three generative Software Engineering tasks to achieve state-of-the-art performance rivaling CodeT5.
arXiv Detail & Related papers (2022-06-15T15:08:29Z)
- Meta Learning for Code Summarization [10.403206672504664]
We show that three SOTA models for code summarization work well on largely disjoint subsets of a large code-base.
We propose three meta-models that select the best candidate summary for a given code segment.
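A toy stand-in for such a candidate-selecting meta-model is sketched below, under the assumption that token overlap with the code is a usable proxy score; the paper's actual meta-models are learned, and all names here are illustrative.

```python
def select_best(code: str, candidates: list) -> str:
    # Toy meta-model: pick the candidate summary that shares the most
    # tokens with the code's identifiers and keywords.
    code_toks = set(
        code.replace("(", " ").replace(")", " ").replace(";", " ").split()
    )
    return max(candidates, key=lambda c: len(code_toks & set(c.lower().split())))

code = "int add(int a, int b) { return a + b; }"
candidates = [
    "open a file and read lines",
    "add two int values and return the sum",
]
print(select_best(code, candidates))  # add two int values and return the sum
```

In practice the scoring function would itself be a trained model choosing among the outputs of the three SOTA summarizers.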
arXiv Detail & Related papers (2022-01-20T17:23:34Z)
- Data-to-text Generation with Macro Planning [61.265321323312286]
We propose a neural model with a macro planning stage followed by a generation stage reminiscent of traditional methods.
Our approach outperforms competitive baselines in terms of automatic and human evaluation.
arXiv Detail & Related papers (2021-02-04T16:32:57Z)
- KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose a knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully-supervised, zero-shot, few-shot to evaluate its effectiveness.
Under zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z)
- Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning [18.354352985591305]
Code summarization generates a brief natural language description for a given source code snippet, while code retrieval fetches relevant source code for a given natural language query.
Recent studies have combined these two tasks to improve their performance.
We propose a novel end-to-end model for the two tasks by introducing an additional code generation task.
arXiv Detail & Related papers (2020-02-24T12:26:11Z)