JaCoText: A Pretrained Model for Java Code-Text Generation
- URL: http://arxiv.org/abs/2303.12869v1
- Date: Wed, 22 Mar 2023 19:01:25 GMT
- Title: JaCoText: A Pretrained Model for Java Code-Text Generation
- Authors: Jessica López Espejel, Mahaman Sanoussi Yahaya Alassan, Walid Dahhane, El Hassane Ettifouri
- Abstract summary: We introduce JaCoText, a model based on the Transformer neural network.
It aims to generate Java source code from natural language text.
Experiments on the CONCODE dataset show that JaCoText achieves new state-of-the-art results.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Pretrained Transformer-based models have shown high performance in natural language generation tasks. However, a new wave of interest has surged: automatic programming-language generation, the task of translating natural language instructions into programming code. Although well-known pretrained language-generation models perform well when adapted to programming languages, automatic code generation still needs further effort. In this paper, we introduce JaCoText, a model based on the Transformer neural network that aims to generate Java source code from natural language text. JaCoText leverages the advantages of both natural language and code generation models. More specifically, we draw on findings from the state of the art and use them to (1) initialize our model from powerful pretrained models, (2) perform additional pretraining on our Java dataset, (3) combine unimodal and bimodal data during training, and (4) scale the input and output lengths during fine-tuning. Experiments conducted on the CONCODE dataset show that JaCoText achieves new state-of-the-art results.
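As a rough, hedged sketch of the workflow the abstract describes (starting from a powerful pretrained checkpoint and generating Java from natural language text with enlarged input and output lengths), the snippet below uses the Hugging Face transformers API. The checkpoint name, query, and length settings are illustrative assumptions, not JaCoText's released configuration.

```python
# Hedged sketch: load a pretrained seq2seq code model and generate Java from a
# natural-language description. Checkpoint, prompt format, and lengths are
# assumptions for illustration, not the exact JaCoText setup.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "Salesforce/codet5-base"  # assumed stand-in for the pretrained starting point
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# CONCODE-style input: a natural-language description of the desired method.
nl_query = "returns the maximum of two integers"
inputs = tokenizer(nl_query, return_tensors="pt", truncation=True, max_length=512)  # scaled-up input length

outputs = model.generate(**inputs, max_length=256, num_beams=4)  # scaled-up output length
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```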
Related papers
- A Comprehensive Review of State-of-The-Art Methods for Java Code Generation from Natural Language Text
This paper provides a comprehensive review of the evolution and progress of deep learning models in the Java code generation task.
We focus on the most important methods and present their merits and limitations, as well as the objective functions used by the community.
arXiv Detail & Related papers (2023-06-10T07:27:51Z)
- Python Code Generation by Asking Clarification Questions
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA, containing pairs of natural language descriptions and code together with synthetically created clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
- Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning
Controlled automated story generation seeks to generate natural language stories satisfying constraints from natural language critiques or preferences.
We train a contrastive bi-encoder model to align stories with human critiques, building a general purpose preference model.
We further fine-tune the contrastive reward model using a prompt-learning technique to increase story generation robustness.
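As a hedged illustration of what a contrastive bi-encoder preference model can look like (the encoder choice, pooling, and scoring below are assumptions, not the paper's exact setup), a story and a critique are embedded separately and their similarity is used as an alignment score:

```python
# Hedged sketch of a bi-encoder preference model: score how well a story aligns
# with a critique via embedding similarity.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

name = "bert-base-uncased"  # assumed stand-in encoder
tok = AutoTokenizer.from_pretrained(name)
enc = AutoModel.from_pretrained(name)

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state      # (batch, tokens, hidden)
    return F.normalize(hidden.mean(dim=1), dim=-1)   # mean-pooled, unit-norm vectors

stories = ["The knight kept her promise.", "The knight broke every promise."]
critique = ["The protagonist should act honorably."]
scores = embed(stories) @ embed(critique).T          # cosine similarity as a reward signal
print(scores)                                        # higher = better alignment with the critique
```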
arXiv Detail & Related papers (2022-10-14T13:21:33Z)
- NatGen: Generative pre-training by "Naturalizing" source code
We propose a new pre-training objective, "Naturalizing" of source code.
Unlike natural language, code's bimodal, dual-channel nature allows us to generate semantically equivalent code at scale.
We fine-tune our model in three generative Software Engineering tasks to achieve state-of-the-art performance rivaling CodeT5.
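A toy, hedged sketch of the idea: produce a semantically equivalent but less idiomatic variant of a snippet, which the model is then pre-trained to rewrite back into its natural form. NatGen's actual transformations are more sophisticated and structure-aware; the regex rewrites below are only illustrative.

```python
# Toy "de-naturalizing" transform: rewrite Java snippets into semantically
# equivalent but less idiomatic forms (illustrative only).
import re

def denaturalize(java_src: str) -> str:
    # x += y;  ->  x = x + y;
    src = re.sub(r"(\w+)\s*\+=\s*([^;]+);", r"\1 = \1 + \2;", java_src)
    # x++;     ->  x = x + 1;
    src = re.sub(r"(\w+)\+\+\s*;", r"\1 = \1 + 1;", src)
    return src

natural = "for (int i = 0; i < n; i++) { total += arr[i]; }"
print(denaturalize(natural))
# -> for (int i = 0; i < n; i++) { total = total + arr[i]; }
```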
arXiv Detail & Related papers (2022-06-15T15:08:29Z)
- Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages
Back-translation is widely known for its effectiveness for neural machine translation when little to no parallel data is available.
We propose performing back-translation via code summarization and generation.
We show that our proposed approach performs competitively with state-of-the-art methods.
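A minimal, hedged sketch of the summarize-and-generate back-translation loop described here; `summarize` and `generate` stand in for pretrained code-summarization and code-generation models, and their interfaces are assumed rather than taken from the paper.

```python
# Hedged sketch: build synthetic parallel data from monolingual code in the
# target language by summarizing it to natural language and generating code in
# the other language from that summary.
def back_translate(target_code, summarize, generate):
    synthetic_pairs = []
    for code in target_code:
        nl_summary = summarize(code)          # code -> natural language
        source_code = generate(nl_summary)    # natural language -> other language
        synthetic_pairs.append((source_code, code))  # (input, reference) training pair
    return synthetic_pairs

# Usage (assumed models): pairs = back_translate(java_corpus, java_summarizer, python_generator)
# and then train a code-translation model on `pairs`.
```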
arXiv Detail & Related papers (2022-05-23T08:20:41Z)
- JavaBERT: Training a transformer-based model for the Java programming language
We introduce a data retrieval pipeline for software code and train a model on Java software code.
The resulting model, JavaBERT, shows a high accuracy on the masked language modeling task.
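A hedged sketch of the masked-language-modeling task JavaBERT is evaluated on, using a generic BERT checkpoint as a stand-in (the released JavaBERT weights and tokenizer are not assumed here):

```python
# Mask one token in a Java snippet and ask a BERT-style model to fill it in.
# "bert-base-cased" is a generic stand-in; a Java-pretrained model such as
# JavaBERT is expected to do much better on code tokens.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")
snippet = f"public static void main(String[] {fill.tokenizer.mask_token}) {{ }}"
for pred in fill(snippet)[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```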
arXiv Detail & Related papers (2021-10-20T06:49:41Z)
- Automatic Code Generation using Pre-Trained Language Models
We propose an end-to-end machine learning model for code generation in the Python language, built on top of pre-trained language models.
We demonstrate that a fine-tuned model can perform well in code generation tasks, achieving a BLEU score of 0.22, an improvement of 46% over a reasonable sequence-to-sequence baseline.
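For context, a 46% relative improvement implies a baseline BLEU of roughly 0.15 (0.22 / 1.46 ≈ 0.15). A hedged sketch of how such a corpus-level BLEU score is typically computed follows (sacrebleu shown here; the paper's exact evaluation script and tokenization may differ):

```python
# Compare generated Python against reference solutions with corpus-level BLEU.
# sacrebleu reports scores on a 0-100 scale, so 22 here corresponds to 0.22.
import sacrebleu

hypotheses = ["def add(a, b): return a + b"]
references = [["def add(x, y): return x + y"]]  # one reference stream, parallel to hypotheses
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(round(bleu.score, 2))
```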
arXiv Detail & Related papers (2021-02-21T07:21:26Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
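A toy, hedged illustration of the progressive-insertion process: the stand-in "model" here is a hand-coded lookup table, whereas POINTER learns the insertions with a pretrained Transformer and proposes them in parallel.

```python
# Start from constraint keywords and repeatedly insert new tokens between
# existing ones until the stand-in model proposes no further insertions.
def insertion_step(tokens, propose):
    out, inserted = [], False
    for left, right in zip(tokens, tokens[1:] + [None]):
        out.append(left)
        new = propose(left, right)       # token to insert in this slot, or None
        if new is not None:
            out.append(new)
            inserted = True
    return out, inserted

# Stand-in "model": a slot -> token lookup table for this one example.
table = {("sat", "mat"): "on", ("on", "mat"): "the"}
propose = lambda left, right: table.get((left, right))

tokens, again = ["cat", "sat", "mat"], True
while again:
    tokens, again = insertion_step(tokens, propose)
print(" ".join(tokens))  # cat sat on the mat
```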
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
- Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
The Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language.
In this paper, we propose the first large-scale language VAE model, Optimus.
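A hedged sketch of the latent-variable step at the core of a language VAE: sample a sentence code with the reparameterization trick and compute the KL term of the ELBO. The networks are omitted and the 32-dimensional latent size is an illustrative assumption, not Optimus's configuration.

```python
# Reparameterized sampling and KL-to-standard-normal penalty for a VAE latent.
import torch

def latent_step(mu, logvar):
    std = torch.exp(0.5 * logvar)
    z = mu + std * torch.randn_like(std)  # reparameterized sample fed to the decoder
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)  # KL term of the ELBO
    return z, kl

mu, logvar = torch.zeros(1, 32), torch.zeros(1, 32)  # a 32-dim latent sentence code
z, kl = latent_step(mu, logvar)
print(z.shape, kl.item())  # torch.Size([1, 32]) 0.0
```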
arXiv Detail & Related papers (2020-04-05T06:20:18Z)
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages
CodeBERT is a pre-trained model for programming language (PL) and natural language (NL).
We develop CodeBERT with a Transformer-based neural architecture.
We evaluate CodeBERT on two NL-PL applications by fine-tuning model parameters.
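A hedged sketch of feeding a bimodal NL-PL pair through CodeBERT to obtain a representation for a downstream head; the task-specific head and fine-tuning data vary by application and are omitted here.

```python
# Encode a natural-language query and a Java snippet as one bimodal input and
# take the first-token vector as a sentence-level representation.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

nl = "return the maximum of two numbers"
code = "public static int max(int a, int b) { return a > b ? a : b; }"
inputs = tok(nl, code, return_tensors="pt", truncation=True)  # NL and code joined with separator tokens
with torch.no_grad():
    cls_vec = model(**inputs).last_hidden_state[:, 0]  # vector for a task-specific classifier head
print(cls_vec.shape)  # torch.Size([1, 768])
```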
arXiv Detail & Related papers (2020-02-19T13:09:07Z)