Automated Source Code Generation and Auto-completion Using Deep
Learning: Comparing and Discussing Current Language-Model-Related Approaches
- URL: http://arxiv.org/abs/2009.07740v4
- Date: Tue, 12 Jan 2021 10:54:20 GMT
- Title: Automated Source Code Generation and Auto-completion Using Deep
Learning: Comparing and Discussing Current Language-Model-Related Approaches
- Authors: Juan Cruz-Benito, Sanjay Vishwakarma, Francisco Martin-Fernandez,
Ismael Faro
- Abstract summary: This paper compares different deep learning architectures to create and use language models based on programming code.
We discuss each approach's strengths and weaknesses and the gaps we find when evaluating these language models or applying them in a real programming context.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the use of deep learning in language models gained much
attention. Some research projects claim that they can generate text that can be interpreted as human writing, enabling new possibilities in many application
areas. Among the different areas related to language processing, one of the
most notable in applying this type of modeling is programming languages. For
years, the Machine Learning community has been researching this software
engineering area, pursuing goals like applying different approaches to
auto-complete, generate, fix, or evaluate code programmed by humans.
Considering the increasing popularity of the Deep-Learning-enabled language
models approach, we detected a lack of empirical papers that compare different
deep learning architectures to create and use language models based on
programming code. This paper compares different neural network architectures
such as AWD-LSTMs, AWD-QRNNs, and Transformer, while using transfer learning and different tokenizations, to see how they behave when building language models on a Python dataset for code-generation and fill-mask tasks. Considering the results, we discuss each approach's strengths and weaknesses and the gaps we find when evaluating these language models or applying them in a real programming context.
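To make the two evaluation tasks concrete, here is a minimal, hypothetical sketch using the Hugging Face transformers library; the checkpoints huggingface/CodeBERTa-small-v1 and gpt2 are illustrative stand-ins, not the models trained in the paper.

# Minimal sketch of the two tasks studied in the paper: filling a masked token
# in Python code and generating a code continuation. The model names are
# illustrative stand-ins, not the paper's own fine-tuned models.
from transformers import pipeline

# Fill-mask: predict the most likely token at the masked position.
fill = pipeline("fill-mask", model="huggingface/CodeBERTa-small-v1")
masked = f"def add(a, b):\n    return a {fill.tokenizer.mask_token} b"
for pred in fill(masked)[:3]:
    print(pred["token_str"], round(pred["score"], 3))

# Code generation: continue a Python prompt.
generate = pipeline("text-generation", model="gpt2")
prompt = "def fibonacci(n):\n"
print(generate(prompt, max_new_tokens=40)[0]["generated_text"])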
Related papers
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks based on their control flow and data flow to fill the gap between programming languages and natural language.
Various experiments and ablations on four datasets, covering both C++ and Python, validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
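The sketch below is not CodeGRAG itself; it only illustrates, under heavy simplification, how a graph-like view of a Python code block can be derived from its syntax (crude definition-to-use edges extracted with the standard ast module), the kind of structural signal such graphical views build on.

# Rough, simplified illustration (not the CodeGRAG implementation): link each
# variable's assignment lines to the lines where it is later read, yielding
# crude data-flow edges over a Python snippet.
import ast
from collections import defaultdict

code = """
def scale(values, factor):
    total = 0
    for v in values:
        total = total + v * factor
    return total
"""

tree = ast.parse(code)
defs = defaultdict(list)   # variable name -> lines where it is assigned
uses = defaultdict(list)   # variable name -> lines where it is read

for node in ast.walk(tree):
    if isinstance(node, ast.Name):
        bucket = defs if isinstance(node.ctx, ast.Store) else uses
        bucket[node.id].append(node.lineno)

# Data-flow-style edges: definition line -> use line for each variable.
edges = [(name, d, u) for name in defs for d in defs[name] for u in uses.get(name, [])]
for name, d, u in sorted(edges):
    print(f"{name}: line {d} -> line {u}")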
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding [5.9535699822923]
We propose a new benchmark dataset called GenCodeSearchNet (GeCS) to evaluate the programming language understanding capabilities of language models.
As part of the full dataset, we introduce a new, manually curated subset StatCodeSearch that focuses on R, a popular but so far underrepresented programming language.
For evaluation and comparison, we collect several baseline results using fine-tuned BERT-style models and GPT-style large language models.
arXiv Detail & Related papers (2023-11-16T09:35:00Z)
- Language Models are Universal Embedders [48.12992614723464]
We show that pre-trained transformer decoders can embed universally when finetuned on limited English data.
Our models achieve competitive performance on different embedding tasks with minimal training data.
These results provide evidence of a promising path towards building powerful unified embedders.
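As a rough illustration of the idea (not the paper's models or training recipe), a decoder-only transformer can be turned into an embedder by mean-pooling its last-layer hidden states; gpt2 is used here purely as a stand-in checkpoint.

# Sketch: use a decoder-only LM as an embedder by mean-pooling its last-layer
# hidden states over non-padding tokens. "gpt2" is an illustrative stand-in.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModel.from_pretrained("gpt2")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state      # (batch, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1)       # (batch, tokens, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

a, b = embed(["def add(a, b): return a + b", "sum two numbers"])
print(torch.cosine_similarity(a, b, dim=0).item())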
arXiv Detail & Related papers (2023-10-12T11:25:46Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
- On the Impact of Language Selection for Training and Evaluating Programming Language Models [16.125924759649106]
We evaluate the similarity of programming languages by analyzing their representations using a CodeBERT-based model.
Our experiments reveal that token representations in languages such as C++, Python, and Java are close to one another, whereas the same tokens in languages such as Mathematica and R are markedly dissimilar.
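The snippet below is only a hedged illustration of this kind of analysis, not the paper's protocol: it compares the contextual representation of the same token ("for") in a Python and a C++ snippet with the publicly available microsoft/codebert-base encoder and cosine similarity.

# Illustration only (not the paper's exact setup): compare the contextual
# embedding of the token "for" in a Python snippet vs. a C++ snippet using a
# CodeBERT-style encoder.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def token_vector(code, token_text):
    enc = tokenizer(code, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]              # (tokens, dim)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    idx = next(i for i, t in enumerate(tokens) if token_text in t)
    return hidden[idx]

py_vec = token_vector("for x in items:\n    total += x", "for")
cpp_vec = token_vector("for (int i = 0; i < n; ++i) total += v[i];", "for")
print(torch.cosine_similarity(py_vec, cpp_vec, dim=0).item())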
arXiv Detail & Related papers (2023-08-25T12:57:59Z)
- Multi-lingual Evaluation of Code Generation Models [82.7357812992118]
We present new benchmarks for evaluating code generation models: MBXP, Multilingual HumanEval, and MathQA-X.
These datasets cover over 10 programming languages.
We are able to assess the performance of code generation models in a multi-lingual fashion.
arXiv Detail & Related papers (2022-10-26T17:17:06Z)
- Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages [86.08359401867577]
Back-translation is widely known for its effectiveness in neural machine translation when little to no parallel data is available.
We propose performing back-translation via code summarization and generation.
We show that our proposed approach performs competitively with state-of-the-art methods.
arXiv Detail & Related papers (2022-05-23T08:20:41Z)
- Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z)
- Language Models are not Models of Language [0.0]
Transfer learning has enabled large deep learning neural networks trained on the language modeling task to vastly improve performance across language tasks.
We argue that the term language model is misleading because deep learning models are not theoretical models of language.
arXiv Detail & Related papers (2021-12-13T22:39:46Z)
- Cross-Lingual Adaptation for Type Inference [29.234418962960905]
We propose a cross-lingual adaptation framework, PLATO, to transfer a deep learning-based type inference procedure across weakly typed languages.
By leveraging data from strongly typed languages, PLATO improves the perplexity of the backbone cross-programming-language model.
arXiv Detail & Related papers (2021-07-01T00:20:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.