Language Modelling for Source Code with Transformer-XL
- URL: http://arxiv.org/abs/2007.15813v1
- Date: Fri, 31 Jul 2020 02:42:18 GMT
- Title: Language Modelling for Source Code with Transformer-XL
- Authors: Thomas Dowdell, Hongyu Zhang
- Abstract summary: We conduct an experimental evaluation of state-of-the-art neural language models for source code.
We find that the Transformer-XL model outperforms RNN-based models in capturing the naturalness of software.
- Score: 7.967230034960396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It has been found that software, like natural language texts, exhibits
"naturalness", which can be captured by statistical language models. In recent
years, neural language models have been proposed to represent the naturalness
of software through deep learning. In this paper, we conduct an experimental
evaluation of state-of-the-art neural language models for source code,
including RNN-based models and Transformer-XL based models. Through experiments
on a large-scale Python code corpus, we find that the Transformer-XL model
outperforms RNN-based models (including LSTM and GRU models) in capturing the
naturalness of software, with far less computational cost.
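As a concrete illustration of the evaluation the abstract describes, below is a minimal, hypothetical sketch (not the authors' code, corpus, or hyperparameters): it fits a small LSTM language model and a small causal Transformer language model on a toy character-level Python snippet and reports per-token cross-entropy, the usual proxy for "naturalness". A vanilla causal Transformer stands in for Transformer-XL, which additionally uses segment-level recurrence (memory) and relative positional encodings.

```python
# Hypothetical toy sketch, not the paper's setup: "naturalness" of Python code
# measured as per-token cross-entropy under two small causal language models.
import math
import torch
import torch.nn as nn

# Toy character-level corpus: a few lines of Python source.
code = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))\n"
vocab = sorted(set(code))
stoi = {ch: i for i, ch in enumerate(vocab)}
ids = torch.tensor([stoi[ch] for ch in code], dtype=torch.long)
x, y = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)  # inputs and next-token targets

class LSTMLM(nn.Module):
    def __init__(self, vocab_size, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        self.rnn = nn.LSTM(d, d, batch_first=True)
        self.out = nn.Linear(d, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

class CausalTransformerLM(nn.Module):
    """Plain causal Transformer; Transformer-XL would add segment-level memory."""
    def __init__(self, vocab_size, d=64, heads=4, layers=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        layer = nn.TransformerEncoderLayer(d, heads, dim_feedforward=4 * d, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, layers)
        self.out = nn.Linear(d, vocab_size)

    def forward(self, x):
        T = x.size(1)
        # Causal mask: position t may only attend to positions <= t.
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        return self.out(self.enc(self.emb(x), mask=mask))

def mean_cross_entropy(model, x, y, steps=300):
    """Briefly fit the model on the toy corpus, then return the mean per-token
    cross-entropy in nats; lower means the code looks more 'natural' to the model."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x).flatten(0, 1), y.flatten()).backward()
        opt.step()
    with torch.no_grad():
        return loss_fn(model(x).flatten(0, 1), y.flatten()).item()

for name, model in [("LSTM", LSTMLM(len(vocab))), ("Transformer", CausalTransformerLM(len(vocab)))]:
    ce = mean_cross_entropy(model, x, y)
    print(f"{name}: cross-entropy {ce:.3f} nats, perplexity {math.exp(ce):.2f}")
```

On a realistic corpus, such as the large-scale Python corpus used in the paper, one would tokenize at the token or sub-word level, train and evaluate on separate splits, and compare computational cost alongside cross-entropy.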
Related papers
- Tracking Universal Features Through Fine-Tuning and Model Merging [13.600774910410514]
We study how features emerge, disappear, and persist across models fine-tuned on different domains of text.
Our exploration aims to provide deeper insights into the stability and transformation of features across typical transfer-learning scenarios.
arXiv Detail & Related papers (2024-10-16T09:18:39Z)
- Explicit Word Density Estimation for Language Modelling [24.8651840630298]
We propose a new family of language models based on NeuralODEs and the continuous analogue of Normalizing Flows, and manage to improve on some of the baselines.
arXiv Detail & Related papers (2024-06-10T15:21:33Z)
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study in-context learning (ICL) through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- Qwen Technical Report [132.54304067403922]
We introduce Qwen, the first installment of our large language model series.
The series comprises Qwen, the base pretrained language models, and Qwen-Chat, chat models finetuned with human alignment techniques.
We have also developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat.
arXiv Detail & Related papers (2023-09-28T17:07:49Z)
- N-Grammer: Augmenting Transformers with latent n-grams [35.39961549040385]
We propose a simple yet effective modification to the Transformer architecture, inspired by the statistical language modeling literature: we augment the model with n-grams constructed from a discrete latent representation of the text sequence.
We evaluate our model, the N-Grammer, on language modeling on the C4 dataset as well as text classification on the SuperGLUE dataset, and find that it outperforms several strong baselines such as the Transformer and the Primer.
arXiv Detail & Related papers (2022-07-13T17:18:02Z)
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
- Deep Learning Transformer Architecture for Named Entity Recognition on Low Resourced Languages: State of the art results [0.0]
This paper reports on the evaluation of Deep Learning (DL) transformer architecture models for Named-Entity Recognition (NER) on ten low-resourced South African (SA) languages.
The findings show that transformer models significantly improve performance when applying discrete fine-tuning parameters per language.
Further research could evaluate the more recent transformer architecture models on other Natural Language Processing tasks and applications.
arXiv Detail & Related papers (2021-11-01T11:02:01Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages [117.34242908773061]
CodeBERT is a pre-trained model for programming language (PL) and natural language (NL).
We develop CodeBERT with Transformer-based neural architecture.
We evaluate CodeBERT on two NL-PL applications by fine-tuning model parameters.
arXiv Detail & Related papers (2020-02-19T13:09:07Z)