Language Modelling for Source Code with Transformer-XL
- URL: http://arxiv.org/abs/2007.15813v1
- Date: Fri, 31 Jul 2020 02:42:18 GMT
- Title: Language Modelling for Source Code with Transformer-XL
- Authors: Thomas Dowdell, Hongyu Zhang
- Abstract summary: We conduct an experimental evaluation of state-of-the-art neural language models for source code.
We find that the Transformer-XL model outperforms RNN-based models in capturing the naturalness of software.
- Score: 7.967230034960396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It has been found that software, like natural language texts, exhibits
"naturalness", which can be captured by statistical language models. In recent
years, neural language models have been proposed to represent the naturalness
of software through deep learning. In this paper, we conduct an experimental
evaluation of state-of-the-art neural language models for source code,
including RNN-based models and Transformer-XL based models. Through experiments
on a large-scale Python code corpus, we find that the Transformer-XL model
outperforms RNN-based models (including LSTM and GRU models) in capturing the
naturalness of software, with far less computational cost.
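As a concrete illustration of the evaluation the abstract describes, below is a minimal, hypothetical sketch (not the authors' code, corpus, or hyperparameters): it fits a small LSTM language model and a small causal Transformer language model on a toy character-level Python snippet and reports per-token cross-entropy, the usual proxy for "naturalness". A vanilla causal Transformer stands in for Transformer-XL, which additionally uses segment-level recurrence (memory) and relative positional encodings.

```python
# Hypothetical toy sketch, not the paper's setup: "naturalness" of Python code
# measured as per-token cross-entropy under two small causal language models.
import math
import torch
import torch.nn as nn

# Toy character-level corpus: a few lines of Python source.
code = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))\n"
vocab = sorted(set(code))
stoi = {ch: i for i, ch in enumerate(vocab)}
ids = torch.tensor([stoi[ch] for ch in code], dtype=torch.long)
x, y = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)  # inputs and next-token targets

class LSTMLM(nn.Module):
    def __init__(self, vocab_size, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        self.rnn = nn.LSTM(d, d, batch_first=True)
        self.out = nn.Linear(d, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

class CausalTransformerLM(nn.Module):
    """Plain causal Transformer; Transformer-XL would add segment-level memory."""
    def __init__(self, vocab_size, d=64, heads=4, layers=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        layer = nn.TransformerEncoderLayer(d, heads, dim_feedforward=4 * d, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, layers)
        self.out = nn.Linear(d, vocab_size)

    def forward(self, x):
        T = x.size(1)
        # Causal mask: position t may only attend to positions <= t.
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        return self.out(self.enc(self.emb(x), mask=mask))

def mean_cross_entropy(model, x, y, steps=300):
    """Briefly fit the model on the toy corpus, then return the mean per-token
    cross-entropy in nats; lower means the code looks more 'natural' to the model."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x).flatten(0, 1), y.flatten()).backward()
        opt.step()
    with torch.no_grad():
        return loss_fn(model(x).flatten(0, 1), y.flatten()).item()

for name, model in [("LSTM", LSTMLM(len(vocab))), ("Transformer", CausalTransformerLM(len(vocab)))]:
    ce = mean_cross_entropy(model, x, y)
    print(f"{name}: cross-entropy {ce:.3f} nats, perplexity {math.exp(ce):.2f}")
```

On a realistic corpus, such as the large-scale Python corpus used in the paper, one would tokenize at the token or sub-word level, train and evaluate on separate splits, and compare computational cost alongside cross-entropy.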
Related papers
- Tracking Universal Features Through Fine-Tuning and Model Merging [13.600774910410514]
We study how features emerge, disappear, and persist across models fine-tuned on different domains of text.
Our exploration aims to provide deeper insights into the stability and transformation of features across typical transfer-learning scenarios.
arXiv Detail & Related papers (2024-10-16T09:18:39Z)
- Explicit Word Density Estimation for Language Modelling [24.8651840630298]
We propose a new family of language models based on NeuralODEs and the continuous analogue of Normalizing Flows, and manage to improve on some of the baselines.
arXiv Detail & Related papers (2024-06-10T15:21:33Z)
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study in-context learning (ICL) through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- Qwen Technical Report [132.54304067403922]
We introduce Qwen, the first installment of our large language model series.
The series comprises Qwen, the base pretrained language models, and Qwen-Chat, chat models finetuned with human alignment techniques.
We have also developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat.
arXiv Detail & Related papers (2023-09-28T17:07:49Z)
- N-Grammer: Augmenting Transformers with latent n-grams [35.39961549040385]
We propose a simple yet effective modification to the Transformer architecture, inspired by the statistical language modeling literature: we augment the model with n-grams constructed from a discrete latent representation of the text sequence.
We evaluate our model, the N-Grammer, on language modeling on the C4 dataset as well as text classification on the SuperGLUE dataset, and find that it outperforms several strong baselines such as the Transformer and the Primer.
arXiv Detail & Related papers (2022-07-13T17:18:02Z)
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
- Deep Learning Transformer Architecture for Named Entity Recognition on Low Resourced Languages: State of the art results [0.0]
This paper reports on the evaluation of Deep Learning (DL) transformer architecture models for Named-Entity Recognition (NER) on ten low-resourced South African (SA) languages.
The findings show that transformer models significantly improve performance when applying discrete fine-tuning parameters per language.
Further research could evaluate the more recent transformer architecture models on other Natural Language Processing tasks and applications.
arXiv Detail & Related papers (2021-11-01T11:02:01Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages [117.34242908773061]
CodeBERT is a pre-trained model for programming language (PL) and natural language (NL).
We develop CodeBERT with Transformer-based neural architecture.
We evaluate CodeBERT on two NL-PL applications by fine-tuning model parameters.
arXiv Detail & Related papers (2020-02-19T13:09:07Z)