On the validity of pre-trained transformers for natural language
processing in the software engineering domain
- URL: http://arxiv.org/abs/2109.04738v1
- Date: Fri, 10 Sep 2021 08:46:31 GMT
- Title: On the validity of pre-trained transformers for natural language
processing in the software engineering domain
- Authors: Julian von der Mosel, Alexander Trautsch, Steffen Herbold
- Abstract summary: We compare BERT transformer models trained with software engineering data with transformers based on general domain data.
Our results show that for tasks that require understanding of the software engineering context, pre-training with software engineering data is valuable.
- Score: 78.32146765053318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers are the current state-of-the-art of natural language processing
in many domains and are gaining traction within software engineering research as
well. Such models are pre-trained on large amounts of data, usually from the
general domain. However, we only have a limited understanding regarding the
validity of transformers within the software engineering domain, i.e., how good
such models are at understanding words and sentences within a software
engineering context and how this improves the state-of-the-art. Within this
article, we shed light on this complex, but crucial issue. We compare BERT
transformer models trained with software engineering data with transformers
based on general domain data in multiple dimensions: their vocabulary, their
ability to understand which words are missing, and their performance in
classification tasks. Our results show that for tasks that require
understanding of the software engineering context, pre-training with software
engineering data is valuable, while general domain models are sufficient for
general language understanding, also within the software engineering domain.
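
The comparison dimensions named in the abstract (vocabulary and masked-word prediction) can be probed directly with standard tooling. The sketch below, which is not taken from the paper, uses the Hugging Face `transformers` library; the general-domain checkpoint is a real public model, while the software-engineering checkpoint name is a placeholder assumption that would need to be replaced with the model actually used in the study.

```python
# Minimal sketch of two of the comparison dimensions described in the abstract.
# Assumption: SE_MODEL is a placeholder for a software-engineering BERT checkpoint.
from transformers import AutoTokenizer, pipeline

GENERAL_MODEL = "bert-base-uncased"        # general-domain BERT
SE_MODEL = "path/or/hub-id-of-se-bert"     # hypothetical SE-domain BERT

# 1) Vocabulary: how much of the SE tokenizer's vocabulary is shared
#    with the general-domain tokenizer?
general_vocab = set(AutoTokenizer.from_pretrained(GENERAL_MODEL).get_vocab())
se_vocab = set(AutoTokenizer.from_pretrained(SE_MODEL).get_vocab())
overlap = len(general_vocab & se_vocab) / len(se_vocab)
print(f"Vocabulary overlap: {overlap:.2%}")

# 2) Understanding which words are missing: fill-mask predictions on a
#    sentence that requires software-engineering context.
sentence = "You should open a pull [MASK] before merging the branch."
for name in (GENERAL_MODEL, SE_MODEL):
    fill = pipeline("fill-mask", model=name)
    top = fill(sentence)[0]
    print(f"{name}: '{top['token_str']}' (p={top['score']:.2f})")
```

The third dimension, performance on classification tasks, would additionally require fine-tuning both models on labeled software-engineering data and comparing held-out scores.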
Related papers
- Survey: Transformer-based Models in Data Modality Conversion [0.8136541584281987]
Modality Conversion involves the transformation of data from one form of representation to another, mimicking the way humans integrate and interpret sensory information.
This paper provides a comprehensive review of transformer-based models applied to the primary modalities of text, vision, and speech, discussing their architectures, conversion methodologies, and applications.
arXiv Detail & Related papers (2024-08-08T18:39:14Z)
- Anatomy of Neural Language Models [0.0]
Transformer-based Language Models (LMs) have led to new state-of-the-art results in a wide spectrum of applications.
Transformers pretrained on language-modeling-like tasks have been widely adopted in computer vision and time series applications.
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
- A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel at handling long dependencies between input sequence elements and enable parallel processing.
Our survey encompasses the identification of the top five application domains for transformer-based models.
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
- Learning Transformer Programs [78.9509560355733]
We introduce a procedure for training Transformers that are mechanistically interpretable by design.
Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization.
The Transformer Programs can automatically find reasonable solutions, performing on par with standard Transformers of comparable size.
arXiv Detail & Related papers (2023-06-01T20:27:01Z)
- Instruction-driven history-aware policies for robotic manipulations [82.25511767738224]
We propose a unified transformer-based approach that takes into account multiple inputs.
In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations.
We evaluate our method on the challenging RLBench benchmark and on a real-world robot.
arXiv Detail & Related papers (2022-09-11T16:28:25Z)
- Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
- CodeTrans: Towards Cracking the Language of Silicone's Code Through Self-Supervised Deep Learning and High Performance Computing [4.111243115567736]
This paper describes CodeTrans - an encoder-decoder transformer model for tasks in the software engineering domain.
It explores the effectiveness of encoder-decoder transformer models for six software engineering tasks, including thirteen sub-tasks.
CodeTrans outperforms the state-of-the-art models on all the tasks.
arXiv Detail & Related papers (2021-04-06T11:57:12Z)
- Empirical Study of Transformers for Source Code [14.904366372190943]
We study the capabilities of Transformers to utilize syntactic information in different tasks.
We show that Transformers are able to make meaningful predictions based purely on syntactic information.
arXiv Detail & Related papers (2020-10-15T19:09:15Z)
- Pretrained Transformers for Text Ranking: BERT and Beyond [53.83210899683987]
This survey provides an overview of text ranking with neural network architectures known as transformers.
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
arXiv Detail & Related papers (2020-10-13T15:20:32Z)