E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks
- URL: http://arxiv.org/abs/2011.05431v1
- Date: Tue, 10 Nov 2020 22:28:00 GMT
- Title: E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks
- Authors: Nikolaos Stylianou, Ioannis Vlahavas
- Abstract summary: We present an extension over the Transformer-block architecture used in neural language models, specifically in GPT2.
Our model, GPT2E, extends the Transformer layers architecture of GPT2 to Entity-Transformers, an architecture designed to handle coreference information when present.
- Score: 3.42658286826597
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the last decade, the field of Neural Language Modelling has witnessed
enormous changes, with the development of novel models through the use of
Transformer architectures. However, even these models struggle to model long
sequences due to memory constraints and increasing computational complexity.
Coreference annotations over the training data can provide context far beyond
the modelling limitations of such language models. In this paper we present an
extension over the Transformer-block architecture used in neural language
models, specifically in GPT2, in order to incorporate entity annotations during
training. Our model, GPT2E, extends the Transformer layers architecture of GPT2
to Entity-Transformers, an architecture designed to handle coreference
information when present. To that end, we achieve richer representations for
entity mentions, with insignificant training cost. We show the comparative
model performance between GPT2 and GPT2E in terms of Perplexity on the CoNLL
2012 and LAMBADA datasets as well as the key differences in the entity
representations and their effects in downstream tasks such as Named Entity
Recognition. Furthermore, our approach can be adopted by the majority of
Transformer-based language models.
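This listing does not include reference code, so the following is a minimal PyTorch sketch of the idea described in the abstract: a GPT-2 style Transformer block extended with an entity-fusion sub-layer that mixes a coreference-cluster embedding into the hidden states of mention tokens. The module names, the gated fusion, and the entity_ids convention (one cluster id per token, -1 for tokens outside any mention) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a GPT-2 style Transformer block
# extended with an entity-fusion sub-layer. entity_ids is assumed to hold a
# coreference-cluster id per token, with -1 for tokens that are not part of
# any mention; the gating scheme is an illustrative choice.
import torch
import torch.nn as nn


class EntityTransformerBlock(nn.Module):
    def __init__(self, d_model=768, n_heads=12, n_entities=10_000):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        # Hypothetical entity components: one embedding per coreference cluster
        # and a gate deciding how much entity information to mix into a token.
        self.entity_emb = nn.Embedding(n_entities, d_model)
        self.entity_gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x, entity_ids):
        # Causal self-attention and feed-forward, as in a standard GPT-2 block.
        seq = x.size(1)
        causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))

        # Entity fusion: only mention tokens (entity_ids >= 0) receive a gated
        # update from the embedding of their coreference cluster.
        mention = (entity_ids >= 0).unsqueeze(-1)        # (batch, seq, 1)
        ent = self.entity_emb(entity_ids.clamp(min=0))   # (batch, seq, d_model)
        gate = torch.sigmoid(self.entity_gate(torch.cat([x, ent], dim=-1)))
        return torch.where(mention, x + gate * ent, x)
```

Stacking such blocks in place of the standard GPT-2 blocks and feeding entity_ids derived from coreference annotations gives the general shape of the setup described above; when no annotations are present, entity_ids can be all -1 and the block reduces to a plain Transformer block.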
Related papers
- Efficient GPT Model Pre-training using Tensor Train Matrix Representation [65.96485282393361]
Large-scale transformer models feature billions of parameters, leading to difficulties in their deployment and prohibitive training costs from scratch.
To reduce the number of parameters in the GPT-2 architecture, we replace the matrices of fully-connected layers with the corresponding Tensor Train Matrix (TTM) structure.
The resulting GPT-based model stores up to 40% fewer parameters while showing perplexity comparable to the original model (a minimal tensor-train layer sketch appears after this list).
arXiv Detail & Related papers (2023-06-05T08:38:25Z)
- Learning to Grow Pretrained Models for Efficient Transformer Training [72.20676008625641]
We learn to grow pretrained transformers by learning a linear map from the parameters of the smaller model to an initialization for the larger model (see the growth-operator sketch after this list).
Experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% of the computational cost of training from scratch.
arXiv Detail & Related papers (2023-03-02T05:21:18Z)
- SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into the Transformer and enhance discriminative representation learning.
The two proposed modules are lightweight and can be plugged into any transformer network and trained end-to-end easily.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z)
- Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages [120.74406230847904]
The first method, TP-Transformer, augments the traditional Transformer architecture with an additional component to represent structure.
The second method imbues structure at the data level by segmenting the data with morphological tokenization.
We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
arXiv Detail & Related papers (2022-08-11T22:42:24Z)
- N-Grammer: Augmenting Transformers with latent n-grams [35.39961549040385]
We propose a simple yet effective modification to the Transformer architecture inspired by the literature in statistical language modeling, by augmenting the model with n-grams that are constructed from a discrete latent representation of the text sequence.
We evaluate our model, the N-Grammer, on language modeling on the C4 dataset as well as text classification on the SuperGLUE dataset, and find that it outperforms several strong baselines such as the Transformer and the Primer.
arXiv Detail & Related papers (2022-07-13T17:18:02Z)
- Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale [31.293175512404172]
We introduce Transformer Grammars -- a class of Transformer language models that combine the expressive power, scalability, and strong performance of Transformers with recursive syntactic compositions.
We find that Transformer Grammars outperform various strong baselines on multiple syntax-sensitive language modeling evaluation metrics.
arXiv Detail & Related papers (2022-03-01T17:22:31Z)
- CoreLM: Coreference-aware Language Model Fine-Tuning [0.0]
We propose a Fine-Tuning framework, named CoreLM, that extends the architecture of current Pretrained Language Models.
We make available information outside the contextual space of the model, which results in a better Language Model for a fraction of the computational cost.
Our proposed model achieves a lower Perplexity on the GUMBY and LAMBADA datasets when compared to GPT2 and a fine-tuned version of GPT2 without any changes.
arXiv Detail & Related papers (2021-11-04T08:44:31Z)
- Deep Learning Transformer Architecture for Named Entity Recognition on Low Resourced Languages: State of the art results [0.0]
This paper reports on the evaluation of Deep Learning (DL) transformer architecture models for Named-Entity Recognition (NER) on ten low-resourced South African (SA) languages.
The findings show that transformer models significantly improve performance when applying discrete fine-tuning parameters per language.
Further research could evaluate the more recent transformer architecture models on other Natural Language Processing tasks and applications.
arXiv Detail & Related papers (2021-11-01T11:02:01Z)
- GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions (a sketch of such a convolution-augmented layer appears after this list).
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
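For the tensor-train entry above, here is a rough sketch of replacing a dense fully-connected weight matrix with a two-core tensor-train factorization. The mode sizes, rank, and initialization are illustrative assumptions; the paper's TTM layers are not reproduced here and typically use more cores.

```python
# Minimal sketch (not the paper's implementation) of a two-core tensor-train
# factorized linear layer of the kind used to replace dense fully-connected
# weight matrices in GPT-2 style blocks. Mode sizes and rank are assumptions.
import torch
import torch.nn as nn


class TTLinear2(nn.Module):
    def __init__(self, in_modes=(32, 24), out_modes=(64, 48), rank=16):
        super().__init__()
        m1, m2 = in_modes      # in_features  = 32 * 24 = 768
        n1, n2 = out_modes     # out_features = 64 * 48 = 3072
        self.in_modes, self.out_modes = in_modes, out_modes
        # Two TT cores replace one dense (m1*m2) x (n1*n2) weight matrix:
        # m1*n1*rank + rank*m2*n2 parameters instead of m1*m2*n1*n2.
        self.core1 = nn.Parameter(torch.randn(m1, n1, rank) * 0.02)
        self.core2 = nn.Parameter(torch.randn(rank, m2, n2) * 0.02)
        self.bias = nn.Parameter(torch.zeros(n1 * n2))

    def forward(self, x):
        b = x.shape[0]
        m1, m2 = self.in_modes
        n1, n2 = self.out_modes
        x = x.view(b, m1, m2)
        # Contract the input with both cores; equivalent to x @ W for the
        # reconstructed weight, without ever materializing W.
        y = torch.einsum('bij,ikr,rjl->bkl', x, self.core1, self.core2)
        return y.reshape(b, n1 * n2) + self.bias


# Example: layer = TTLinear2(); layer(torch.randn(8, 768)).shape -> (8, 3072)
```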
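For the LiGO entry, a heavily simplified illustration of the underlying idea: a learned linear map from a small model's weights to an initialization for a wider model. The real growth operator is factorized across width and depth, which this single-matrix version does not capture; all names and shapes are assumptions.

```python
# Rough sketch of learned linear growth (not the LiGO paper's actual operator):
# expand a small model's square weight matrix to a wider one via learned
# expansion matrices, which can then be trained briefly to yield a good
# initialization for the larger model.
import torch
import torch.nn as nn


class WidthGrowthOperator(nn.Module):
    def __init__(self, d_small=512, d_large=768):
        super().__init__()
        # Learned maps from the small width to the large width, initialized so
        # that the small weights are copied into the leading block.
        eye = torch.eye(d_large, d_small)
        self.a_out = nn.Parameter(eye.clone())  # expands the output dimension
        self.a_in = nn.Parameter(eye.clone())   # expands the input dimension

    def forward(self, w_small):
        # w_small: (d_small, d_small) weight of one layer in the small model.
        # Returns a (d_large, d_large) initialization for the large model.
        return self.a_out @ w_small @ self.a_in.t()
```

In practice the expansion matrices would be optimized for a small number of steps on the training objective before the grown model continues ordinary training; this sketch only shows the parameter mapping itself.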
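Finally, for the GroupBERT entry, a sketch of the general pattern of pairing self-attention with a lightweight depthwise convolution so that local interactions are handled by the convolution and global ones by attention. The kernel size and layer layout are illustrative assumptions, not the paper's exact grouped structure.

```python
# Sketch of a convolution-augmented encoder layer (not GroupBERT's exact
# architecture): a depthwise 1-D convolution sub-layer complements the
# self-attention sub-layer for local token interactions.
import torch
import torch.nn as nn


class ConvAugmentedEncoderLayer(nn.Module):
    def __init__(self, d_model=768, n_heads=12, kernel_size=7):
        super().__init__()
        self.ln_attn = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln_conv = nn.LayerNorm(d_model)
        # groups=d_model makes the convolution depthwise: cheap and strictly
        # local (kernel-wide) mixing along the sequence dimension.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.ln_ffn = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        # Global interactions via (unmasked, encoder-style) self-attention.
        h = self.ln_attn(x)
        a, _ = self.attn(h, h, h, need_weights=False)
        x = x + a
        # Local interactions via the depthwise convolution: (B, S, D) -> (B, D, S).
        c = self.conv(self.ln_conv(x).transpose(1, 2)).transpose(1, 2)
        x = x + c
        # Position-wise feed-forward.
        return x + self.ffn(self.ln_ffn(x))
```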
This list is automatically generated from the titles and abstracts of the papers on this site.