CoreLM: Coreference-aware Language Model Fine-Tuning
- URL: http://arxiv.org/abs/2111.02687v1
- Date: Thu, 4 Nov 2021 08:44:31 GMT
- Title: CoreLM: Coreference-aware Language Model Fine-Tuning
- Authors: Nikolaos Stylianou, Ioannis Vlahavas
- Abstract summary: We propose a Fine-Tuning framework, named CoreLM, that extends the architecture of current Pretrained Language Models.
We make available information outside the contextual space of the model, which results in a better Language Model for a fraction of the computational cost.
Our proposed model achieves a lower Perplexity on the GUMBY and LAMBADA datasets when compared to GPT2 and to a version of GPT2 fine-tuned without any architectural changes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language Models underpin all modern Natural Language Processing
(NLP) tasks. The introduction of the Transformer architecture has contributed
significantly to making Language Modeling effective across many NLP
tasks, leading to significant advancements in the field. However, Transformers
come with a large computational cost, which grows quadratically with respect to
the input length. This presents a challenge, as understanding long texts
requires a lot of context. In this paper, we propose a Fine-Tuning framework,
named CoreLM, that extends the architecture of current Pretrained Language
Models so that they incorporate explicit entity information. By introducing
entity representations, we make available information outside the contextual
space of the model, which results in a better Language Model for a fraction of
the computational cost. We implement our approach using GPT2 and compare the
fine-tuned model to the original. Our proposed model achieves a lower
Perplexity on the GUMBY and LAMBADA datasets when compared to GPT2 and to a
version of GPT2 fine-tuned without any architectural changes. We also compare the models'
performance in terms of Accuracy in LAMBADA and Children's Book Test, with and
without the use of model-created coreference annotations.
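To make the idea concrete, the sketch below shows one way entity (coreference-cluster) embeddings could be mixed into token representations so that information from outside the local context reaches the language model. It is a minimal, hypothetical PyTorch toy, not the paper's GPT2-based CoreLM implementation: the EntityAugmentedLM class, its dimensions, and the additive combination of token and entity embeddings are illustrative assumptions.
```python
import torch
import torch.nn as nn

class EntityAugmentedLM(nn.Module):
    """Toy causal LM that adds coreference-cluster embeddings to token states.

    Illustrative stand-in only: CoreLM itself fine-tunes GPT2 rather than
    training a small encoder from scratch.
    """

    def __init__(self, vocab_size=50257, n_entities=512,
                 d_model=256, n_layers=2, n_heads=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Cluster id 0 is reserved for tokens that belong to no entity.
        self.ent_emb = nn.Embedding(n_entities, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, token_ids, entity_ids):
        # entity_ids[b, t] is the coreference cluster id of token t (0 = none),
        # so mentions of the same entity share a representation even when the
        # antecedent lies far outside the local context window.
        h = self.tok_emb(token_ids) + self.ent_emb(entity_ids)
        # Causal mask: each position may only attend to earlier positions.
        t = token_ids.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        h = self.encoder(h, mask=mask)
        return self.lm_head(h)

# Usage: one 8-token sequence in which tokens 2 and 5 corefer (cluster 3).
tokens = torch.randint(0, 50257, (1, 8))
entities = torch.tensor([[0, 0, 3, 0, 0, 3, 0, 0]])
logits = EntityAugmentedLM()(tokens, entities)  # shape (1, 8, 50257)
```
In the paper itself the entity information comes from coreference annotations (gold or model-created) and is integrated into GPT2's Transformer blocks rather than into a from-scratch encoder as above.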
Related papers
- Large Concept Models: Language Modeling in a Sentence Representation Space [62.73366944266477]
We present an attempt at an architecture which operates on an explicit higher-level semantic representation, which we name a concept.
Concepts are language- and modality-agnostic and represent a higher level idea or action in a flow.
We show that our model exhibits impressive zero-shot generalization performance to many languages.
arXiv Detail & Related papers (2024-12-11T23:36:20Z)
- Can bidirectional encoder become the ultimate winner for downstream applications of foundation models? [1.8120356834558644]
Foundational models have the characteristics of pre-training, transfer learning, and self-supervised learning.
BERT broke through the limitation of only using one-way methods for language modeling in pre-training by using a masked language model.
This article analyzes one-way and bidirectional models based on GPT and BERT and compares their differences based on the purpose of the model.
arXiv Detail & Related papers (2024-11-27T03:31:14Z)
- Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning [60.26952378997713]
Contrastive vision-language models (e.g. CLIP) are created by updating all the parameters of a vision model and language model through contrastive training.
We show that a minimal set of parameter updates (<7% of the model) can achieve the same performance as full-model training.
We describe a series of experiments: we show that existing knowledge is conserved more strongly in parameter-efficient training.
arXiv Detail & Related papers (2023-03-21T14:12:08Z)
- Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal generation quality degradation.
Our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture.
arXiv Detail & Related papers (2023-02-15T18:55:29Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- N-Grammer: Augmenting Transformers with latent n-grams [35.39961549040385]
We propose a simple yet effective modification to the Transformer architecture, inspired by the statistical language modeling literature: we augment the model with n-grams constructed from a discrete latent representation of the text sequence.
We evaluate our model, the N-Grammer, on language modeling on the C4 dataset as well as text classification on the SuperGLUE dataset, and find that it outperforms several strong baselines such as the Transformer and the Primer.
arXiv Detail & Related papers (2022-07-13T17:18:02Z)
- TunBERT: Pretrained Contextualized Text Representation for Tunisian Dialect [0.0]
We investigate the feasibility of training monolingual Transformer-based language models for under-represented languages.
We show that using noisy web-crawled data instead of structured data is more suitable for such a non-standardized language.
Our best performing TunBERT model reaches or improves the state-of-the-art in all three downstream tasks.
arXiv Detail & Related papers (2021-11-25T15:49:50Z)
- E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks [3.42658286826597]
We present an extension over the Transformer-block architecture used in neural language models, specifically in GPT2.
Our model, GPT2E, extends the Transformer layers architecture of GPT2 to Entity-Transformers, an architecture designed to handle coreference information when present.
arXiv Detail & Related papers (2020-11-10T22:28:00Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling [4.525267347429154]
We train a Transformer-based neural model on the BERT language model.
In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size (see the illustrative sketch after this list).
The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset.
arXiv Detail & Related papers (2020-03-29T14:00:17Z)
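The BERT-windowing idea from the last entry (chunk-wise processing of texts longer than the encoder's maximum input length) can be sketched as overlapping windows whose outputs are stitched back together. The window size, stride, overlap-averaging, and the encode_windowed helper below are assumptions for illustration, not the paper's exact procedure.
```python
import torch
import torch.nn as nn

def encode_windowed(token_ids, encoder, window=512, stride=256, d_model=768):
    """Encode a long sequence chunk by chunk and average overlapping outputs.

    token_ids: LongTensor of shape (seq_len,).
    encoder:   any module mapping (1, w) ids to (1, w, d_model) states.
    """
    seq_len = token_ids.size(0)
    out = torch.zeros(seq_len, d_model)
    counts = torch.zeros(seq_len, 1)
    for start in range(0, seq_len, stride):
        end = min(start + window, seq_len)
        chunk = token_ids[start:end].unsqueeze(0)   # (1, w)
        reps = encoder(chunk).squeeze(0)            # (w, d_model)
        out[start:end] += reps
        counts[start:end] += 1
        if end == seq_len:
            break
    return out / counts  # positions covered by several windows are averaged

# Usage with a stand-in encoder (an embedding table instead of a real BERT):
toy_encoder = nn.Embedding(30522, 768)
ids = torch.randint(0, 30522, (1300,))
with torch.no_grad():
    reps = encode_windowed(ids, toy_encoder)  # shape (1300, 768)
```
Overlapping strides keep some shared context between neighbouring chunks; how the chunk representations are merged downstream is model-specific.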