Pitfalls of Static Language Modelling
- URL: http://arxiv.org/abs/2102.01951v1
- Date: Wed, 3 Feb 2021 09:01:49 GMT
- Title: Pitfalls of Static Language Modelling
- Authors: Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya, Devang
Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d'Autume,
Sebastian Ruder, Dani Yogatama, Kris Cao, Tomas Kocisky, Susannah Young, Phil
Blunsom
- Abstract summary: We show that state-of-the-art Transformer models perform worse in the realistic setup of predicting future utterances from beyond their training period.
We argue that now is the right time to rethink our static language modelling evaluation protocol.
- Score: 41.76918612574081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our world is open-ended, non-stationary and constantly evolving; thus what we
talk about and how we talk about it changes over time. This inherent dynamic
nature of language comes in stark contrast to the current static language
modelling paradigm, which constructs training and evaluation sets from
overlapping time periods. Despite recent progress, we demonstrate that
state-of-the-art Transformer models perform worse in the realistic setup of
predicting future utterances from beyond their training period -- a consistent
pattern across three datasets from two domains. We find that, while increasing
model size alone -- a key driver behind recent progress -- does not provide a
solution for the temporal generalization problem, having models that
continually update their knowledge with new information can indeed slow down
the degradation over time. Hence, given the compilation of ever-larger language
modelling training datasets, combined with the growing list of
language-model-based NLP applications that require up-to-date knowledge about
the world, we argue that now is the right time to rethink our static language
modelling evaluation protocol, and develop adaptive language models that can
remain up-to-date with respect to our ever-changing and non-stationary world.
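To make the evaluation-protocol argument concrete, the sketch below contrasts the standard static protocol (train and test drawn from overlapping time periods) with a temporal split that evaluates only on documents dated after a training cutoff. This is a minimal illustration under assumed names, not the authors' code: `timestamped_corpus`, `train_lm`, and `perplexity` are hypothetical placeholders for your own data, model, and metric.

```python
import random

def temporal_split(timestamped_corpus, cutoff):
    """Train on documents dated up to `cutoff`; evaluate on strictly later ones.

    `timestamped_corpus` is a list of (timestamp, text) pairs; the timestamps
    only need to be comparable to `cutoff` (e.g. datetimes or "YYYY-MM" strings).
    """
    train = [text for ts, text in timestamped_corpus if ts <= cutoff]
    future_test = [text for ts, text in timestamped_corpus if ts > cutoff]
    return train, future_test

def random_split(timestamped_corpus, test_fraction=0.1, seed=0):
    """Static protocol: train and test documents come from overlapping periods."""
    docs = [text for _, text in timestamped_corpus]
    random.Random(seed).shuffle(docs)
    n_test = max(1, int(len(docs) * test_fraction))
    return docs[n_test:], docs[:n_test]

# Hypothetical usage (train_lm and perplexity stand in for any LM and metric):
#   train, future_test = temporal_split(corpus, cutoff="2019-01")
#   _, static_test = random_split(corpus)
#   model = train_lm(train)
#   print("in-period perplexity:", perplexity(model, static_test))
#   print("future perplexity:   ", perplexity(model, future_test))
```

In these terms, the paper's observation is that the future-period perplexity degrades relative to the in-period one across datasets, that scaling the model alone does not close the gap, and that continually updating the model on new data slows the degradation.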
Related papers
- Time Machine GPT [15.661920010658626]
Large language models (LLMs) are often trained on extensive, temporally indiscriminate text corpora.
This approach is not aligned with the evolving nature of language.
This paper presents a new approach: a series of point-in-time LLMs called Time Machine GPT (TiMaGPT).
arXiv Detail & Related papers (2024-04-29T09:34:25Z)
- More Room for Language: Investigating the Effect of Retrieval on Language Models [3.8574940917179164]
We introduce an 'ideal retrieval' methodology to study these models in a fully controllable setting.
We conduct an evaluation to examine how retrieval augmentation affects the behavior of the underlying language model.
arXiv Detail & Related papers (2024-04-16T22:43:48Z)
- Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models [74.81091933317882]
We introduce EvolvingQA, a temporally evolving question-answering benchmark designed for training and evaluating LMs on an evolving Wikipedia database.
We uncover that existing continual learning baselines struggle to update and remove outdated knowledge.
Our work aims to model the dynamic nature of real-world information and to support faithful evaluation of how well language models adapt to that evolution.
arXiv Detail & Related papers (2023-11-14T12:12:02Z)
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amount of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change [28.106524698188675]
Recent research has revealed that neural language models at scale suffer from poor temporal generalization capability.
We propose a simple yet effective lexical-level masking strategy to post-train a converged language model.
arXiv Detail & Related papers (2022-10-31T08:12:41Z)
- Learning Temporal Dynamics from Cycles in Narrated Video [85.89096034281694]
We propose a self-supervised solution to the problem of learning to model how the world changes as time elapses.
Our model learns modality-agnostic functions to predict forward and backward in time, which must undo each other when composed (a minimal sketch of this cycle constraint appears after the related-papers list).
We apply the learned dynamics model without further training to various tasks, such as predicting future action and temporally ordering sets of images.
arXiv Detail & Related papers (2021-01-07T02:41:32Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
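The cycle constraint mentioned under "Learning Temporal Dynamics from Cycles in Narrated Video" (forward and backward temporal predictors that must undo each other when composed) can be illustrated with a small reconstruction-style objective. The sketch below is only a generic cycle-consistency loss under assumed names (`forward_model`, `backward_model`); the paper itself learns these as modality-agnostic neural functions, which this toy example does not reproduce.

```python
import numpy as np

def cycle_consistency_loss(state, forward_model, backward_model):
    """Penalize forward/backward temporal predictors that fail to undo each other.

    `forward_model(state)` predicts a later-time representation and
    `backward_model(...)` predicts an earlier one; composing the two should
    approximately recover the original state.
    """
    future = forward_model(state)           # step forward in time
    reconstructed = backward_model(future)  # step backward again
    return float(np.mean((reconstructed - state) ** 2))

# Toy check with linear "models" that are exact inverses, so the loss is ~0:
A = np.array([[1.0, 0.5],
              [0.0, 1.0]])
forward = lambda s: s @ A.T
backward = lambda s: s @ np.linalg.inv(A).T
states = np.random.default_rng(0).normal(size=(4, 2))
print(cycle_consistency_loss(states, forward, backward))  # prints ~0.0
```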