Dynamic Language Models for Continuously Evolving Content
- URL: http://arxiv.org/abs/2106.06297v1
- Date: Fri, 11 Jun 2021 10:33:50 GMT
- Title: Dynamic Language Models for Continuously Evolving Content
- Authors: Spurthi Amba Hombaiah, Tao Chen, Mingyang Zhang, Michael Bendersky, and Marc Najork
- Abstract summary: In recent years, pre-trained language models like BERT have greatly improved the state of the art for content understanding tasks.
In this paper, we aim to study how these language models can be adapted to better handle continuously evolving web content.
- Score: 19.42658043326054
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The content on the web is in a constant state of flux. New entities, issues,
and ideas continuously emerge, while the semantics of the existing conversation
topics gradually shift. In recent years, pre-trained language models like BERT
have greatly improved the state of the art for a large spectrum of content
understanding tasks. In this paper, we therefore study how these
language models can be adapted to better handle continuously evolving web
content. We first analyze the evolution of Twitter data from 2013 to 2019
and confirm that a BERT model trained on past tweets deteriorates
heavily when applied directly to data from later years. Then, we
investigate two possible sources of the deterioration: the semantic shift of
existing tokens and the sub-optimal or failed understanding of new tokens. To
address these, we explore two vocabulary composition methods and
propose three sampling methods that enable efficient incremental training
of BERT-like models. Compared to a new model trained from scratch offline, our
incremental training (a) reduces the training costs, (b) achieves better
performance on evolving content, and (c) is suitable for online deployment. The
superiority of our methods is validated using two downstream tasks. We
demonstrate significant improvements when incrementally evolving the model from
a particular base year, on the task of Country Hashtag Prediction, as well as
on the OffensEval 2019 task.
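The "new tokens" problem named in the abstract can be illustrated with a toy sketch. The code below is not the paper's method: the whitespace tokenizer, the function names (`build_vocab`, `oov_rate`, `extend_vocab`), the `min_count` threshold, and the example tweets are all assumptions made for illustration. It measures how many tokens in later text are missing from a vocabulary built on earlier text, then extends that vocabulary with new tokens that recur often enough.

```python
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, standing in
    # for a real subword tokenizer such as WordPiece.
    return text.lower().split()


def build_vocab(texts, min_count=1):
    # Keep every token that appears at least min_count times.
    counts = Counter(tok for t in texts for tok in tokenize(t))
    return {tok for tok, c in counts.items() if c >= min_count}


def oov_rate(texts, vocab):
    # Fraction of token occurrences that are not in the vocabulary.
    tokens = [tok for t in texts for tok in tokenize(t)]
    if not tokens:
        return 0.0
    return sum(tok not in vocab for tok in tokens) / len(tokens)


def extend_vocab(vocab, new_texts, min_count=2):
    # Hypothetical update rule: add unseen tokens that recur in the
    # newer data, leaving the existing vocabulary untouched.
    counts = Counter(tok for t in new_texts for tok in tokenize(t))
    added = {tok for tok, c in counts.items()
             if c >= min_count and tok not in vocab}
    return vocab | added


# Made-up example data for an "earlier" and a "later" period.
old_tweets = ["watching the game tonight", "great game tonight"]
new_tweets = ["fortnite stream tonight", "fortnite game tonight"]

old_vocab = build_vocab(old_tweets)
new_vocab = extend_vocab(old_vocab, new_tweets)

print(oov_rate(new_tweets, old_vocab))  # share of later tokens unseen earlier
print(oov_rate(new_tweets, new_vocab))  # share still unseen after the update
```

In a real BERT-like model, extending the vocabulary would also require adding and training embedding rows for the new tokens; the sketch only covers the bookkeeping side.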
Related papers
- Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z)
- VIBE: Topic-Driven Temporal Adaptation for Twitter Classification [9.476760540618903]
We study temporal adaptation, where models trained on past data are tested in the future.
Our model, with only 3% of data, significantly outperforms previous state-of-the-art continued-pretraining methods.
arXiv Detail & Related papers (2023-10-16T08:53:57Z)
- Meta-Learning Online Adaptation of Language Models [88.8947656843812]
Large language models encode impressively broad world knowledge in their parameters.
However, the knowledge in static language models falls out of date, limiting the model's effective "shelf life".
arXiv Detail & Related papers (2023-05-24T11:56:20Z)
- Interpreting Language Models Through Knowledge Graph Extraction [42.97929497661778]
We compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process.
We present a methodology to unveil a knowledge acquisition timeline by generating knowledge graph extracts from cloze "fill-in-the-blank" statements.
We extend this analysis to a comparison of pretrained variations of BERT models (DistilBERT, BERT-base, RoBERTa).
arXiv Detail & Related papers (2021-11-16T15:18:01Z)
- bert2BERT: Towards Reusable Pretrained Language Models [51.078081486422896]
We propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model.
bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE, respectively, by reusing models of almost half their size.
arXiv Detail & Related papers (2021-10-14T04:05:25Z)
- A Comprehensive Comparison of Pre-training Language Models [0.5139874302398955]
We pre-train a list of transformer-based models with the same amount of text and the same training steps.
The experimental results show that the largest improvement over the original BERT comes from adding an RNN layer to capture more contextual information for short text understanding.
arXiv Detail & Related papers (2021-06-22T02:12:29Z)
- Pre-Training BERT on Arabic Tweets: Practical Considerations [11.087099497830552]
We pretrained 5 BERT models that differ in the size of their training sets, mixture of formal and informal Arabic, and linguistic preprocessing.
All are intended to support Arabic dialects and social media.
New models achieve new state-of-the-art results on several downstream tasks.
arXiv Detail & Related papers (2021-02-21T20:51:33Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes a model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)