TemporalWiki: A Lifelong Benchmark for Training and Evaluating
Ever-Evolving Language Models
- URL: http://arxiv.org/abs/2204.14211v3
- Date: Wed, 12 Apr 2023 12:16:59 GMT
- Title: TemporalWiki: A Lifelong Benchmark for Training and Evaluating
Ever-Evolving Language Models
- Authors: Joel Jang, Seonghyeon Ye, Changho Lee, Sohee Yang, Joongbo Shin,
Janghoon Han, Gyeonghun Kim, Minjoon Seo
- Abstract summary: TemporalWiki is a lifelong benchmark for ever-evolving Language Models (LMs).
It allows researchers to periodically track an LM's ability to retain previous knowledge and acquire updated/new knowledge at each point in time.
We find that training an LM on the diff data through continual learning methods achieves similar or better perplexity than on the entire snapshot in our benchmark with 12 times less computational cost.
- Score: 31.900232508466928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language Models (LMs) become outdated as the world changes; they often fail
to perform tasks requiring recent factual information which was absent or
different during training, a phenomenon called temporal misalignment. This is
an especially challenging problem because the research community still lacks a
coherent dataset for assessing the adaptability of LMs to frequently updated
knowledge corpora such as Wikipedia. To this end, we introduce TemporalWiki, a
lifelong benchmark for ever-evolving LMs that utilizes the difference between
consecutive snapshots of English Wikipedia and English Wikidata for training
and evaluation, respectively. The benchmark hence allows researchers to
periodically track an LM's ability to retain previous knowledge and acquire
updated/new knowledge at each point in time. We also find that training an LM
on the diff data through continual learning methods achieves similar or better
perplexity than on the entire snapshot in our benchmark with 12 times less
computational cost, which verifies that factual knowledge in LMs can be safely
updated with minimal training data via continual learning. The dataset and the
code are available at https://github.com/joeljang/temporalwiki.
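A minimal sketch may help make the setup concrete: take two consecutive snapshots, keep only the articles that were added or edited, continually fine-tune a causal LM on that diff, and then compare perplexity on probes about changed versus unchanged facts. The snippet below is an illustrative approximation, not the benchmark's actual pipeline: the toy snapshot dictionaries, probe sentences, model choice (GPT-2), and hyperparameters are assumptions, and the real dataset construction lives in the linked repository.

```python
# Minimal sketch: build a snapshot "diff" and continually fine-tune a causal LM on it.
# Assumes `transformers` and `torch` are installed; data format and names are illustrative.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def build_diff(old_snapshot: dict, new_snapshot: dict) -> list:
    """Return texts that are new or changed between two {title: text} snapshots."""
    diff = []
    for title, text in new_snapshot.items():
        if old_snapshot.get(title) != text:  # new article or edited article
            diff.append(text)
    return diff


def perplexity(model, tokenizer, texts):
    """Average per-token perplexity over a list of probe sentences."""
    model.eval()
    losses = []
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
            out = model(**enc, labels=enc["input_ids"])
            losses.append(out.loss.item())
    return math.exp(sum(losses) / len(losses))


def continual_update(model, tokenizer, diff_texts, lr=1e-5, epochs=1):
    """Plain continual fine-tuning on the diff set (no regularization, for illustration)."""
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for text in diff_texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
            loss = model(**enc, labels=enc["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model


if __name__ == "__main__":
    # Toy snapshots standing in for two consecutive Wikipedia dumps.
    old_snap = {"A": "Entity A was founded in 2001.", "B": "Entity B is based in Paris."}
    new_snap = {"A": "Entity A was founded in 2001.", "B": "Entity B is based in Lyon.",
                "C": "Entity C was launched in 2022."}

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    changed_probes = ["Entity B is based in Lyon.", "Entity C was launched in 2022."]
    unchanged_probes = ["Entity A was founded in 2001."]

    diff = build_diff(old_snap, new_snap)  # only B (edited) and C (new)
    model = continual_update(model, tokenizer, diff)

    print("ppl on changed facts:  ", perplexity(model, tokenizer, changed_probes))
    print("ppl on unchanged facts:", perplexity(model, tokenizer, unchanged_probes))
```

In this toy run, perplexity on the changed-fact probes should drop after the update, while perplexity on the unchanged-fact probes indicates how much previously acquired knowledge was retained.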
Related papers
- Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning [2.8972337324168014]
We study how PLMs may learn and remember new world knowledge facts that do not occur in their pre-training corpus.
We first propose Novel-WD, a new dataset consisting of sentences containing novel facts extracted from recent Wikidata updates.
We make this dataset freely available to the community, and release a procedure to later build new versions of similar datasets with up-to-date information.
arXiv Detail & Related papers (2024-08-30T07:54:50Z)
- HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits [92.62157408704594]
HelloFresh is based on continuous streams of real-world data generated by intrinsically motivated human labelers.
It covers recent events from X (formerly Twitter) community notes and edits of Wikipedia pages.
It mitigates the risk of test data contamination and benchmark overfitting.
arXiv Detail & Related papers (2024-06-05T16:25:57Z)
- Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing.
Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z)
- Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings (a minimal sketch of this idea follows the entry).
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
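The "definition in context" finding above can be illustrated with a short, hedged sketch: score the same probe sentence with and without a definition prefix and compare the losses. The model (GPT-2), the invented entity, and the probe text are illustrative assumptions, not the paper's evaluation protocol.

```python
# Sketch of "definitions in context": score a fact about a new entity with and
# without the entity's definition prepended. Model, entity, and probe are toy choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

definition = "Zephyrion is a reusable launch vehicle first flown in 2023. "
probe = "Zephyrion is a kind of rocket."


def probe_loss(prefix: str, probe_text: str) -> float:
    """Cross-entropy on the probe tokens only, optionally conditioned on a prefix."""
    prefix_ids = tokenizer(prefix, return_tensors="pt")["input_ids"] if prefix else None
    probe_ids = tokenizer(probe_text, return_tensors="pt")["input_ids"]
    if prefix_ids is not None:
        input_ids = torch.cat([prefix_ids, probe_ids], dim=1)
        # -100 labels mask the prefix tokens out of the loss.
        labels = torch.cat([torch.full_like(prefix_ids, -100), probe_ids], dim=1)
    else:
        input_ids, labels = probe_ids, probe_ids
    with torch.no_grad():
        return model(input_ids, labels=labels).loss.item()


print("loss without definition:", probe_loss("", probe))
print("loss with definition:   ", probe_loss(definition, probe))
```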
- StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models [31.43391633383255]
We construct a new large-scale dataset, StreamingQA, with human written and generated questions asked on a given date.
We evaluate our models quarterly as they read new articles not seen in pre-training.
We show that parametric models can be updated without full retraining, while avoiding catastrophic forgetting.
arXiv Detail & Related papers (2022-05-23T15:33:41Z)
- Entity Cloze By Date: What LMs Know About Unseen Entities [79.34707800653597]
Language models (LMs) are typically trained once on a large-scale corpus and used for years without being updated.
We propose a framework to analyze what LMs can infer about new entities that did not exist when the LMs were pretrained.
We derive a dataset of entities indexed by their origination date and paired with their English Wikipedia articles, from which we can find sentences about each entity.
arXiv Detail & Related papers (2022-05-05T17:59:31Z)
- Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora [31.136334214818305]
We study a lifelong language model pretraining challenge where a PTLM is continually updated so as to adapt to emerging data.
Over a domain-incremental research paper stream and a chronologically ordered tweet stream, we incrementally pretrain a PTLM with different continual learning algorithms.
Our experiments show continual learning algorithms improve knowledge preservation, with logit distillation being the most effective approach (a minimal distillation sketch follows the entry).
arXiv Detail & Related papers (2021-10-16T09:59:33Z)
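Logit distillation, named above as the most effective approach, can be sketched as follows: keep a frozen copy of the model from before the update and, while fine-tuning on the new stream, add a KL term that keeps the updated model's output distribution close to the frozen copy's. This is a generic form of the idea under assumed hyperparameters (model choice, distillation weight, temperature), not necessarily the exact variant used in that paper.

```python
# Sketch of logit distillation during continual pretraining: the updated model is
# regularized toward a frozen copy's output distribution to preserve old knowledge.
import copy
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
student = AutoModelForCausalLM.from_pretrained("gpt2")
teacher = copy.deepcopy(student).eval()  # frozen snapshot of the pre-update model
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
new_corpus = ["Entity B is based in Lyon.", "Entity C was launched in 2022."]
alpha, temperature = 0.5, 2.0  # illustrative distillation weight and softness

student.train()
for text in new_corpus:
    enc = tokenizer(text, return_tensors="pt")
    out = student(**enc, labels=enc["input_ids"])
    lm_loss = out.loss  # learn the new facts
    with torch.no_grad():
        teacher_logits = teacher(**enc).logits
    # Per-token KL between the student's and the frozen teacher's distributions.
    s_logits = out.logits.squeeze(0) / temperature
    t_logits = teacher_logits.squeeze(0) / temperature
    kd_loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                       F.softmax(t_logits, dim=-1),
                       reduction="batchmean") * temperature ** 2
    loss = lm_loss + alpha * kd_loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The distillation term trades plasticity for stability: a larger weight preserves more old behavior at the cost of slower acquisition of new facts.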
- Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data [101.6195176510611]
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online (this test-then-train protocol is sketched after the entry).
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
arXiv Detail & Related papers (2021-08-20T06:17:20Z)
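The test-then-train protocol described above is easy to sketch: every incoming batch is first scored with the current model and only then used for a training step. The toy linear model and synthetic drifting stream below are stand-ins; the benchmark itself uses large-scale visual data.

```python
# Sketch of the test-then-train ("online") protocol: each incoming batch is scored
# with the current model before it is used for training. Data and model are toy stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2)  # toy classifier standing in for the learner
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()


def stream_of_batches(num_batches=20, batch_size=32):
    """Synthetic non-stationary stream: the decision boundary drifts over time."""
    weights = torch.randn(10)
    for _ in range(num_batches):
        weights += 0.1 * torch.randn(10)  # gradual distribution shift
        x = torch.randn(batch_size, 10)
        y = (x @ weights > 0).long()
        yield x, y


online_accuracy = []
for x, y in stream_of_batches():
    with torch.no_grad():  # 1) test on the incoming batch first
        predictions = model(x).argmax(dim=1)
        online_accuracy.append((predictions == y).float().mean().item())
    optimizer.zero_grad()  # 2) then train on the same batch
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

print("mean online accuracy:", sum(online_accuracy) / len(online_accuracy))
```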
- Time-Aware Language Models as Temporal Knowledge Bases [39.00042720454899]
Language models (LMs) are trained on snapshots of data collected at a specific moment in time.
We introduce a diagnostic dataset aimed at probing LMs for factual knowledge that changes over time.
We propose a simple technique for jointly modeling text with its timestamp (a sketch of the general idea follows the entry).
arXiv Detail & Related papers (2021-06-29T06:18:57Z)
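One simple way to realize "jointly modeling text with its timestamp" is to prepend a time marker to every training example, so the same query can be conditioned on different dates at inference. The sketch below shows that general idea; the prefix format, example facts, and model choice are assumptions rather than the paper's exact method.

```python
# Sketch of jointly modeling text with its timestamp: each training example is
# prefixed with a date string, so a query can be conditioned on a time at inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")


def with_time_prefix(timestamp: str, text: str) -> str:
    """Prepend a timestamp marker so the model can condition facts on time."""
    return f"[{timestamp}] {text}"


# Time-stamped training texts: the same relation has different objects at different times.
train_texts = [
    with_time_prefix("2019", "The prime minister of the UK is Theresa May."),
    with_time_prefix("2021", "The prime minister of the UK is Boris Johnson."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for text in train_texts:
    enc = tokenizer(text, return_tensors="pt")
    loss = model(**enc, labels=enc["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# At inference, the timestamp in the prompt selects which period to answer for.
prompt = with_time_prefix("2021", "The prime minister of the UK is")
inputs = tokenizer(prompt, return_tensors="pt")
model.eval()
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=5,
                               pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```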
- Bilevel Continual Learning [76.50127663309604]
We present a novel framework of continual learning named "Bilevel Continual Learning" (BCL).
Our experiments on continual learning benchmarks demonstrate the efficacy of the proposed BCL compared to many state-of-the-art methods.
arXiv Detail & Related papers (2020-07-30T16:00:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.