TemporalWiki: A Lifelong Benchmark for Training and Evaluating
Ever-Evolving Language Models
- URL: http://arxiv.org/abs/2204.14211v3
- Date: Wed, 12 Apr 2023 12:16:59 GMT
- Title: TemporalWiki: A Lifelong Benchmark for Training and Evaluating
Ever-Evolving Language Models
- Authors: Joel Jang, Seonghyeon Ye, Changho Lee, Sohee Yang, Joongbo Shin,
Janghoon Han, Gyeonghun Kim, Minjoon Seo
- Abstract summary: TemporalWiki is a lifelong benchmark for ever-evolving Language Models (LMs).
It allows researchers to periodically track an LM's ability to retain previous knowledge and acquire updated/new knowledge at each point in time.
We find that training an LM on the diff data through continual learning methods achieves similar or better perplexity than on the entire snapshot in our benchmark with 12 times less computational cost.
- Score: 31.900232508466928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language Models (LMs) become outdated as the world changes; they often fail
to perform tasks requiring recent factual information which was absent or
different during training, a phenomenon called temporal misalignment. This is
an especially challenging problem because the research community still lacks a
coherent dataset for assessing the adaptability of LMs to frequently updated
knowledge corpora such as Wikipedia. To this end, we introduce TemporalWiki, a
lifelong benchmark for ever-evolving LMs that utilizes the difference between
consecutive snapshots of English Wikipedia and English Wikidata for training
and evaluation, respectively. The benchmark hence allows researchers to
periodically track an LM's ability to retain previous knowledge and acquire
updated/new knowledge at each point in time. We also find that training an LM
on the diff data through continual learning methods achieves similar or better
perplexity than on the entire snapshot in our benchmark with 12 times less
computational cost, which verifies that factual knowledge in LMs can be safely
updated with minimal training data via continual learning. The dataset and the
code are available at https://github.com/joeljang/temporalwiki.
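A minimal sketch may help make the setup concrete: take two consecutive snapshots, keep only the articles that were added or edited, continually fine-tune a causal LM on that diff, and then compare perplexity on probes about changed versus unchanged facts. The snippet below is an illustrative approximation, not the benchmark's actual pipeline: the toy snapshot dictionaries, probe sentences, model choice (GPT-2), and hyperparameters are assumptions, and the real dataset construction lives in the linked repository.

```python
# Minimal sketch: build a snapshot "diff" and continually fine-tune a causal LM on it.
# Assumes `transformers` and `torch` are installed; data format and names are illustrative.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def build_diff(old_snapshot: dict, new_snapshot: dict) -> list:
    """Return texts that are new or changed between two {title: text} snapshots."""
    diff = []
    for title, text in new_snapshot.items():
        if old_snapshot.get(title) != text:  # new article or edited article
            diff.append(text)
    return diff


def perplexity(model, tokenizer, texts):
    """Average per-token perplexity over a list of probe sentences."""
    model.eval()
    losses = []
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
            out = model(**enc, labels=enc["input_ids"])
            losses.append(out.loss.item())
    return math.exp(sum(losses) / len(losses))


def continual_update(model, tokenizer, diff_texts, lr=1e-5, epochs=1):
    """Plain continual fine-tuning on the diff set (no regularization, for illustration)."""
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for text in diff_texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
            loss = model(**enc, labels=enc["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model


if __name__ == "__main__":
    # Toy snapshots standing in for two consecutive Wikipedia dumps.
    old_snap = {"A": "Entity A was founded in 2001.", "B": "Entity B is based in Paris."}
    new_snap = {"A": "Entity A was founded in 2001.", "B": "Entity B is based in Lyon.",
                "C": "Entity C was launched in 2022."}

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    changed_probes = ["Entity B is based in Lyon.", "Entity C was launched in 2022."]
    unchanged_probes = ["Entity A was founded in 2001."]

    diff = build_diff(old_snap, new_snap)  # only B (edited) and C (new)
    model = continual_update(model, tokenizer, diff)

    print("ppl on changed facts:  ", perplexity(model, tokenizer, changed_probes))
    print("ppl on unchanged facts:", perplexity(model, tokenizer, unchanged_probes))
```

In this toy run, perplexity on the changed-fact probes should drop after the update, while perplexity on the unchanged-fact probes indicates how much previously acquired knowledge was retained.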
Related papers
- Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning [2.8972337324168014]
We study how PLMs may learn and remember new world knowledge facts that do not occur in their pre-training corpus.
We first propose Novel-WD, a new dataset consisting of sentences containing novel facts extracted from recent Wikidata updates.
We make this dataset freely available to the community, and release a procedure to later build new versions of similar datasets with up-to-date information.
arXiv Detail & Related papers (2024-08-30T07:54:50Z)
- HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits [92.62157408704594]
HelloFresh is based on continuous streams of real-world data generated by intrinsically motivated human labelers.
It covers recent events from X (formerly Twitter) community notes and edits of Wikipedia pages.
It mitigates the risk of test data contamination and benchmark overfitting.
arXiv Detail & Related papers (2024-06-05T16:25:57Z)
- Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing.
Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z)
- Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings (a minimal sketch of this idea follows the entry).
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
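The "definition in context" finding above can be illustrated with a short, hedged sketch: score the same probe sentence with and without a definition prefix and compare the losses. The model (GPT-2), the invented entity, and the probe text are illustrative assumptions, not the paper's evaluation protocol.

```python
# Sketch of "definitions in context": score a fact about a new entity with and
# without the entity's definition prepended. Model, entity, and probe are toy choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

definition = "Zephyrion is a reusable launch vehicle first flown in 2023. "
probe = "Zephyrion is a kind of rocket."


def probe_loss(prefix: str, probe_text: str) -> float:
    """Cross-entropy on the probe tokens only, optionally conditioned on a prefix."""
    prefix_ids = tokenizer(prefix, return_tensors="pt")["input_ids"] if prefix else None
    probe_ids = tokenizer(probe_text, return_tensors="pt")["input_ids"]
    if prefix_ids is not None:
        input_ids = torch.cat([prefix_ids, probe_ids], dim=1)
        # -100 labels mask the prefix tokens out of the loss.
        labels = torch.cat([torch.full_like(prefix_ids, -100), probe_ids], dim=1)
    else:
        input_ids, labels = probe_ids, probe_ids
    with torch.no_grad():
        return model(input_ids, labels=labels).loss.item()


print("loss without definition:", probe_loss("", probe))
print("loss with definition:   ", probe_loss(definition, probe))
```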
- StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models [31.43391633383255]
We construct a new large-scale dataset, StreamingQA, with human written and generated questions asked on a given date.
We evaluate our models quarterly as they read new articles not seen in pre-training.
We show that parametric models can be updated without full retraining, while avoiding catastrophic forgetting.
arXiv Detail & Related papers (2022-05-23T15:33:41Z)
- Entity Cloze By Date: What LMs Know About Unseen Entities [79.34707800653597]
Language models (LMs) are typically trained once on a large-scale corpus and used for years without being updated.
We propose a framework to analyze what LMs can infer about new entities that did not exist when the LMs were pretrained.
We derive a dataset of entities indexed by their origination date and paired with their English Wikipedia articles, from which we can find sentences about each entity.
arXiv Detail & Related papers (2022-05-05T17:59:31Z)
- Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora [31.136334214818305]
We study a lifelong language model pretraining challenge where a PTLM is continually updated so as to adapt to emerging data.
Over a domain-incremental research paper stream and a chronologically ordered tweet stream, we incrementally pretrain a PTLM with different continual learning algorithms.
Our experiments show continual learning algorithms improve knowledge preservation, with logit distillation being the most effective approach (a minimal distillation sketch follows the entry).
arXiv Detail & Related papers (2021-10-16T09:59:33Z)
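Logit distillation, named above as the most effective approach, can be sketched as follows: keep a frozen copy of the model from before the update and, while fine-tuning on the new stream, add a KL term that keeps the updated model's output distribution close to the frozen copy's. This is a generic form of the idea under assumed hyperparameters (model choice, distillation weight, temperature), not necessarily the exact variant used in that paper.

```python
# Sketch of logit distillation during continual pretraining: the updated model is
# regularized toward a frozen copy's output distribution to preserve old knowledge.
import copy
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
student = AutoModelForCausalLM.from_pretrained("gpt2")
teacher = copy.deepcopy(student).eval()  # frozen snapshot of the pre-update model
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
new_corpus = ["Entity B is based in Lyon.", "Entity C was launched in 2022."]
alpha, temperature = 0.5, 2.0  # illustrative distillation weight and softness

student.train()
for text in new_corpus:
    enc = tokenizer(text, return_tensors="pt")
    out = student(**enc, labels=enc["input_ids"])
    lm_loss = out.loss  # learn the new facts
    with torch.no_grad():
        teacher_logits = teacher(**enc).logits
    # Per-token KL between the student's and the frozen teacher's distributions.
    s_logits = out.logits.squeeze(0) / temperature
    t_logits = teacher_logits.squeeze(0) / temperature
    kd_loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                       F.softmax(t_logits, dim=-1),
                       reduction="batchmean") * temperature ** 2
    loss = lm_loss + alpha * kd_loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The distillation term trades plasticity for stability: a larger weight preserves more old behavior at the cost of slower acquisition of new facts.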
- Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data [101.6195176510611]
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online (this test-then-train protocol is sketched after the entry).
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
arXiv Detail & Related papers (2021-08-20T06:17:20Z)
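The test-then-train protocol described above is easy to sketch: every incoming batch is first scored with the current model and only then used for a training step. The toy linear model and synthetic drifting stream below are stand-ins; the benchmark itself uses large-scale visual data.

```python
# Sketch of the test-then-train ("online") protocol: each incoming batch is scored
# with the current model before it is used for training. Data and model are toy stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2)  # toy classifier standing in for the learner
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()


def stream_of_batches(num_batches=20, batch_size=32):
    """Synthetic non-stationary stream: the decision boundary drifts over time."""
    weights = torch.randn(10)
    for _ in range(num_batches):
        weights += 0.1 * torch.randn(10)  # gradual distribution shift
        x = torch.randn(batch_size, 10)
        y = (x @ weights > 0).long()
        yield x, y


online_accuracy = []
for x, y in stream_of_batches():
    with torch.no_grad():  # 1) test on the incoming batch first
        predictions = model(x).argmax(dim=1)
        online_accuracy.append((predictions == y).float().mean().item())
    optimizer.zero_grad()  # 2) then train on the same batch
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

print("mean online accuracy:", sum(online_accuracy) / len(online_accuracy))
```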
- Time-Aware Language Models as Temporal Knowledge Bases [39.00042720454899]
Language models (LMs) are trained on snapshots of data collected at a specific moment in time.
We introduce a diagnostic dataset aimed at probing LMs for factual knowledge that changes over time.
We propose a simple technique for jointly modeling text with its timestamp (a sketch of the general idea follows the entry).
arXiv Detail & Related papers (2021-06-29T06:18:57Z)
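One simple way to realize "jointly modeling text with its timestamp" is to prepend a time marker to every training example, so the same query can be conditioned on different dates at inference. The sketch below shows that general idea; the prefix format, example facts, and model choice are assumptions rather than the paper's exact method.

```python
# Sketch of jointly modeling text with its timestamp: each training example is
# prefixed with a date string, so a query can be conditioned on a time at inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")


def with_time_prefix(timestamp: str, text: str) -> str:
    """Prepend a timestamp marker so the model can condition facts on time."""
    return f"[{timestamp}] {text}"


# Time-stamped training texts: the same relation has different objects at different times.
train_texts = [
    with_time_prefix("2019", "The prime minister of the UK is Theresa May."),
    with_time_prefix("2021", "The prime minister of the UK is Boris Johnson."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for text in train_texts:
    enc = tokenizer(text, return_tensors="pt")
    loss = model(**enc, labels=enc["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# At inference, the timestamp in the prompt selects which period to answer for.
prompt = with_time_prefix("2021", "The prime minister of the UK is")
inputs = tokenizer(prompt, return_tensors="pt")
model.eval()
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=5,
                               pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```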
- Bilevel Continual Learning [76.50127663309604]
We present a novel framework of continual learning named "Bilevel Continual Learning" (BCL).
Our experiments on continual learning benchmarks demonstrate the efficacy of the proposed BCL compared to many state-of-the-art methods.
arXiv Detail & Related papers (2020-07-30T16:00:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.