Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models
- URL: http://arxiv.org/abs/2311.08106v2
- Date: Sat, 20 Apr 2024 07:11:39 GMT
- Title: Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models
- Authors: Yujin Kim, Jaehong Yoon, Seonghyeon Ye, Sangmin Bae, Namgyu Ho, Sung Ju Hwang, Se-young Yun
- Abstract summary: We introduce EvolvingQA, a temporally evolving question-answering benchmark designed for training and evaluating LMs on an evolving Wikipedia database.
We find that existing continual learning baselines struggle to update and remove outdated knowledge.
Our work aims to model the dynamic nature of real-world information and proposes a faithful evaluation of how well language models adapt to evolving knowledge.
- Score: 74.81091933317882
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The dynamic nature of knowledge in an ever-changing world presents challenges for language models trained on static data; models deployed in the real world often need not only to acquire new knowledge but also to overwrite outdated information with up-to-date facts. To study how language models handle these time-dependent dynamics in human language, we introduce a novel task, EvolvingQA, a temporally evolving question-answering benchmark designed for training and evaluating LMs on an evolving Wikipedia database. The construction of EvolvingQA is automated with our pipeline using large language models. We find that existing continual learning baselines struggle to update and remove outdated knowledge. Our analysis suggests that the models fail to rectify knowledge because the corresponding weight gradients are small. In addition, we show that language models particularly struggle to reflect changes in numerical or temporal information. Our work aims to model the dynamic nature of real-world information and proposes a faithful evaluation of how well language models adapt to evolving knowledge.
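The small-gradient finding can be probed with a simple experiment of the following kind. This is a minimal sketch, not the paper's code: the model name and the two QA strings are illustrative placeholders, not EvolvingQA data.

```python
# Minimal sketch (not from the paper): compare gradient magnitudes from a
# single backward pass on an updated fact versus an unchanged fact.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works for this illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def grad_norm_for(text: str) -> float:
    """Run one backward pass on `text` and return the total gradient norm."""
    model.zero_grad()
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs, labels=inputs["input_ids"])
    outputs.loss.backward()
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.norm().item() ** 2
    return total ** 0.5

# Hypothetical examples: an answer that changed over time vs. a static one.
updated_fact = "Q: Who is the CEO of Twitter? A: Linda Yaccarino"
static_fact = "Q: What is the capital of France? A: Paris"

print("updated-fact grad norm:", grad_norm_for(updated_fact))
print("static-fact  grad norm:", grad_norm_for(static_fact))
```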
Related papers
- GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge? [36.987716816134984]
We propose GrowOVER-QA and GrowOVER-Dialogue, dynamic open-domain QA and dialogue benchmarks that undergo a continuous cycle of updates.
Our research indicates that retrieval-augmented language models (RaLMs) struggle with knowledge that they have not been trained on or that has recently been updated.
We introduce a novel retrieval-interactive language model framework, where the language model evaluates and reflects on its answers for further re-retrieval.
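The retrieval-interactive loop can be pictured roughly as follows. This is a schematic sketch, not GrowOVER's implementation; `retrieve`, `generate_answer`, and `self_evaluate` are hypothetical stand-ins for a retriever and two LLM calls.

```python
# Schematic sketch of an answer -> self-evaluate -> re-retrieve loop.
# The callables are hypothetical placeholders, not GrowOVER APIs.
from typing import Callable, List

def retrieval_interactive_qa(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate_answer: Callable[[str, List[str]], str],
    self_evaluate: Callable[[str, str, List[str]], bool],
    max_rounds: int = 3,
) -> str:
    query = question
    answer = ""
    for _ in range(max_rounds):
        passages = retrieve(query)                     # fetch evidence
        answer = generate_answer(question, passages)   # draft an answer
        if self_evaluate(question, answer, passages):  # LM judges its own answer
            break                                      # confident: stop
        # Not confident: reformulate the query around the doubtful answer
        # and retrieve again on the next round.
        query = f"{question} (previous uncertain answer: {answer})"
    return answer
```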
arXiv Detail & Related papers (2024-06-09T01:16:04Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
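The mask-infilling objective can be sketched as follows, assuming commonsense triples are verbalized with templates and one element is masked for the model to recover. The triple and templates below are illustrative, not the paper's data or code.

```python
# Illustrative sketch of commonsense mask infilling: verbalize a
# (head, relation, tail) triple and mask one element as the infilling target.
import random

RELATION_TEMPLATES = {
    "UsedFor": "{head} is used for {tail}",
    "CapableOf": "{head} is capable of {tail}",
}

def mask_infilling_example(head: str, relation: str, tail: str, mask_token: str = "<mask>"):
    text = RELATION_TEMPLATES[relation].format(head=head, tail=tail)
    target = random.choice([head, tail])       # mask either the head or the tail
    masked = text.replace(target, mask_token, 1)
    return masked, target                      # (model input, infilling target)

print(mask_infilling_example("a knife", "UsedFor", "cutting food"))
# e.g. ('a knife is used for <mask>', 'cutting food')
```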
arXiv Detail & Related papers (2023-06-04T15:44:51Z)
- A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To distinguish models by parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
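One way to probe this interaction is to hand the model a context that contradicts a well-known fact and check which source it follows. This is a generic probe sketch, not the paper's protocol; the model name and prompt are placeholders.

```python
# Generic probe (not the paper's setup): does the model follow the given
# context when it contradicts a fact it likely memorized during pretraining?
from transformers import pipeline

qa = pipeline("text-generation", model="gpt2")  # placeholder model

counterfactual_context = (
    "Context: The Eiffel Tower is located in Rome.\n"
    "Question: Where is the Eiffel Tower located?\n"
    "Answer:"
)
print(qa(counterfactual_context, max_new_tokens=5)[0]["generated_text"])
# A context-faithful model should answer "Rome"; a model that defers to its
# parametric world knowledge will answer "Paris".
```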
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models [31.43391633383255]
We construct a new large-scale dataset, StreamingQA, with human written and generated questions asked on a given date.
We evaluate our models quarterly as they read new articles not seen in pre-training.
We show that parametric models can be updated without full retraining, while avoiding catastrophic forgetting.
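The evaluation protocol amounts to bucketing questions by date, updating the model on each period's articles, and scoring it period by period. The sketch below is schematic, not the StreamingQA code; `update_model` and `answer` are hypothetical helpers.

```python
# Schematic sketch of time-sliced evaluation: group questions by quarter,
# update the model on that quarter's articles, then score it on questions
# asked in the same period. All helpers are hypothetical placeholders.
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, List, Tuple

def quarter_of(d: date) -> Tuple[int, int]:
    return d.year, (d.month - 1) // 3 + 1

def evaluate_over_time(
    questions: List[Tuple[date, str, str]],          # (asked-on date, question, gold answer)
    articles_by_quarter: Dict[Tuple[int, int], List[str]],
    update_model: Callable[[List[str]], None],       # e.g. continued fine-tuning
    answer: Callable[[str], str],
) -> Dict[Tuple[int, int], float]:
    by_quarter = defaultdict(list)
    for asked_on, q, gold in questions:
        by_quarter[quarter_of(asked_on)].append((q, gold))

    accuracy = {}
    for quarter in sorted(by_quarter):
        update_model(articles_by_quarter.get(quarter, []))  # read new articles
        qs = by_quarter[quarter]
        correct = sum(answer(q).strip() == gold for q, gold in qs)
        accuracy[quarter] = correct / len(qs)
    return accuracy
```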
arXiv Detail & Related papers (2022-05-23T15:33:41Z)
- Pitfalls of Static Language Modelling [41.76918612574081]
We show that state-of-the-art Transformer models perform worse in the realistic setup of predicting future utterances from beyond their training period.
We argue that now is the right time to rethink our static language modelling evaluation protocol.
arXiv Detail & Related papers (2021-02-03T09:01:49Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- Facts as Experts: Adaptable and Interpretable Neural Memory over Symbolic Knowledge [38.48518306055536]
We develop a neural language model that includes an explicit interface between symbolically interpretable factual information and subsymbolic neural knowledge.
We show that this model dramatically improves performance on two knowledge-intensive question-answering tasks.
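The interface can be imagined as a key-value fact memory that the neural model attends over: keys encode (subject, relation) pairs, values encode objects. This is a toy sketch of the general idea, not the paper's architecture; the dimensions and random embeddings are illustrative only.

```python
# Toy sketch of a symbolic fact memory queried by a neural model. Keys encode
# (subject, relation), values encode objects, and a hidden state attends over
# the memory. All embeddings here are random and purely illustrative.
import torch
import torch.nn.functional as F

facts = [("Paris", "capital_of", "France"),
         ("Berlin", "capital_of", "Germany")]

dim = 16
torch.manual_seed(0)
embed = {tok: torch.randn(dim) for fact in facts for tok in fact}

# Memory keys/values built from the symbolic triples.
keys = torch.stack([embed[s] + embed[r] for s, r, _ in facts])   # (num_facts, dim)
values = torch.stack([embed[o] for _, _, o in facts])            # (num_facts, dim)

# A query hidden state (stand-in for the model's representation of "Paris capital_of ?").
query = embed["Paris"] + embed["capital_of"]

attn = F.softmax(keys @ query / dim ** 0.5, dim=0)               # attention over facts
retrieved = attn @ values                                        # blended object embedding
print("attention over facts:", attn.tolist())
```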
arXiv Detail & Related papers (2020-07-02T03:05:41Z)
- Knowledge-Aware Language Model Pretraining [29.56904859722379]
We incorporate knowledge-awareness in language model pretraining without changing the transformer architecture.
We observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in the hidden representations through edge probing.
Our knowledge-aware language model (KALM) can serve as a drop-in replacement for GPT-2 models.
arXiv Detail & Related papers (2020-06-29T06:09:59Z)