TiEBe: Tracking Language Model Recall of Notable Worldwide Events Through Time
- URL: http://arxiv.org/abs/2501.07482v2
- Date: Tue, 20 May 2025 17:09:53 GMT
- Title: TiEBe: Tracking Language Model Recall of Notable Worldwide Events Through Time
- Authors: Thales Sales Almeida, Giovana Kerche Bonás, João Guilherme Alves Santos, Hugo Abonizio, Rodrigo Nogueira
- Abstract summary: We present TiEBe, a dataset of over 23,000 question-answer pairs centered on notable global and regional events. These events are then used to construct a benchmark to evaluate LLMs' understanding of global and regional developments. Our results reveal significant geographic disparities in factual recall, emphasizing the need for more balanced global representation.
- Score: 9.745912505259312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the knowledge landscape evolves and large language models (LLMs) become increasingly widespread, there is a growing need to keep these models updated with current events. While existing benchmarks assess general factual recall, few studies explore how LLMs retain knowledge over time or across different regions. To address these gaps, we present the Timely Events Benchmark (TiEBe), a dataset of over 23,000 question-answer pairs centered on notable global and regional events, spanning more than 10 years of events, 23 regions, and 13 languages. TiEBe leverages structured retrospective data from Wikipedia to identify notable events through time. These events are then used to construct a benchmark to evaluate LLMs' understanding of global and regional developments, grounded in factual evidence beyond Wikipedia itself. Our results reveal significant geographic disparities in factual recall, emphasizing the need for more balanced global representation in LLM training. We also observe a Pearson correlation of more than 0.7 between models' performance in TiEBe and various countries' socioeconomic indicators, such as HDI. In addition, we examine the impact of language on factual recall by posing questions in the native language of the region where each event occurred, uncovering substantial performance gaps for low-resource languages.
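The abstract reports a Pearson correlation above 0.7 between per-country model performance and socioeconomic indicators such as HDI. The analysis can be illustrated with a minimal sketch; the per-country accuracy and HDI values below are hypothetical placeholders, not figures from the paper.

```python
# Minimal sketch of the correlation analysis described in the abstract:
# Pearson correlation between per-country benchmark accuracy and HDI.
# All numbers below are hypothetical placeholders, not TiEBe results.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-country recall accuracy and HDI values.
accuracy = [0.82, 0.74, 0.61, 0.55, 0.48]
hdi      = [0.95, 0.90, 0.76, 0.70, 0.60]

r = pearson(accuracy, hdi)
print(f"Pearson r = {r:.3f}")
```

A correlation like this does not imply causation; it indicates only that recall accuracy and development indicators move together across the countries sampled.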
Related papers
- LocalBench: Benchmarking LLMs on County-Level Local Knowledge and Reasoning [9.319308493696893]
Large language models (LLMs) have been widely evaluated on macro-scale geographic tasks, but their ability to handle hyper-local knowledge remains poorly understood. We present LocalBench, the first benchmark designed to evaluate LLMs on county-level local knowledge across the United States. Using LocalBench, we evaluate 13 state-of-the-art LLMs under both closed-book and web-augmented settings.
arXiv Detail & Related papers (2025-11-13T16:26:13Z) - Do You Know About My Nation? Investigating Multilingual Language Models' Cultural Literacy Through Factual Knowledge [68.6805229085352]
Most multilingual question-answering benchmarks do not factor in regional diversity in the information they capture. XNationQA encompasses a total of 49,280 questions on the geography, culture, and history of nine countries, presented in seven languages. We benchmark eight standard multilingual LLMs on XNationQA and evaluate them using two novel transference metrics.
arXiv Detail & Related papers (2025-11-01T18:41:34Z) - The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities [12.46765303763981]
Large Language Models (LLMs) have been extensively tuned to mitigate explicit biases, yet they often exhibit subtle implicit biases rooted in their pre-training data. We propose studying how models behave when they proactively ask questions themselves. The 20 Questions game, a multi-turn deduction task, serves as an ideal testbed for this purpose.
arXiv Detail & Related papers (2025-08-07T15:53:30Z) - Around the World in 24 Hours: Probing LLM Knowledge of Time and Place [18.17538075862074]
We present the first evaluation of the ability of language models to jointly reason over time and space. We evaluate eight open chat models of three different model families for different combinations of temporal and geographic knowledge. We do not find clear correlations of performance with specific geographic regions.
arXiv Detail & Related papers (2025-06-04T14:14:28Z) - Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs [38.26693373272882]
We introduce two new benchmarks: KnowRecall and VisRecall. KnowRecall is a visual question answering benchmark designed to measure factual knowledge consistency in 15 languages. VisRecall assesses visual memory consistency by asking models to describe landmark appearances in 9 languages without access to images.
arXiv Detail & Related papers (2025-05-21T03:43:37Z) - ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models [75.05436691700572]
We introduce ExpliCa, a new dataset for evaluating Large Language Models (LLMs) in explicit causal reasoning. We tested seven commercial and open-source LLMs on ExpliCa through prompting and perplexity-based metrics. Surprisingly, models tend to confound temporal relations with causal ones, and their performance is also strongly influenced by the linguistic order of the events.
arXiv Detail & Related papers (2025-02-21T14:23:14Z) - ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events [0.20132569095596248]
We present ChronoSense, a new benchmark for evaluating Large Language Models' temporal understanding. We assess the performance of seven recent LLMs using this benchmark, and the results indicate that models handle Allen relations, even symmetrical ones, quite differently. Overall, the models' low performance highlights the need for improved temporal understanding in LLMs.
arXiv Detail & Related papers (2025-01-06T14:27:41Z) - Gradient Localization Improves Lifelong Pretraining of Language Models [32.29298047707914]
Large Language Models (LLMs) trained on web-scale text corpora have been shown to capture world knowledge in their parameters.
In this work, we examine two types of knowledge relating to temporally sensitive entities and demonstrate that each type is localized to different sets of parameters within the LLMs.
arXiv Detail & Related papers (2024-11-07T05:43:50Z) - ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains [19.428141279030527]
Large language models (LLMs) have brought significant changes to many aspects of our lives. Existing approaches fall short in addressing the temporal adaptability of knowledge. We present ChroKnowledge, a novel sampling-based framework for evaluating LLMs' non-parametric chronological knowledge.
arXiv Detail & Related papers (2024-10-13T15:08:49Z) - Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time [0.0]
In real-world scenarios, the correctness of answers is frequently tied to temporal context. We present a novel framework and dataset spanning over 8,000 events from 2018 to 2024. Our work provides a significant step toward advancing time-aware language models.
arXiv Detail & Related papers (2024-09-20T08:57:20Z) - Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning? [50.03434441234569]
Federated Learning (FL) has gained significant popularity due to its effectiveness in training machine learning models across diverse sites without requiring direct data sharing.
While various algorithms have shown that FL with local updates is a communication-efficient distributed learning framework, the generalization performance of FL with local updates has received comparatively less attention.
arXiv Detail & Related papers (2024-09-05T19:00:18Z) - GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge? [36.987716816134984]
We propose GrowOVER-QA and GrowOVER-Dialogue, dynamic open-domain QA and dialogue benchmarks that undergo a continuous cycle of updates.
Our research indicates that retrieval-augmented language models (RaLMs) struggle with knowledge that has not been trained on or recently updated.
We introduce a novel retrieval-interactive language model framework, where the language model evaluates and reflects on its answers for further re-retrieval.
arXiv Detail & Related papers (2024-06-09T01:16:04Z) - Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models [51.891804790725686]
Elements of World Knowledge (EWoK) is a framework for evaluating language models' understanding of conceptual knowledge underlying world modeling. EWoK-core-1.0 is a dataset of 4,374 items covering 11 world knowledge domains. All tested models perform worse than humans, with results varying drastically across domains.
arXiv Detail & Related papers (2024-05-15T17:19:42Z) - Towards Incremental Learning in Large Language Models: A Critical Review [0.0]
This review provides a comprehensive analysis of incremental learning in Large Language Models.
It synthesizes the state-of-the-art incremental learning paradigms, including continual learning, meta-learning, parameter-efficient learning, and mixture-of-experts learning.
An important finding is that many of these approaches do not update the core model, and none of them update incrementally in real-time.
arXiv Detail & Related papers (2024-04-28T20:44:53Z) - Remember This Event That Year? Assessing Temporal Information and Reasoning in Large Language Models [1.472789264981363]
Large Language Models (LLMs) are increasingly ubiquitous, yet their ability to retain and reason about temporal information remains limited.
Our study experiments with 12 state-of-the-art models on a novel numerical-temporal dataset, TempUN, spanning from 10,000 BCE to 2100 CE.
arXiv Detail & Related papers (2024-02-19T09:43:03Z) - A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z) - LLaMA Beyond English: An Empirical Study on Language Capability Transfer [49.298360366468934]
We focus on how to effectively transfer the capabilities of language generation and following instructions to a non-English language.
We analyze the impact of key factors such as vocabulary extension, further pretraining, and instruction tuning on transfer.
We employ four widely used standardized testing benchmarks: C-Eval, MMLU, AGI-Eval, and GAOKAO-Bench.
arXiv Detail & Related papers (2024-01-02T06:29:02Z) - Visual Self-paced Iterative Learning for Unsupervised Temporal Action Localization [50.48350210022611]
We present a novel self-paced iterative learning model to enhance clustering and localization training simultaneously.
We design two (constant- and variable-speed) incremental instance learning strategies for easy-to-hard model training, thus ensuring the reliability of these video pseudo-labels.
arXiv Detail & Related papers (2023-12-12T16:00:55Z) - Online Continual Knowledge Learning for Language Models [3.654507524092343]
Large Language Models (LLMs) serve as repositories of extensive world knowledge, enabling them to perform tasks such as question-answering and fact-checking.
Online Continual Knowledge Learning (OCKL) aims to manage the dynamic nature of world knowledge in LMs under real-time constraints.
arXiv Detail & Related papers (2023-11-16T07:31:03Z) - Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models [74.81091933317882]
We introduce EvolvingQA, a temporally evolving question-answering benchmark designed for training and evaluating LMs on an evolving Wikipedia database.
We uncover that existing continual learning baselines suffer from updating and removing outdated knowledge.
Our work aims to model the dynamic nature of real-world information, suggesting faithful evaluations of the evolution-adaptability of language models.
arXiv Detail & Related papers (2023-11-14T12:12:02Z) - Recognize Any Regions [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model. Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z) - ALCUNA: Large Language Models Meet New Knowledge [48.30457202012987]
We propose an approach that generates new knowledge by altering existing entity attributes and relationships.
With KnowGen, we introduce a benchmark named ALCUNA to assess LLMs' abilities in knowledge understanding, differentiation, and association.
We also explore the impact of entity similarity on the model's understanding of entity knowledge and the influence of contextual entities.
arXiv Detail & Related papers (2023-10-23T11:40:05Z) - CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages [86.90220551111096]
Training datasets for large language models (LLMs) are often not fully disclosed.
We present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages.
arXiv Detail & Related papers (2023-09-17T23:49:10Z) - KoLA: Carefully Benchmarking World Knowledge of Large Language Models [87.96683299084788]
We construct a Knowledge-oriented LLM Assessment benchmark (KoLA). We mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks. We use Wikipedia, a corpus on which LLMs are prevalently pre-trained, along with continuously collected emerging corpora, to evaluate the capacity to handle unseen data and evolving knowledge.
arXiv Detail & Related papers (2023-06-15T17:20:46Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.