Remember This Event That Year? Assessing Temporal Information and Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2402.11997v2
- Date: Fri, 5 Jul 2024 11:26:51 GMT
- Title: Remember This Event That Year? Assessing Temporal Information and Reasoning in Large Language Models
- Authors: Himanshu Beniwal, Dishant Patel, Kowsik Nandagopan D, Hritik Ladia, Ankit Yadav, Mayank Singh
- Abstract summary: Large Language Models (LLMs) are increasingly ubiquitous, yet their ability to retain and reason about temporal information remains limited.
Our study experiments with 12 state-of-the-art models on a novel numerical-temporal dataset, **TempUN**, spanning from 10,000 BCE to 2100 CE.
- Score: 1.472789264981363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly ubiquitous, yet their ability to retain and reason about temporal information remains limited, hindering their application in real-world scenarios where understanding the sequential nature of events is crucial. Our study experiments with 12 state-of-the-art models (ranging from 2B to 70B+ parameters) on a novel numerical-temporal dataset, **TempUN**, spanning from 10,000 BCE to 2100 CE, to uncover significant temporal retention and comprehension limitations. We propose six metrics to assess three learning paradigms to enhance temporal knowledge acquisition. Our findings reveal that open-source models exhibit knowledge gaps more frequently, suggesting a trade-off between limited knowledge and incorrect responses. Additionally, various fine-tuning approaches significantly improved performance, reducing incorrect outputs and changing how often 'information not available' appears in the generations. The associated dataset and code are available at https://github.com/lingoiitgn/TempUN.
Related papers
- CSTA: Spatial-Temporal Causal Adaptive Learning for Exemplar-Free Video Class-Incremental Learning [62.69917996026769]
A class-incremental learning task requires learning and preserving both spatial appearance and temporal action involvement.
We propose a framework that equips separate adapters to learn new class patterns, accommodating the incremental information requirements unique to each class.
A causal compensation mechanism is proposed to reduce conflicts between different types of information during increment and memorization.
arXiv Detail & Related papers (2025-01-13T11:34:55Z)
- ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains [19.428141279030527]
Large language models (LLMs) have brought significant changes to many aspects of our lives.
Existing approaches fall short in addressing the temporal adaptability of knowledge.
We present ChroKnowledge, a novel sampling-based framework for evaluating LLMs' non-parametric chronological knowledge.
arXiv Detail & Related papers (2024-10-13T15:08:49Z)
- Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models [51.20499954955646]
Large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora during the pretraining phase.
In later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training.
We propose a two-stage fine-tuning strategy to improve the model's overall test accuracy and knowledge retention.
arXiv Detail & Related papers (2024-10-08T08:35:16Z)
- Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time [0.0]
In real-world scenarios, the correctness of answers is frequently tied to temporal context.
We present a novel framework and dataset spanning over 8,000 events from 2018 to 2024.
Our work provides a significant step toward advancing time-aware language models.
arXiv Detail & Related papers (2024-09-20T08:57:20Z)
- Decision Boundary-aware Knowledge Consolidation Generates Better Instance-Incremental Learner [41.462673126500974]
Instance-incremental learning (IIL) focuses on learning continually with data of the same classes.
We propose a novel decision boundary-aware distillation method that consolidates knowledge to the teacher, easing the student's learning of new knowledge.
arXiv Detail & Related papers (2024-06-05T08:49:51Z)
- Exploring the Limits of Historical Information for Temporal Knowledge Graph Extrapolation [59.417443739208146]
We propose a new event forecasting model based on a novel training framework of historical contrastive learning.
CENET learns both historical and non-historical dependencies to distinguish the most plausible entities.
We evaluate our proposed model on five benchmark graphs.
arXiv Detail & Related papers (2023-08-29T03:26:38Z)
- Mitigating Temporal Misalignment by Discarding Outdated Facts [58.620269228776294]
Large language models are often used under temporal misalignment, tasked with answering questions about the present.
We propose fact duration prediction: the task of predicting how long a given fact will remain true.
Our data and code are released publicly at https://github.com/mikejqzhang/mitigating_misalignment.
arXiv Detail & Related papers (2023-05-24T07:30:08Z)
- The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources in Natural Language Understanding Systems [87.3207729953778]
We evaluate state-of-the-art coreference resolution models on our dataset.
Several models struggle to reason on the fly over knowledge observed both at pretraining time and at inference time.
Still, even the best-performing models seem to have difficulty reliably integrating knowledge presented only at inference time.
arXiv Detail & Related papers (2022-12-15T23:26:54Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)