Remember This Event That Year? Assessing Temporal Information and Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2402.11997v2
- Date: Fri, 5 Jul 2024 11:26:51 GMT
- Title: Remember This Event That Year? Assessing Temporal Information and Reasoning in Large Language Models
- Authors: Himanshu Beniwal, Dishant Patel, Kowsik Nandagopan D, Hritik Ladia, Ankit Yadav, Mayank Singh
- Abstract summary: Large Language Models (LLMs) are increasingly ubiquitous, yet their ability to retain and reason about temporal information remains limited.
Our study experiments with 12 state-of-the-art models on a novel numerical-temporal dataset, TempUN, spanning from 10,000 BCE to 2100 CE.
- Score: 1.472789264981363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly ubiquitous, yet their ability to retain and reason about temporal information remains limited, hindering their application in real-world scenarios where understanding the sequential nature of events is crucial. Our study experiments with 12 state-of-the-art models (ranging from 2B to 70B+ parameters) on a novel numerical-temporal dataset, TempUN, spanning from 10,000 BCE to 2100 CE, to uncover significant temporal retention and comprehension limitations. We propose six metrics to assess three learning paradigms to enhance temporal knowledge acquisition. Our findings reveal that open-source models exhibit knowledge gaps more frequently, suggesting a trade-off between limited knowledge and incorrect responses. Additionally, various fine-tuning approaches significantly improved performance, reducing incorrect outputs and impacting the identification of 'information not available' in the generations. The associated dataset and code are available at https://github.com/lingoiitgn/TempUN.
Related papers
- ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains [19.428141279030527]
We present ChroKnowledge, a novel sampling-based framework for evaluating and updating large language models' non-parametric chronological knowledge.
Our framework successfully updates the overall knowledge across the entire timeline in both the biomedical domain and the general domain.
We perform a comprehensive analysis based on temporal characteristics of ChroKnowPrompt and validate the potential of various models to elicit intrinsic temporal knowledge.
arXiv Detail & Related papers (2024-10-13T15:08:49Z) - Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models [51.20499954955646]
Large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora during the pretraining phase.
In later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training.
We propose a two-stage fine-tuning strategy to improve the model's overall test accuracy and knowledge retention.
arXiv Detail & Related papers (2024-10-08T08:35:16Z) - Decision Boundary-aware Knowledge Consolidation Generates Better Instance-Incremental Learner [41.462673126500974]
Instance-incremental learning (IIL) focuses on learning continually with data of the same classes.
We propose a novel decision boundary-aware distillation method that consolidates knowledge into the teacher to ease the student's learning of new knowledge.
arXiv Detail & Related papers (2024-06-05T08:49:51Z) - Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z) - Exploring the Limits of Historical Information for Temporal Knowledge Graph Extrapolation [59.417443739208146]
We propose a new event forecasting model based on a novel training framework of historical contrastive learning.
CENET learns both the historical and non-historical dependency to distinguish the most potential entities.
We evaluate our proposed model on five benchmark graphs.
arXiv Detail & Related papers (2023-08-29T03:26:38Z) - Mitigating Temporal Misalignment by Discarding Outdated Facts [58.620269228776294]
Large language models are often used under temporal misalignment, tasked with answering questions about the present.
We propose fact duration prediction: the task of predicting how long a given fact will remain true.
Our data and code are released publicly at https://github.com/mikejqzhang/mitigating_misalignment.
arXiv Detail & Related papers (2023-05-24T07:30:08Z) - The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources in Natural Language Understanding Systems [87.3207729953778]
We evaluate state-of-the-art coreference resolution models on our dataset.
Several models struggle to reason on-the-fly over knowledge observed both at pretrain time and at inference time.
Still, even the best performing models seem to have difficulties with reliably integrating knowledge presented only at inference time.
arXiv Detail & Related papers (2022-12-15T23:26:54Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - ECOLA: Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations [35.51427298619691]
We study enhancing temporal knowledge embedding with textual data.
We propose Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations (ECOLA).
Experiments show that ECOLA significantly enhances temporal embedding models with up to 287% relative improvements regarding Hits@1 on the link prediction task.
arXiv Detail & Related papers (2022-03-17T20:08:25Z) - Unsupervised Pre-training with Structured Knowledge for Improving Natural Language Inference [22.648536283569747]
We propose models that leverage structured knowledge in different components of pre-trained models.
Our results show that the proposed models perform better than previous BERT-based state-of-the-art models.
arXiv Detail & Related papers (2021-09-08T21:28:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.