Question Answering under Temporal Conflict: Evaluating and Organizing Evolving Knowledge with LLMs
- URL: http://arxiv.org/abs/2506.07270v1
- Date: Sun, 08 Jun 2025 20:13:33 GMT
- Title: Question Answering under Temporal Conflict: Evaluating and Organizing Evolving Knowledge with LLMs
- Authors: Atahan Özer, Çağatay Yıldız
- Abstract summary: Large language models (LLMs) exhibit remarkable capabilities in question answering and reasoning, but updating their parametric knowledge typically requires costly and brittle re-training. We propose a lightweight, agentic framework that incrementally builds a structured, external memory from source documents without requiring re-training.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) exhibit remarkable capabilities in question answering and reasoning thanks to their extensive parametric memory. However, their knowledge is inherently limited by the scope of their pre-training data, while real-world information evolves continuously. Updating this knowledge typically requires costly and brittle re-training, or in-context learning (ICL), which becomes impractical at scale given the volume and volatility of modern information. Motivated by these limitations, we investigate how LLMs perform when exposed to temporal text corpora, or documents that reflect evolving knowledge over time, such as sports biographies where facts like a player's "current team" change year by year. To this end, we introduce two new benchmarks: Temporal Wiki, which captures factual drift across historical Wikipedia snapshots, and Unified Clark, which aggregates timestamped news articles to simulate real-world information accumulation. Our analysis reveals that LLMs often struggle to reconcile conflicting or outdated facts and can be misled when multiple versions of a fact appear in context. To address these issues, we propose a lightweight, agentic framework that incrementally builds a structured, external memory from source documents without requiring re-training. This knowledge organization strategy enables models to retrieve and reason over temporally filtered, relevant information at inference time. Empirically, our method outperforms ICL and RAG baselines across both benchmarks, especially on questions requiring more complex reasoning or integration of conflicting facts.
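To make the proposed knowledge-organization step concrete, below is a minimal, illustrative sketch of a timestamped fact store in Python. It is not the authors' implementation: the `TemporalMemory` class, its method names, and the example records are assumptions; in the actual framework an LLM-driven agent would extract such (entity, relation, value, timestamp) records from the source documents.

```python
from bisect import insort
from collections import defaultdict
from typing import Optional


class TemporalMemory:
    """Toy structured memory: (entity, relation) -> time-ordered (timestamp, value) records."""

    def __init__(self) -> None:
        self._facts = defaultdict(list)

    def add_fact(self, entity: str, relation: str, value: str, timestamp: int) -> None:
        # In the paper's framework an agent would emit such records while reading
        # each document; here they are inserted by hand for illustration.
        insort(self._facts[(entity, relation)], (timestamp, value))

    def query(self, entity: str, relation: str, as_of: Optional[int] = None) -> Optional[str]:
        # Return the most recent value whose timestamp does not exceed `as_of`,
        # i.e. temporally filtered retrieval instead of exposing every version.
        history = self._facts.get((entity, relation), [])
        valid = [value for ts, value in history if as_of is None or ts <= as_of]
        return valid[-1] if valid else None


# Hypothetical records mirroring the "current team" example from the abstract.
memory = TemporalMemory()
memory.add_fact("Player X", "current team", "Club A", 2019)
memory.add_fact("Player X", "current team", "Club B", 2022)

print(memory.query("Player X", "current team"))              # Club B (latest fact)
print(memory.query("Player X", "current team", as_of=2020))  # Club A (as of 2020)
```

Retrieving only the temporally relevant record, instead of placing every conflicting version in context, is what the structured memory is meant to enable at inference time.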
Related papers
- KnowTrace: Bootstrapping Iterative Retrieval-Augmented Generation with Structured Knowledge Tracing [64.38243807002878]
We present KnowTrace, an elegant RAG framework to mitigate context overload in large language models. KnowTrace autonomously traces out desired knowledge triplets to organize a specific knowledge graph relevant to the input question (a rough sketch of this loop follows the entry). It consistently surpasses existing methods across three multi-hop question answering benchmarks.
arXiv Detail & Related papers (2025-05-26T17:22:20Z)
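As a rough illustration of the triplet-tracing loop described above (not KnowTrace's actual interface; `retrieve` and `extract_triplets` are hypothetical stand-ins for a retriever and an LLM extraction call):

```python
# Hypothetical stand-ins: a real system would back these with a retriever and an LLM.
def retrieve(query: str) -> list[str]:
    return ["Messi joined Inter Miami in 2023.", "Inter Miami plays in MLS."]


def extract_triplets(question: str, passages: list[str], known: set) -> set:
    demo = {("Messi", "member of", "Inter Miami"), ("Inter Miami", "league", "MLS")}
    return demo - known


def trace_knowledge(question: str, max_hops: int = 3) -> set:
    """Accumulate a question-specific graph of (subject, relation, object) triplets."""
    graph: set = set()
    for _ in range(max_hops):
        passages = retrieve(question)
        new_triplets = extract_triplets(question, passages, graph)
        if not new_triplets:  # stop once the extractor traces nothing new
            break
        graph.update(new_triplets)
        # An iterative system would also refine the retrieval query using newly
        # traced entities before answering over the assembled graph.
    return graph


print(trace_knowledge("Which league does Messi's current club play in?"))
```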
- Training Plug-n-Play Knowledge Modules with Deep Context Distillation [52.94830874557649]
In this paper, we propose a way of modularizing knowledge by training document-level Knowledge Modules (KMs). KMs are lightweight components implemented as parameter-efficient LoRA modules, which are trained to store information about new documents (a generic LoRA layer is sketched after this entry). Our method outperforms standard next-token prediction and pre-instruction training techniques across two datasets.
arXiv Detail & Related papers (2025-03-11T01:07:57Z)
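For context, the Knowledge Modules above are built from LoRA-style adapters. Here is a generic LoRA linear layer in PyTorch, a standard formulation rather than the paper's KM implementation or its distillation objective:

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (generic LoRA sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank factors are trained
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```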
- LLMs as Repositories of Factual Knowledge: Limitations and Solutions [1.7764955091415962]
We study the appropriateness of Large Language Models (LLMs) as repositories of factual knowledge. We evaluate their reliability in responding to time-sensitive factual questions. We propose "ENtity-Aware Fine-tuning" (ENAF) to improve the model's performance.
arXiv Detail & Related papers (2025-01-22T10:16:53Z)
- ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains [19.428141279030527]
ChroKnowBench is a benchmark dataset designed to evaluate chronologically accumulated knowledge. ChroKnowledge is a novel sampling-based framework for evaluating LLMs' non-parametric chronological knowledge. ChroKnowPrompt is an in-depth prompting technique that elicits chronological knowledge by traversing step-by-step through the surrounding time spans.
arXiv Detail & Related papers (2024-10-13T15:08:49Z)
- Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts [50.06633829833144]
Large Language Models (LLMs) are effective in performing various NLP tasks, but struggle to handle tasks that require extensive, real-world knowledge.
We propose a benchmark whose questions require knowledge of long-tail facts to answer.
Our experiments show that LLMs alone struggle with answering these questions, especially when the long-tail level is high or rich knowledge is required.
arXiv Detail & Related papers (2024-05-10T15:10:20Z)
- DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs [1.7764955091415962]
We present an approach to dynamically evaluate the knowledge in LLMs and its time-sensitivity against Wikidata (a sketch of such a check follows this entry).
We evaluate the time-sensitive knowledge in twenty-four private and open-source LLMs, as well as the effectiveness of four editing methods in updating outdated facts.
Our results show that 1) outdatedness is a critical problem across state-of-the-art LLMs; 2) LLMs output inconsistent answers when prompted with slight variations of the question prompt; and 3) the performance of the state-of-the-art knowledge editing algorithms is very limited.
arXiv Detail & Related papers (2024-04-10T18:08:59Z)
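A sketch of the kind of Wikidata check this entry describes; this is an illustrative approximation rather than DyKnow's code, and the entity/property IDs and the string-match grading in the commented usage are placeholder assumptions:

```python
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def wikidata_values(entity_qid: str, property_pid: str) -> list[str]:
    """Fetch the current (truthy) values of a property for an entity from Wikidata."""
    query = f"""
    SELECT ?valueLabel WHERE {{
      wd:{entity_qid} wdt:{property_pid} ?value .
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    """
    resp = requests.get(
        WIKIDATA_SPARQL,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "dyknow-style-check/0.1"},
    )
    resp.raise_for_status()
    return [b["valueLabel"]["value"] for b in resp.json()["results"]["bindings"]]


def is_outdated(llm_answer: str, reference_values: list[str]) -> bool:
    # Crude substring grading stands in for a proper answer-comparison step.
    return not any(v.lower() in llm_answer.lower() for v in reference_values)


# Hypothetical usage (IDs are illustrative; requires network access):
# teams = wikidata_values("Q615", "P54")   # e.g. an athlete's sports teams
# print(is_outdated("He currently plays for Paris Saint-Germain.", teams))
```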
- LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities.
If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information.
To address this issue, we suggest using RC on imaginary data based on fictitious facts and entities.
arXiv Detail & Related papers (2024-04-09T13:08:56Z)
- A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
- A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia [57.31074448586854]
Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context.
Yet the mechanisms underlying this contextual grounding remain unknown.
We present a novel method to study grounding abilities using Fakepedia.
arXiv Detail & Related papers (2023-12-04T17:35:42Z)