LLMLagBench: Identifying Temporal Training Boundaries in Large Language Models
- URL: http://arxiv.org/abs/2511.12116v1
- Date: Sat, 15 Nov 2025 09:08:10 GMT
- Title: LLMLagBench: Identifying Temporal Training Boundaries in Large Language Models
- Authors: Piotr Pęzik, Konrad Kaczyński, Maria Szymańska, Filip Żarnecki, Zuzanna Deckert, Jakub Kwiatkowski, Wojciech Janowski
- Abstract summary: Large Language Models (LLMs) are pretrained on textual data up to a specific temporal cutoff. LLMs may inadvertently blend outdated time-sensitive information with general knowledge during reasoning tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are pretrained on textual data up to a specific temporal cutoff. This creates a strict knowledge boundary beyond which models cannot provide accurate information without querying external sources. More subtly, when this limitation is unknown or ignored, LLMs may inadvertently blend outdated time-sensitive information with general knowledge during reasoning tasks, potentially compromising response accuracy. We introduce LLMLagBench, an LLM freshness benchmark, as a systematic approach for identifying the earliest probable temporal boundaries of an LLM's training data by evaluating its knowledge of recent events. We then apply this benchmark to evaluate a large set of LLMs, including models with both explicitly declared and undeclared training cutoffs. The reliability of the benchmark is assessed by manual validation and comparison with publicly released information about LLM pretraining.
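The abstract describes estimating the earliest probable training cutoff by probing a model's knowledge of recent events. As an illustrative sketch only (not the paper's actual methodology, data, or thresholds), one could score a model's accuracy on questions about events from successive months and take the first month where accuracy collapses as a lower bound on the cutoff. The `estimate_cutoff` helper, the accuracy figures, and the 0.5 threshold below are all hypothetical:

```python
from datetime import date

def estimate_cutoff(results, threshold=0.5):
    """Given {month: accuracy} for questions about events from that month,
    return the earliest month whose accuracy falls below the threshold,
    taken as a rough lower bound on the model's training cutoff."""
    for month in sorted(results):
        if results[month] < threshold:
            return month
    return None  # no drop detected within the probed range

# Hypothetical per-month accuracies on dated news questions.
accuracies = {
    date(2024, 1, 1): 0.92,
    date(2024, 4, 1): 0.88,
    date(2024, 7, 1): 0.41,  # knowledge degrades from here on
    date(2024, 10, 1): 0.12,
}
```

In this toy example, `estimate_cutoff(accuracies)` returns `date(2024, 7, 1)`, i.e. the model's knowledge likely ends no later than mid-2024.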
Related papers
- Parametric Knowledge is Not All You Need: Toward Honest Large Language Models via Retrieval of Pretraining Data [33.6173339938215]
Large language models (LLMs) are highly capable of answering questions, but they are often unaware of their own knowledge boundary. Rather than hallucinating, a language model should be more honest and respond with "I don't know" when it does not have enough knowledge about a topic.
arXiv Detail & Related papers (2026-01-29T03:32:09Z) - Is More Context Always Better? Examining LLM Reasoning Capability for Time Interval Prediction [15.45305246863211]
Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning and prediction across different domains. This paper presents a systematic study investigating whether LLMs can predict time intervals between recurring user actions. We benchmark state-of-the-art LLMs in zero-shot settings against both statistical and machine-learning models.
arXiv Detail & Related papers (2026-01-15T07:18:40Z) - Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge Cutoffs [31.64130018833542]
Large Language Models (LLMs) are widely used for temporal prediction, but their reliance on pretraining data raises contamination concerns. We investigate the capability of prompting to simulate earlier knowledge cutoffs in LLMs. Results demonstrate that while prompt-based simulated knowledge cutoffs show effectiveness when directly queried with information after that date, they struggle to induce forgetting when the forgotten content is not directly asked about but is causally related to the query.
arXiv Detail & Related papers (2025-09-26T20:37:44Z) - Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond [55.984684518346924]
We recast Knowledge Tracing as an inverse problem: learning the minimum natural-language summary that makes past answers explainable and future answers predictable. Our Language Bottleneck Model (LBM) consists of an encoder LLM that writes an interpretable knowledge summary and a frozen decoder LLM that must reconstruct and predict student responses using only that summary text. Experiments on synthetic arithmetic benchmarks and the large-scale Eedi dataset show that LBMs rival the accuracy of state-of-the-art KT and direct LLM methods while requiring orders-of-magnitude fewer student trajectories.
arXiv Detail & Related papers (2025-06-20T13:21:14Z) - MoRE-LLM: Mixture of Rule Experts Guided by a Large Language Model [54.14155564592936]
We propose a Mixture of Rule Experts guided by a Large Language Model (MoRE-LLM). MoRE-LLM steers the discovery of local rule-based surrogates during training and their utilization for the classification task. The LLM is responsible for enhancing the domain-knowledge alignment of the rules by correcting and contextualizing them.
arXiv Detail & Related papers (2025-03-26T11:09:21Z) - How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? [55.33467849079774]
Low-rank adaptation (LoRA) is a popular and efficient training technique for updating Large Language Models or adapting them to specific domains. We investigate how new facts can be incorporated into an LLM using LoRA without compromising previously learned knowledge.
arXiv Detail & Related papers (2025-02-20T12:31:03Z) - UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models [41.67393607081513]
Large Language Models (LLMs) often struggle to accurately express the factual knowledge they possess. We propose the UAlign framework, which leverages uncertainty estimations to represent knowledge boundaries. We show that the proposed UAlign can significantly enhance LLMs' capacity to confidently answer known questions.
arXiv Detail & Related papers (2024-12-16T14:14:27Z) - Dynamic Uncertainty Ranking: Enhancing Retrieval-Augmented In-Context Learning for Long-Tail Knowledge in LLMs [50.29035873837]
Large language models (LLMs) can learn vast amounts of knowledge from diverse domains during pre-training. Long-tail knowledge from specialized domains is often scarce and underrepresented, and is rarely memorized by the models. We propose a reinforcement learning-based dynamic uncertainty ranking method for ICL that accounts for the varying impact of each retrieved sample on LLM predictions.
arXiv Detail & Related papers (2024-10-31T03:42:17Z) - Formality is Favored: Unraveling the Learning Preferences of Large Language Models on Data with Conflicting Knowledge [55.65162959527848]
Large language models have shown excellent performance on many knowledge-intensive tasks.
However, pretraining data tends to contain misleading and even conflicting information.
This study systematically analyzes LLMs' learning preferences for data with conflicting knowledge.
arXiv Detail & Related papers (2024-10-07T06:49:41Z) - CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models [60.59638232596912]
We introduce CLAMBER, a benchmark for evaluating how well large language models (LLMs) identify and clarify ambiguous user queries.
Building upon the taxonomy, we construct 12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs.
Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries.
arXiv Detail & Related papers (2024-05-20T14:34:01Z) - CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning [59.88924847995279]
We propose a novel Cross-Modal LLM Fine-Tuning (CALF) framework for MTSF. To reduce the distribution discrepancy, we develop a cross-modal match module. CALF establishes state-of-the-art performance for both long-term and short-term forecasting tasks.
arXiv Detail & Related papers (2024-03-12T04:04:38Z) - Temporal Blind Spots in Large Language Models [20.631107338678234]
Large language models (LLMs) have recently gained significant attention due to their unparalleled ability to perform various natural language processing tasks.
This study investigates the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding.
arXiv Detail & Related papers (2024-01-22T16:20:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.