Temporal Referential Consistency: Do LLMs Favor Sequences Over Absolute Time References?
- URL: http://arxiv.org/abs/2510.15513v1
- Date: Fri, 17 Oct 2025 10:33:48 GMT
- Title: Temporal Referential Consistency: Do LLMs Favor Sequences Over Absolute Time References?
- Authors: Ashutosh Bajpai, Tanmoy Chakraborty
- Abstract summary: Large language models (LLMs) are an alternative to knowledge sources. LLMs must be factually accurate and demonstrate consistency across temporal dimensions. Despite this critical requirement, efforts to ensure temporal consistency in LLMs remain scarce.
- Score: 21.90468150326666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing acceptance of large language models (LLMs) as an alternative to knowledge sources marks a significant paradigm shift across various domains, including time-sensitive fields such as law, healthcare, and finance. To fulfill this expanded role, LLMs must not only be factually accurate but also demonstrate consistency across temporal dimensions, necessitating robust temporal reasoning capabilities. Despite this critical requirement, efforts to ensure temporal consistency in LLMs remain scarce, including a noticeable absence of endeavors aimed at evaluating or augmenting LLMs across temporal references in time-sensitive inquiries. In this paper, we seek to address this gap by introducing a novel benchmark entitled temporal referential consistency, accompanied by a resource, TEMP-ReCon, designed to benchmark a wide range of both open-source and closed-source LLMs in various linguistic contexts characterized by differing resource richness (including English, French, and Romanian). The findings emphasize that LLMs do exhibit insufficient temporal referential consistency. To address this, we propose UnTRaP, a reasoning path alignment-based model that aims to enhance the temporal referential consistency of LLMs. Our empirical experiments substantiate the efficacy of UnTRaP compared to several baseline models.
Related papers
- LLM-Specific Utility: A New Perspective for Retrieval-Augmented Generation [110.610512800947]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge. Existing studies often treat utility as a generic attribute, ignoring the fact that different LLMs may benefit differently from the same passage.
arXiv Detail & Related papers (2025-10-13T12:57:45Z) - Semantic-Enhanced Time-Series Forecasting via Large Language Models [20.383296465541758]
Time series forecasting plays a significant role in finance, energy, meteorology, and IoT applications. Recent studies have leveraged the generalization capabilities of large language models (LLMs) to adapt to time series forecasting, achieving promising performance. We propose a novel Semantic-Enhanced LLM (SE-LLM) that explores the inherent periodicity and anomalous characteristics of time series to embed into the semantic space.
arXiv Detail & Related papers (2025-08-11T07:19:21Z) - LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs [63.580867975515474]
We present the first systematic investigation comparing the long-context performance of diffusion LLMs and traditional auto-regressive LLMs. We propose LongLLaDA, a training-free method that integrates LLaDA with the NTK-based RoPE extrapolation.
arXiv Detail & Related papers (2025-06-17T11:45:37Z) - Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning [30.308743810639758]
Large audio language models (LALMs) have to be evaluated on reasoning-related tasks, which differ from traditional classification or generation tasks. We benchmark open-source LALMs and observe that they consistently lag behind human capabilities on the tasks in the TREA dataset. Our analysis shows that accuracy and uncertainty metrics are not necessarily correlated, pointing to a need for holistic evaluation of LALMs in high-stakes applications.
arXiv Detail & Related papers (2025-05-19T13:46:35Z) - LLM-PS: Empowering Large Language Models for Time Series Forecasting with Temporal Patterns and Semantics [56.99021951927683]
Time Series Forecasting (TSF) is critical in many real-world domains like financial planning and health monitoring. Existing Large Language Models (LLMs) usually perform suboptimally because they neglect the inherent characteristics of time series data. We propose LLM-PS to empower the LLM for TSF by learning the fundamental Patterns and meaningful Semantics from time series data.
arXiv Detail & Related papers (2025-03-12T11:45:11Z) - Position: Empowering Time Series Reasoning with Multimodal LLMs [49.73647759532127]
We argue that multimodal language models (MLLMs) can enable more powerful and flexible reasoning for time series analysis. We call on researchers and practitioners to leverage this potential by developing strategies that prioritize trust, interpretability, and robust reasoning in MLLMs.
arXiv Detail & Related papers (2025-02-03T16:10:48Z) - Multilingual LLMs Inherently Reward In-Language Time-Sensitive Semantic Alignment for Low-Resource Languages [19.863010475923414]
The disparity in labeled resources between resource-rich languages and those considered low-resource remains a significant impediment for Large Language Models (LLMs). Recent strides in cross-lingual in-context learning (X-ICL), mainly through semantically aligned examples retrieved from multilingual pre-trained transformers, have shown promise in mitigating this issue. This study aims to bridge this gap by improving temporal reasoning capabilities in low-resource languages.
arXiv Detail & Related papers (2024-12-11T04:16:39Z) - Temporally Consistent Factuality Probing for Large Language Models [16.177991267568125]
We introduce TeCFaP, a novel Temporally Consistent Factuality Probe task.
We extend the definitions of existing metrics to represent consistent factuality across the temporal dimension.
Next, we propose a novel solution CoTSeLF combining multi-task instruction tuning (MT-IT) with consistent-time-sensitive reinforcement learning (CTSRL) to improve temporally consistent factuality in LLMs.
arXiv Detail & Related papers (2024-09-21T08:41:08Z) - Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning? [70.19200858203388]
Temporal reasoning is fundamental for large language models to comprehend the world.
CoTempQA is a benchmark containing four co-temporal scenarios.
Our experiments reveal a significant gap between the performance of current LLMs and human-level reasoning.
arXiv Detail & Related papers (2024-06-13T12:56:21Z) - Improve Temporal Awareness of LLMs for Sequential Recommendation [61.723928508200196]
Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks.
LLMs fall short in recognizing and utilizing temporal information, resulting in poor performance on tasks that require an understanding of sequential data.
We propose three prompting strategies to exploit temporal information within historical interactions for LLM-based sequential recommendation.
arXiv Detail & Related papers (2024-05-05T00:21:26Z) - Assessing the Reliability of Large Language Model Knowledge [78.38870272050106]
Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks.
How do we evaluate the capabilities of LLMs to consistently produce factually correct answers?
We propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
arXiv Detail & Related papers (2023-10-15T12:40:30Z) - TRAM: Benchmarking Temporal Reasoning for Large Language Models [12.112914393948415]
We introduce TRAM, a temporal reasoning benchmark composed of ten datasets.
We evaluate popular language models like GPT-4 and Llama2 in zero-shot and few-shot scenarios.
Our findings indicate that the best-performing model lags significantly behind human performance.
arXiv Detail & Related papers (2023-10-02T00:59:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.