TransientTables: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables
- URL: http://arxiv.org/abs/2504.01879v1
- Date: Wed, 02 Apr 2025 16:34:43 GMT
- Title: TransientTables: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables
- Authors: Abhilash Shankarampeta, Harsh Mahajan, Tushar Kataria, Dan Roth, Vivek Gupta
- Abstract summary: Large language models (LLMs) are typically trained on static datasets, limiting their ability to perform effective temporal reasoning. We present the TRANSIENTTABLES dataset, which comprises 3,971 questions derived from over 14,000 tables, spanning 1,238 entities across multiple time periods.
- Score: 47.85408648193376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans continuously make new discoveries, and understanding the temporal sequence of events leading to these breakthroughs is essential for advancing science and society. This ability to reason over time allows us to identify future steps and understand the effects of financial and political decisions on our lives. However, large language models (LLMs) are typically trained on static datasets, limiting their ability to perform effective temporal reasoning. To assess the temporal reasoning capabilities of LLMs, we present the TRANSIENTTABLES dataset, which comprises 3,971 questions derived from over 14,000 tables, spanning 1,238 entities across multiple time periods. We introduce a template-based question-generation pipeline that harnesses LLMs to refine both templates and questions. Additionally, we establish baseline results using state-of-the-art LLMs to create a benchmark. We also introduce novel modeling strategies centered around task decomposition, enhancing LLM performance.
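The abstract outlines a template-based question-generation pipeline over time-stamped table snapshots. Below is a minimal sketch of that idea under assumed data structures: the `Snapshot` class, the template wording, and every identifier here are illustrative, not the authors' actual code, and the paper's full pipeline additionally uses LLMs to refine the templates and questions.

```python
# Hypothetical sketch of template-based temporal question generation
# over evolving table snapshots. All names and templates are assumptions
# for illustration; the TRANSIENTTABLES pipeline further refines both
# templates and questions with LLMs.

from dataclasses import dataclass

@dataclass
class Snapshot:
    """One time-stamped version of an entity's semi-structured table."""
    year: int
    fields: dict  # attribute name -> value, e.g. {"Coach": "A. Smith"}

def generate_questions(entity: str, snapshots: list, attribute: str):
    """Instantiate two simple temporal templates over consecutive snapshots."""
    questions = []
    for earlier, later in zip(snapshots, snapshots[1:]):
        before = earlier.fields.get(attribute)
        after = later.fields.get(attribute)
        if before is None or after is None:
            continue
        # Template 1: point-in-time lookup.
        questions.append(
            (f"What was the {attribute} of {entity} in {earlier.year}?", before)
        )
        # Template 2: change detection across two time periods.
        questions.append(
            (f"Did the {attribute} of {entity} change between {earlier.year} "
             f"and {later.year}?", "Yes" if before != after else "No")
        )
    return questions

# Toy example: an entity whose table evolves between two snapshots.
history = [
    Snapshot(2019, {"Coach": "A. Smith", "Stadium": "North Park"}),
    Snapshot(2021, {"Coach": "B. Jones", "Stadium": "North Park"}),
]
for question, answer in generate_questions("Rivertown FC", history, "Coach"):
    print(question, "->", answer)
```

A task-decomposition strategy of the kind the abstract mentions could, for example, split a multi-period question into per-snapshot lookups of this form before composing the final answer, though the paper's exact decomposition scheme is not detailed here.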
Related papers
- LLM-PS: Empowering Large Language Models for Time Series Forecasting with Temporal Patterns and Semantics [56.99021951927683]
Time Series Forecasting (TSF) is critical in many real-world domains like financial planning and health monitoring. Existing Large Language Models (LLMs) usually perform suboptimally because they neglect the inherent characteristics of time series data. We propose LLM-PS to empower the LLM for TSF by learning the fundamental Patterns and meaningful Semantics from time series data.
arXiv Detail & Related papers (2025-03-12T11:45:11Z)
- Position: Empowering Time Series Reasoning with Multimodal LLMs [49.73647759532127]
We argue that multimodal large language models (MLLMs) can enable more powerful and flexible reasoning for time series analysis. We call on researchers and practitioners to leverage this potential by developing strategies that prioritize trust, interpretability, and robust reasoning in MLLMs.
arXiv Detail & Related papers (2025-02-03T16:10:48Z)
- TableTime: Reformulating Time Series Classification as Training-Free Table Understanding with Large Language Models [14.880203496664963]
Large language models (LLMs) have demonstrated their effectiveness in multivariate time series classification (MTSC). Existing methods encode time series embeddings from scratch within the latent space of LLMs to align with the LLMs' semantic space. We propose TableTime, which reformulates MTSC as a table understanding task.
arXiv Detail & Related papers (2024-11-24T07:02:32Z)
- Enhancing Temporal Understanding in LLMs for Semi-structured Tables [50.59009084277447]
We conduct a comprehensive analysis of temporal datasets to pinpoint the specific limitations of large language models (LLMs).
Our investigation leads to enhancements in TempTabQA, a dataset specifically designed for temporal question answering.
We introduce a novel approach, C.L.E.A.R., to strengthen LLM capabilities in this domain.
arXiv Detail & Related papers (2024-07-22T20:13:10Z)
- Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning [20.066249913943405]
Large language models (LLMs) have showcased remarkable reasoning capabilities, yet they remain susceptible to errors.
We introduce novel synthetic datasets specifically designed to assess LLM temporal reasoning abilities in various scenarios.
Our findings provide valuable insights into the strengths and weaknesses of current LLMs in temporal reasoning tasks.
arXiv Detail & Related papers (2024-06-13T14:31:19Z)
- Improve Temporal Awareness of LLMs for Sequential Recommendation [61.723928508200196]
Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks.
However, LLMs fall short in recognizing and utilizing temporal information, resulting in poor performance on tasks that require an understanding of sequential data.
We propose three prompting strategies to exploit temporal information within historical interactions for LLM-based sequential recommendation.
arXiv Detail & Related papers (2024-05-05T00:21:26Z)
- Chain of History: Learning and Forecasting with LLMs for Temporal Knowledge Graph Completion [24.545917737620197]
Temporal Knowledge Graph Completion (TKGC) is a complex task involving the prediction of missing event links at future timestamps.
This paper aims to provide a comprehensive perspective on harnessing the advantages of Large Language Models for reasoning in temporal knowledge graphs.
arXiv Detail & Related papers (2024-01-11T17:42:47Z)
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
- MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Language Models [17.322480769274062]
Large language models (LLMs) have shown nearly saturated performance on many natural language processing (NLP) tasks.
This paper constructs Multiple Sensitive Factors Time QA (MenatQA), with a total of 2,853 samples, for evaluating the time comprehension and reasoning abilities of LLMs.
arXiv Detail & Related papers (2023-10-08T13:19:52Z)