Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning
- URL: http://arxiv.org/abs/2311.09821v2
- Date: Fri, 12 Jul 2024 16:37:14 GMT
- Title: Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning
- Authors: Qingyu Tan, Hwee Tou Ng, Lidong Bing
- Abstract summary: It is crucial for large language models (LLMs) to understand the concept of temporal knowledge.
We propose Complex-TR, a complex temporal question-answering dataset that focuses on multi-answer and multi-hop temporal reasoning.
- Score: 73.51314109184197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge in the real world is constantly being updated. However, it is costly to frequently update large language models (LLMs), so it is crucial for LLMs to understand the concept of temporal knowledge. Prior work on temporal question answering (TQA), however, did not emphasize multi-answer and multi-hop types of temporal reasoning. In this paper, we propose Complex-TR, a complex temporal question-answering dataset that focuses on multi-answer and multi-hop temporal reasoning. In addition, we propose a novel data augmentation strategy to improve the complex temporal reasoning capability and robustness of LLMs. We conducted experiments on multiple temporal QA datasets. Experimental results show that our method improves LLMs' performance on temporal QA benchmarks by significant margins. Our code and data are released at: https://github.com/nusnlp/complex-tr.
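The abstract does not describe the data format of Complex-TR. As a rough, hypothetical sketch of what a multi-answer, multi-hop temporal question might look like (all field names and values below are invented for illustration, not taken from the dataset), such an example could be represented as follows:

```python
# Hypothetical sketch of a multi-answer, multi-hop temporal QA example.
# All field names and values are invented for illustration; the actual
# Complex-TR schema is defined at https://github.com/nusnlp/complex-tr.
from dataclasses import dataclass
from typing import List

@dataclass
class TemporalQAExample:
    question: str        # question carrying an explicit time constraint
    hops: List[str]      # intermediate facts that must be chained together
    answers: List[str]   # multi-answer: several entities can be correct

example = TemporalQAExample(
    question="Who worked for the company that Alice worked for in 2010?",
    hops=[
        "Alice worked for AcmeCorp from 2008 to 2012.",  # hop 1: resolve the bridge entity
        "Bob worked for AcmeCorp from 2009 to 2011.",    # hop 2: filter by time overlap
        "Carol worked for AcmeCorp from 2010 to 2015.",
    ],
    answers=["Bob", "Carol"],  # both employment spans cover 2010
)

# Answering requires two hops: first resolve the bridge entity (AcmeCorp),
# then keep only the people whose employment span covers the queried year.
print(example.answers)  # ['Bob', 'Carol']
```

The multi-answer setting is what distinguishes this from standard single-answer TQA: a question can have several gold answers, all of which must satisfy the temporal constraint.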
Related papers
- Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models [21.579319926212296]
Large Language Models (LLMs) have emerged as powerful tools for generating coherent text, understanding context, and performing reasoning tasks.
They struggle with temporal reasoning, which requires processing time-related information such as event sequencing, durations, and inter-temporal relationships.
We introduce TISER, a novel framework that enhances the temporal reasoning abilities of LLMs through a multi-stage process that combines timeline construction with iterative self-reflection.
arXiv Detail & Related papers (2025-04-07T16:51:45Z) - Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data [22.274663165215237]
Time-series analysis is critical for a wide range of fields such as healthcare, finance, transportation, and energy.
Current time-series models are limited in their ability to perform reasoning that involves both time series and their textual content.
Chat-TS integrates time-series tokens into the LLM's vocabulary, enhancing its reasoning ability over both modalities.
arXiv Detail & Related papers (2025-03-13T21:05:11Z) - Position: Empowering Time Series Reasoning with Multimodal LLMs [49.73647759532127]
We argue that multimodal large language models (MLLMs) can enable more powerful and flexible reasoning for time series analysis.
We call on researchers and practitioners to leverage this potential by developing strategies that prioritize trust, interpretability, and robust reasoning in MLLMs.
arXiv Detail & Related papers (2025-02-03T16:10:48Z) - Review-Then-Refine: A Dynamic Framework for Multi-Hop Question Answering with Temporal Adaptability [19.722009684115434]
Retrieval-augmented generation (RAG) frameworks have emerged as a promising solution to multi-hop question answering (QA) tasks.
Existing RAG frameworks, which usually follow the retrieve-then-read paradigm, often struggle with multi-hop QA involving temporal information.
This paper proposes a novel framework called review-then-refine, which aims to enhance LLM performance in multi-hop QA scenarios with temporal information.
arXiv Detail & Related papers (2024-12-19T17:48:23Z) - Enhancing Temporal Sensitivity and Reasoning for Time-Sensitive Question Answering [23.98067169669452]
Time-Sensitive Question Answering (TSQA) demands the effective utilization of specific temporal contexts.
We propose a novel framework that enhances temporal awareness and reasoning through Temporal Information-Aware Embedding and Granular Contrastive Reinforcement Learning.
arXiv Detail & Related papers (2024-09-25T13:13:21Z) - Enhancing Temporal Understanding in LLMs for Semi-structured Tables [50.59009084277447]
We conduct a comprehensive analysis of temporal datasets to pinpoint the specific limitations of large language models (LLMs).
Our investigation leads to enhancements in TempTabQA, a dataset specifically designed for temporal question answering.
We introduce a novel approach, C.L.E.A.R., to strengthen LLM capabilities in this domain.
arXiv Detail & Related papers (2024-07-22T20:13:10Z) - Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning? [70.19200858203388]
Temporal reasoning is fundamental for large language models to comprehend the world.
CoTempQA is a benchmark containing four co-temporal scenarios.
Our experiments reveal a significant gap between the performance of current LLMs and human-level reasoning.
arXiv Detail & Related papers (2024-06-13T12:56:21Z) - Self-Improvement Programming for Temporal Knowledge Graph Question Answering [31.33908040172437]
Temporal Knowledge Graph Question Answering (TKGQA) aims to answer questions with temporal intent over Temporal Knowledge Graphs (TKGs).
Existing end-to-end methods implicitly model the time constraints by learning time-aware embeddings of questions and candidate answers.
We introduce Prog-TQA, a novel self-improvement programming method for TKGQA.
arXiv Detail & Related papers (2024-04-02T08:14:27Z) - Multi-hop Question Answering under Temporal Knowledge Editing [9.356343796845662]
Multi-hop question answering (MQA) under knowledge editing (KE) has garnered significant attention in the era of large language models.
Existing models for MQA under KE exhibit poor performance when dealing with questions containing explicit temporal contexts.
We propose TEMPoral knowLEdge augmented Multi-hop Question Answering (TEMPLE-MQA) to address this limitation.
arXiv Detail & Related papers (2024-03-30T23:22:51Z) - Automatic Question-Answer Generation for Long-Tail Knowledge [65.11554185687258]
We propose an automatic approach to generate specialized QA datasets for tail entities.
We conduct extensive experiments by employing pretrained LLMs on our newly generated long-tail QA datasets.
arXiv Detail & Related papers (2024-03-03T03:06:31Z) - Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs [52.42505579545893]
Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought explanations alongside answers.
We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT.
arXiv Detail & Related papers (2024-02-17T05:22:56Z) - Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models [44.670550143705746]
We introduce TempReason, a comprehensive probing dataset for evaluating the temporal reasoning capability of large language models.
Our dataset includes questions of three temporal reasoning levels.
We also propose a novel learning framework to improve the temporal reasoning capability of large language models.
arXiv Detail & Related papers (2023-06-15T08:44:41Z) - Unlocking Temporal Question Answering for Large Language Models with Tailor-Made Reasoning Logic [84.59255070520673]
Large language models (LLMs) face a challenge when engaging in temporal reasoning.
We propose TempLogic, a novel framework designed specifically for temporal question-answering tasks.
arXiv Detail & Related papers (2023-05-24T10:57:53Z)