Large Language Models Can Learn Temporal Reasoning
- URL: http://arxiv.org/abs/2401.06853v6
- Date: Tue, 08 Oct 2024 15:55:37 GMT
- Title: Large Language Models Can Learn Temporal Reasoning
- Authors: Siheng Xiong, Ali Payani, Ramana Kompella, Faramarz Fekri,
- Abstract summary: We propose TG-LLM, a novel framework towards language-based temporal reasoning.
Instead of reasoning over the original context, we adopt a latent representation, temporal graph (TG)
A synthetic dataset (TGQA) is fully controllable and requires minimal supervision.
- Score: 11.599570446840547
- License:
- Abstract: While large language models (LLMs) have demonstrated remarkable reasoning capabilities, they are not without their flaws and inaccuracies. Recent studies have introduced various methods to mitigate these limitations. Temporal reasoning (TR), in particular, presents a significant challenge for LLMs due to its reliance on diverse temporal concepts and intricate temporal logic. In this paper, we propose TG-LLM, a novel framework towards language-based TR. Instead of reasoning over the original context, we adopt a latent representation, temporal graph (TG) that enhances the learning of TR. A synthetic dataset (TGQA), which is fully controllable and requires minimal supervision, is constructed for fine-tuning LLMs on this text-to-TG translation task. We confirmed in experiments that the capability of TG translation learned on our dataset can be transferred to other TR tasks and benchmarks. On top of that, we teach LLM to perform deliberate reasoning over the TGs via Chain-of-Thought (CoT) bootstrapping and graph data augmentation. We observed that those strategies, which maintain a balance between usefulness and diversity, bring more reliable CoTs and final results than the vanilla CoT distillation.
Related papers
- Temporally Consistent Factuality Probing for Large Language Models [16.177991267568125]
We introduce TeCFaP, a novel Temporally Consistent Factuality Probe task.
We extend the definitions of existing metrics to represent consistent factuality across temporal dimension.
Next, we propose a novel solution CoTSeLF combining multi-task instruction tuning (MT-IT) with consistent-time-sensitive reinforcement learning (CTSRL) to improve temporally consistent factuality in LLMs.
arXiv Detail & Related papers (2024-09-21T08:41:08Z) - Quality or Quantity? On Data Scale and Diversity in Adapting Large Language Models for Low-Resource Translation [62.202893186343935]
We explore what it would take to adapt Large Language Models for low-resource languages.
We show that parallel data is critical during both pre-training andSupervised Fine-Tuning (SFT)
Our experiments with three LLMs across two low-resourced language groups reveal consistent trends, underscoring the generalizability of our findings.
arXiv Detail & Related papers (2024-08-23T00:59:38Z) - LTLBench: Towards Benchmarks for Evaluating Temporal Logic Reasoning in Large Language Models [5.455744338342196]
temporal reasoning (TR) is a critical component of artificial intelligence.
Various datasets have been constructed in different ways for evaluating various aspects of TR ability.
Our work proposes a novel approach to design and develop a pipeline for constructing datasets to evaluate the TR ability of LLMs.
arXiv Detail & Related papers (2024-07-07T16:37:06Z) - Large Language Models-guided Dynamic Adaptation for Temporal Knowledge Graph Reasoning [87.10396098919013]
Large Language Models (LLMs) have demonstrated extensive knowledge and remarkable proficiency in temporal reasoning.
We propose a Large Language Models-guided Dynamic Adaptation (LLM-DA) method for reasoning on Temporal Knowledge Graphs.
LLM-DA harnesses the capabilities of LLMs to analyze historical data and extract temporal logical rules.
arXiv Detail & Related papers (2024-05-23T04:54:37Z) - ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting [124.69672273754144]
Chain-of-Thought (CoT) prompting can enhance the reasoning capabilities of large language models (LLMs)
Existing CoT approaches usually focus on simpler reasoning tasks and thus result in low-quality and inconsistent CoT prompts.
We introduce CoTGenius, a novel framework designed for the automatic generation of superior CoT prompts.
arXiv Detail & Related papers (2024-03-21T11:34:26Z) - TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement [26.26493253161022]
Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT)
We introduce a systematic LLM-based self-refinement translation framework, named textbfTEaR.
arXiv Detail & Related papers (2024-02-26T07:58:12Z) - Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof
Generation with Contrastive Stepwise Decoding [11.385103498440932]
We introduce contrastive decoding to stepwise proof generation, making use of negative reasoning paths to strengthen the model's capacity for logical deduction.
Experiments on EntailmentBank underscore the success of our method in augmenting the proof planning abilities of language models.
arXiv Detail & Related papers (2023-11-12T05:12:49Z) - MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative.
This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm.
Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z) - TRACE: A Comprehensive Benchmark for Continual Learning in Large
Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z) - TRAM: Benchmarking Temporal Reasoning for Large Language Models [12.112914393948415]
We introduce TRAM, a temporal reasoning benchmark composed of ten datasets.
We evaluate popular language models like GPT-4 and Llama2 in zero-shot and few-shot scenarios.
Our findings indicate that the best-performing model lags significantly behind human performance.
arXiv Detail & Related papers (2023-10-02T00:59:07Z) - TIM: Teaching Large Language Models to Translate with Comparison [78.66926087162672]
We propose a novel framework using examples in comparison to teach LLMs to learn translation.
Our approach involves presenting the model with examples of correct and incorrect translations and using a preference loss to guide the model's learning.
Our findings offer a new perspective on fine-tuning LLMs for translation tasks and provide a promising solution for generating high-quality translations.
arXiv Detail & Related papers (2023-07-10T08:15:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.