Related papers: Context Matters: An Empirical Study of the Impact of Contextual Information in Temporal Question Answering Systems

URL: http://arxiv.org/abs/2406.19538v1
Date: Thu, 27 Jun 2024 21:31:30 GMT
Title: Context Matters: An Empirical Study of the Impact of Contextual Information in Temporal Question Answering Systems
Authors: Dan Schumacher, Fatemeh Haji, Tara Grey, Niharika Bandlamudi, Nupoor Karnik, Gagana Uday Kumar, Jason Cho-Yu Chiang, Paul Rad, Nishant Vishwamitra, Anthony Rios
Abstract summary: This paper empirically examines the robustness of temporal question-answering systems trained on various context types. We show that training with a mix of these contexts enhances model robustness and accuracy. We introduce two new context-rich TQA datasets, ContextAQA and ContextTQE, and provide comprehensive evaluations and guidelines for training robust TQA models.
Score: 7.393290178125003
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) often struggle with temporal reasoning, crucial for tasks like historical event analysis and time-sensitive information retrieval. Despite advancements, state-of-the-art models falter in handling temporal information, especially when faced with irrelevant or noisy contexts. This paper addresses this gap by empirically examining the robustness of temporal question-answering (TQA) systems trained on various context types, including relevant, irrelevant, slightly altered, and no context. Our findings indicate that training with a mix of these contexts enhances model robustness and accuracy. Additionally, we show that the position of context relative to the question significantly impacts performance, with question-first positioning yielding better results. We introduce two new context-rich TQA datasets, ContextAQA and ContextTQE, and provide comprehensive evaluations and guidelines for training robust TQA models. Our work lays the foundation for developing reliable and context-aware temporal QA systems, with broader implications for enhancing LLM robustness against diverse and potentially adversarial information.
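
The training setup described in the abstract (mixing relevant, irrelevant, slightly altered, and no context, with the question placed before the context) can be illustrated with a minimal sketch. This is not the authors' pipeline: the names `TQAExample`, `format_prompt`, and `build_mixed_training_set`, the prompt template, and the uniform sampling over context types are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from dataclasses import dataclass, field

# The four context conditions named in the abstract.
CONTEXT_TYPES = ("relevant", "irrelevant", "slightly_altered", "no_context")


@dataclass
class TQAExample:
    """Hypothetical container for one temporal QA item."""
    question: str
    answer: str
    # Maps a context type to its text; a missing key means no context.
    contexts: dict = field(default_factory=dict)


def format_prompt(question: str, context: str, question_first: bool = True) -> str:
    """Join question and context; drop the context block when it is empty."""
    q = f"Question: {question}"
    if not context:
        return q
    c = f"Context: {context}"
    return f"{q}\n{c}" if question_first else f"{c}\n{q}"


def build_mixed_training_set(examples, seed: int = 0, question_first: bool = True):
    """Assign each example one context type, sampled uniformly (an assumption)."""
    rng = random.Random(seed)
    rows = []
    for ex in examples:
        ctx_type = rng.choice(CONTEXT_TYPES)
        context = ex.contexts.get(ctx_type, "")
        rows.append({
            "context_type": ctx_type,
            "prompt": format_prompt(ex.question, context, question_first),
            "target": ex.answer,
        })
    return rows


if __name__ == "__main__":
    # Placeholder item; the question, answer, and contexts are invented.
    item = TQAExample(
        question="Who led the team in 2010?",
        answer="Alice",
        contexts={
            "relevant": "Alice led the team from 2008 to 2012.",
            "irrelevant": "The stadium was renovated in 1995.",
            "slightly_altered": "Alice led the team from 2011 to 2012.",
        },
    )
    for row in build_mixed_training_set([item] * 4, seed=0):
        print(row["context_type"], "|", row["prompt"].replace("\n", " / "))
```

Passing `question_first=False` yields the context-first variant, which makes it straightforward to compare the two orderings on the same data.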

Related papers

It's High Time: A Survey of Temporal Question Answering [17.07150094603319]
Temporal Question Answering (TQA) focuses on answering questions involving temporal constraints or context. The survey covers recent advances in TQA enabled by neural models and Large Language Models (LLMs), as well as benchmark datasets and evaluation strategies designed to test temporal robustness, recency awareness, and generalization.
arXiv Detail & Related papers (2025-05-26T17:21:26Z)
Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models [21.579319926212296]
Large Language Models (LLMs) have emerged as powerful tools for generating coherent text, understanding context, and performing reasoning tasks. They struggle with temporal reasoning, which requires processing time-related information such as event sequencing, durations, and inter-temporal relationships. We introduce TISER, a novel framework that enhances the temporal reasoning abilities of LLMs through a multi-stage process that combines timeline construction with iterative self-reflection.
arXiv Detail & Related papers (2025-04-07T16:51:45Z)
MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering [21.064096256892686]
Multimodal time-series datasets fall short in evaluating cross-modal reasoning and complex question answering. We introduce Multimodal Time Series Benchmark (MTBench), a benchmark to evaluate large language models (LLMs) on time series and text understanding. We evaluate state-of-the-art LLMs on MTBench, analyzing their effectiveness in modeling the complex relationships between news narratives and temporal patterns.
arXiv Detail & Related papers (2025-03-21T05:04:53Z)
Mitigating Knowledge Conflicts in Language Model-Driven Question Answering [15.29366851382021]
Two fundamental knowledge sources play crucial roles in document-based question answering and document summarization systems. Recent studies revealed a significant challenge: when there exists a misalignment between the model's inherent knowledge and the ground truth answers in training data, the system may exhibit problematic behaviors during inference. Our investigation proposes a strategy to minimize hallucination by building an explicit connection between source inputs and generated outputs.
arXiv Detail & Related papers (2024-11-18T07:33:10Z)
On the Loss of Context-awareness in General Instruction Fine-tuning [101.03941308894191]
We investigate the loss of context awareness after supervised fine-tuning. We find that the performance decline is associated with a bias toward different roles learned during conversational instruction fine-tuning. We propose a metric to identify context-dependent examples from general instruction fine-tuning datasets.
arXiv Detail & Related papers (2024-11-05T00:16:01Z)
Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters. Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities [18.859309032300402]
We investigate how the integration of information from image and text modalities influences the performance and behavior of Visual Language Model (VLM) predictions. We study the interplay between text and image modalities in different configurations where visual content is essential for solving the VQA task. Our results show that complementary information between modalities improves answer and reasoning quality, while contradictory information harms model performance and confidence.
arXiv Detail & Related papers (2024-10-02T16:02:02Z)
Enhancing Temporal Sensitivity and Reasoning for Time-Sensitive Question Answering [23.98067169669452]
Time-Sensitive Question Answering (TSQA) demands the effective utilization of specific temporal contexts. We propose a novel framework that enhances temporal awareness and reasoning through Temporal Information-Aware Embedding and Granular Contrastive Reinforcement Learning.
arXiv Detail & Related papers (2024-09-25T13:13:21Z)
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems [3.486120902611884]
This paper explores the significance of different question types for VQA systems and their impact on performance. We propose QTG-VQA, a novel architecture that incorporates question-type-guided attention and adaptive learning mechanism.
arXiv Detail & Related papers (2024-09-14T07:42:41Z)
QUITO-X: A New Perspective on Context Compression from the Information Bottleneck Theory [66.01597794579568]
We introduce information bottleneck theory (IB) to model the problem. We propose a cross-attention-based approach to approximate mutual information in IB. Our method achieves a 25% increase in compression rate compared to the state-of-the-art.
arXiv Detail & Related papers (2024-08-20T02:44:45Z)
Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning [5.053086684547045]
This study introduces an in-context learning-based approach to enhance the reasoning capabilities of RALMs. Our approach increases accuracy in identifying unanswerable and conflicting scenarios without requiring additional fine-tuning.
arXiv Detail & Related papers (2024-08-08T12:42:43Z)
Synthetic Context Generation for Question Generation [6.226609932118123]
This paper investigates training QG models using synthetic contexts generated by large language models. We find that contexts are essential for QG tasks, even if they are synthetic.
arXiv Detail & Related papers (2024-06-19T03:37:52Z)
Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning [73.51314109184197]
It is crucial for large language models (LLMs) to understand the concept of temporal knowledge. We propose a complex temporal question-answering dataset Complex-TR that focuses on multi-answer and multi-hop temporal reasoning.
arXiv Detail & Related papers (2023-11-16T11:49:29Z)
Lost in the Middle: How Language Models Use Long Contexts [88.78803442320246]
We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts. We find that performance can degrade significantly when changing the position of relevant information. Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context language models.
arXiv Detail & Related papers (2023-07-06T17:54:11Z)
Unlocking Temporal Question Answering for Large Language Models with Tailor-Made Reasoning Logic [84.59255070520673]
Large language models (LLMs) face a challenge when engaging in temporal reasoning. We propose TempLogic, a novel framework designed specifically for temporal question-answering tasks.
arXiv Detail & Related papers (2023-05-24T10:57:53Z)
Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP). What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data. We formalize the relevant causal structure of problems such as dynamic personalized pricing. We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.