Context Matters: An Empirical Study of the Impact of Contextual Information in Temporal Question Answering Systems
- URL: http://arxiv.org/abs/2406.19538v1
- Date: Thu, 27 Jun 2024 21:31:30 GMT
- Title: Context Matters: An Empirical Study of the Impact of Contextual Information in Temporal Question Answering Systems
- Authors: Dan Schumacher, Fatemeh Haji, Tara Grey, Niharika Bandlamudi, Nupoor Karnik, Gagana Uday Kumar, Jason Cho-Yu Chiang, Paul Rad, Nishant Vishwamitra, Anthony Rios,
- Abstract summary: This paper empirically examines the robustness of temporal question-answering systems trained on various context types.
We show that training with a mix of these contexts enhances model robustness and accuracy.
We introduce two new context-rich TQA datasets, ContextAQA and ContextTQE, and provide comprehensive evaluations and guidelines for training robust TQA models.
- Score: 7.393290178125003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) often struggle with temporal reasoning, crucial for tasks like historical event analysis and time-sensitive information retrieval. Despite advancements, state-of-the-art models falter in handling temporal information, especially when faced with irrelevant or noisy contexts. This paper addresses this gap by empirically examining the robustness of temporal question-answering (TQA) systems trained on various context types, including relevant, irrelevant, slightly altered, and no context. Our findings indicate that training with a mix of these contexts enhances model robustness and accuracy. Additionally, we show that the position of context relative to the question significantly impacts performance, with question-first positioning yielding better results. We introduce two new context-rich TQA datasets, ContextAQA and ContextTQE, and provide comprehensive evaluations and guidelines for training robust TQA models. Our work lays the foundation for developing reliable and context-aware temporal QA systems, with broader implications for enhancing LLM robustness against diverse and potentially adversarial information.
Related papers
- UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization [34.257914212541394]
UnSeenTimeQA is a novel time-sensitive question-answering benchmark.
It diverges from traditional TSQA benchmarks by avoiding factual and web-searchable queries.
It requires large language models to engage in genuine temporal reasoning.
arXiv Detail & Related papers (2024-07-03T22:02:07Z) - Synthetic Context Generation for Question Generation [6.226609932118123]
This paper investigates training QG models using synthetic contexts generated by large language models.
We find that contexts are essential for QG tasks, even if they are synthetic.
arXiv Detail & Related papers (2024-06-19T03:37:52Z) - Limited Out-of-Context Knowledge Reasoning in Large Language Models [65.72847298578071]
Large Language Models (LLMs) have demonstrated strong capabilities as knowledge bases and significant in-context reasoning capabilities.
This paper focuses on a significant facet of out-of-context reasoning: Out-of-context Knowledge Reasoning (OCKR), which is to combine multiple knowledge to infer new knowledge.
arXiv Detail & Related papers (2024-06-11T15:58:59Z) - Consistency Training by Synthetic Question Generation for Conversational Question Answering [14.211024633768986]
We augment historical information with synthetic questions to make the reasoning robust to irrelevant history.
This is the first instance of research using question generation as a form of data augmentation to model conversational QA settings.
arXiv Detail & Related papers (2024-04-17T06:49:14Z) - Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning [73.51314109184197]
It is crucial for large language models (LLMs) to understand the concept of temporal knowledge.
We propose a complex temporal question-answering dataset Complex-TR that focuses on multi-answer and multi-hop temporal reasoning.
arXiv Detail & Related papers (2023-11-16T11:49:29Z) - Learning to Filter Context for Retrieval-Augmented Generation [75.18946584853316]
Generation models are required to generate outputs given partially or entirely irrelevant passages.
FILCO identifies useful context based on lexical and information-theoretic approaches.
It trains context filtering models that can filter retrieved contexts at test time.
arXiv Detail & Related papers (2023-11-14T18:41:54Z) - Lost in the Middle: How Language Models Use Long Contexts [88.78803442320246]
We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts.
We find that performance can degrade significantly when changing the position of relevant information.
Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context language models.
arXiv Detail & Related papers (2023-07-06T17:54:11Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP)
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains under explored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z) - Sorting through the noise: Testing robustness of information processing
in pre-trained language models [5.371816551086117]
This paper examines robustness of models' ability to deploy relevant context information in the face of distracting content.
We find that although models appear in simple contexts to make predictions based on understanding and applying relevant facts from prior context, the presence of distracting but irrelevant content has clear impact in confusing model predictions.
arXiv Detail & Related papers (2021-09-25T16:02:23Z) - SituatedQA: Incorporating Extra-Linguistic Contexts into QA [7.495151447459443]
We introduce SituatedQA, an open-retrieval QA dataset where systems must produce the correct answer to a question given the temporal or geographical context.
We find that a significant proportion of information seeking questions have context-dependent answers.
Our study shows that existing models struggle with producing answers that are frequently updated or from uncommon locations.
arXiv Detail & Related papers (2021-09-13T17:53:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.