PAT-Questions: A Self-Updating Benchmark for Present-Anchored Temporal Question-Answering
- URL: http://arxiv.org/abs/2402.11034v2
- Date: Mon, 3 Jun 2024 18:36:59 GMT
- Title: PAT-Questions: A Self-Updating Benchmark for Present-Anchored Temporal Question-Answering
- Authors: Jannat Ara Meem, Muhammad Shihab Rashid, Yue Dong, Vagelis Hristidis
- Abstract summary: We introduce the PAT-Questions benchmark, which includes single and multi-hop temporal questions.
The answers in PAT-Questions can be automatically refreshed by re-running SPARQL queries on a knowledge graph, if available.
We evaluate several state-of-the-art LLMs and a SOTA temporal reasoning model (TEMPREASON-T5) on PAT-Questions through direct prompting and retrieval-augmented generation (RAG).
- Score: 6.109188517569139
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Existing work on Temporal Question Answering (TQA) has predominantly focused on questions anchored to specific timestamps or events (e.g. "Who was the US president in 1970?"). Little work has studied questions whose temporal context is relative to the present time (e.g. "Who was the previous US president?"). We refer to this problem as Present-Anchored Temporal QA (PATQA). PATQA poses unique challenges: (1) large language models (LLMs) may have outdated knowledge, (2) complex temporal relationships (e.g. 'before', 'previous') are hard to reason about, (3) multi-hop reasoning may be required, and (4) the gold answers of benchmarks must be continuously updated. To address these challenges, we introduce the PAT-Questions benchmark, which includes single and multi-hop temporal questions. The answers in PAT-Questions can be automatically refreshed by re-running SPARQL queries on a knowledge graph, if available. We evaluate several state-of-the-art LLMs and a SOTA temporal reasoning model (TEMPREASON-T5) on PAT-Questions through direct prompting and retrieval-augmented generation (RAG). The results highlight the limitations of existing solutions in PATQA and motivate the need for new methods to improve PATQA reasoning capabilities.
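To make the self-updating mechanism concrete, below is a minimal sketch of how a gold answer for a present-anchored question ("Who was the previous US president?") could be refreshed by re-running a SPARQL query against Wikidata. The query, the entity ID Q11696 ("President of the United States"), and the helper names are illustrative assumptions, not the benchmark's own code; PAT-Questions' actual queries may differ.

```python
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# Illustrative query (an assumption, not the benchmark's own): the previous
# US president is the person whose term as President of the United States
# (wd:Q11696) carries the most recent end-time qualifier (pq:P582).
QUERY = """
SELECT ?personLabel ?end WHERE {
  ?person p:P39 ?stmt .        # position held (P39) statement
  ?stmt ps:P39 wd:Q11696 ;     # ... President of the United States
        pq:P582 ?end .         # ... with an end time, i.e. the term is over
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?end)
LIMIT 1
"""

def refresh_gold_answer() -> str:
    """Re-run the SPARQL query and return the current gold answer."""
    resp = requests.get(
        WIKIDATA_SPARQL,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "patqa-refresh-sketch/0.1"},
        timeout=30,
    )
    resp.raise_for_status()
    bindings = resp.json()["results"]["bindings"]
    return bindings[0]["personLabel"]["value"]

if __name__ == "__main__":
    # As of mid-2024 this prints "Donald Trump": the incumbent has no end
    # time, so the most recently *ended* term identifies the previous one.
    print(refresh_gold_answer())
```

Because the answer is recomputed from the knowledge graph rather than stored, re-running such a query after every transition of power keeps the gold label current without manual re-annotation, which is the self-updating property the abstract describes.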
Related papers
- Multi-hop Question Answering under Temporal Knowledge Editing [9.356343796845662] (2024-03-30)
Multi-hop question answering (MQA) under knowledge editing (KE) has garnered significant attention in the era of large language models.
Existing models for MQA under KE exhibit poor performance when dealing with questions containing explicit temporal contexts.
We propose TEMPoral knowLEdge augmented Multi-hop Question Answering (TEMPLE-MQA) to address this limitation.
arXiv Detail & Related papers (2024-03-30T23:22:51Z) - Event Extraction as Question Generation and Answering [72.04433206754489]
Recent work on Event Extraction has reframed the task as Question Answering (QA).
We propose QGA-EE, which enables a Question Generation (QG) model to generate questions that incorporate rich contextual information instead of using fixed templates.
Experiments show that QGA-EE outperforms all prior single-task-based models on the ACE05 English dataset.
arXiv Detail & Related papers (2023-07-10T01:46:15Z) - IfQA: A Dataset for Open-domain Question Answering under Counterfactual
Presuppositions [54.23087908182134]
We introduce the first large-scale counterfactual open-domain question-answering (QA) benchmarks, named IfQA.
The IfQA dataset contains over 3,800 questions that were annotated by crowdworkers on relevant Wikipedia passages.
The unique challenges posed by the IfQA benchmark will push open-domain QA research on both retrieval and counterfactual reasoning fronts.
arXiv Detail & Related papers (2023-05-23T12:43:19Z) - ForecastTKGQuestions: A Benchmark for Temporal Question Answering and
Forecasting over Temporal Knowledge Graphs [28.434829347176233]
Question answering over temporal knowledge graphs (TKGQA) has recently attracted increasing interest.
TKGQA requires temporal reasoning techniques to extract the relevant information from temporal knowledge bases.
We propose a novel task: forecasting question answering over temporal knowledge graphs.
arXiv Detail & Related papers (2022-08-12T21:02:35Z) - RealTime QA: What's the Answer Right Now? [137.04039209995932]
We introduce REALTIME QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis.
We build strong baseline models upon large pretrained language models, including GPT-3 and T5.
GPT-3 tends to return outdated answers when retrieved documents do not provide sufficient information to find an answer.
arXiv Detail & Related papers (2022-07-27T07:26:01Z) - Improving Time Sensitivity for Question Answering over Temporal
Knowledge Graphs [13.906994055281826]
We propose a time-sensitive question answering (TSQA) framework to tackle these problems.
TSQA features a timestamp estimation module to infer the unwritten timestamp from the question.
We also employ a time-sensitive KG encoder to inject ordering information into the temporal KG embeddings that TSQA is based on.
arXiv Detail & Related papers (2022-03-01T06:21:14Z) - Relation-Guided Pre-Training for Open-Domain Question Answering [67.86958978322188]
We propose a Relation-Guided Pre-Training (RGPT-QA) framework to solve complex open-domain questions.
We show that RGPT-QA achieves absolute improvements of 2.2%, 2.4%, and 6.3% in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions, respectively.
arXiv Detail & Related papers (2021-09-21T17:59:31Z) - Complex Temporal Question Answering on Knowledge Graphs [22.996399822102575]
This work presents EXAQT, the first end-to-end system for answering complex temporal questions.
It answers natural language questions over knowledge graphs (KGs) in two stages, one geared towards high recall, the other towards precision at top ranks.
We evaluate EXAQT on TimeQuestions, a large dataset of 16k temporal questions compiled from a variety of general purpose KG-QA benchmarks.
arXiv Detail & Related papers (2021-09-18T13:41:43Z) - NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions [80.60423934589515]
We introduce NExT-QA, a rigorously designed video question answering (VideoQA) benchmark.
We set up multi-choice and open-ended QA tasks targeting causal action reasoning, temporal action reasoning, and common scene comprehension.
We find that top-performing methods excel at shallow scene descriptions but are weak in causal and temporal action reasoning.
arXiv Detail & Related papers (2021-05-18T04:56:46Z) - PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them [70.09741980324912]
Open-domain Question Answering models which directly leverage question-answer (QA) pairs show promise in terms of speed and memory.
We introduce a new QA-pair retriever, RePAQ, to complement PAQ.
We find that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models.
arXiv Detail & Related papers (2021-02-13T23:43:45Z)
This list is automatically generated from the titles and abstracts of papers on this site.