A Dataset for Answering Time-Sensitive Questions
- URL: http://arxiv.org/abs/2108.06314v1
- Date: Fri, 13 Aug 2021 16:42:25 GMT
- Title: A Dataset for Answering Time-Sensitive Questions
- Authors: Wenhu Chen, Xinyi Wang, William Yang Wang
- Abstract summary: Time is an important dimension in our physical world. Lots of facts can evolve with respect to time.
It is important to consider the time dimension and empower the existing QA models to reason over time.
Existing QA datasets contain rather few time-sensitive questions and are hence not suitable for diagnosing or benchmarking a model's temporal reasoning capability.
- Score: 88.95075983560331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Time is an important dimension in our physical world. Lots of facts can
evolve with respect to time. For example, the U.S. President might change every
four years. Therefore, it is important to consider the time dimension and
empower the existing QA models to reason over time. However, the existing QA
datasets contain rather few time-sensitive questions and are hence not suitable for
diagnosing or benchmarking the model's temporal reasoning capability. In order
to promote research in this direction, we propose to construct a time-sensitive
QA dataset. The dataset is constructed by 1) mining time-evolving facts from
WikiData and aligning them to their corresponding Wikipedia page, 2) employing
crowd workers to verify and calibrate these noisy facts, 3) generating
question-answer pairs based on the annotated time-sensitive facts. Our dataset
poses two novel challenges: 1) the model needs to understand both explicit and
implicit mention of time information in the long document, 2) the model needs
to perform temporal reasoning like comparison, addition, subtraction. We
evaluate different SoTA long-document QA systems like BigBird and FiD on our
dataset. The best-performing model FiD can only achieve 46% accuracy, still
far behind the human performance of 87%. We demonstrate that these models are
still lacking the ability to perform robust temporal understanding and
reasoning. Therefore, we believe that our dataset could serve as a benchmark to
empower future studies in temporal reasoning. The dataset and code are released
at https://github.com/wenhuchen/Time-Sensitive-QA.
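
The idea of a time-evolving fact and a question whose answer depends on the year asked can be illustrated with a minimal Python sketch. This is not the authors' pipeline or data format: the `TimedFact` class, the `answer_in_year` and `make_question` helpers, the question template, and the year-level granularity are all assumptions made for illustration, and the two example facts are hand-written rather than taken from the dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimedFact:
    """Hypothetical record for a fact that holds over a span of years."""
    subject: str
    relation: str
    obj: str
    start: int  # first year the fact holds (inclusive)
    end: int    # year the fact stops holding (exclusive)


# Hand-written illustrative facts (not drawn from the released dataset).
FACTS: List[TimedFact] = [
    TimedFact("United States", "president", "Barack Obama", 2009, 2017),
    TimedFact("United States", "president", "Donald Trump", 2017, 2021),
]


def answer_in_year(facts: List[TimedFact], subject: str,
                   relation: str, year: int) -> Optional[str]:
    """Return the object whose time span covers `year`, or None if no fact applies."""
    for fact in facts:
        if (fact.subject == subject and fact.relation == relation
                and fact.start <= year < fact.end):
            return fact.obj
    return None


def make_question(fact: TimedFact, year: int) -> str:
    """Fill an assumed question template with an explicit year."""
    return f"Who was the {fact.relation} of the {fact.subject} in {year}?"


if __name__ == "__main__":
    for year in (2010, 2018, 1990):
        question = make_question(FACTS[0], year)
        print(question, "->", answer_in_year(FACTS, "United States", "president", year))
```

A year outside every recorded span yields `None`, which is one simple way to represent a question that the available facts cannot answer.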
Related papers
- Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning [73.51314109184197]
It is crucial for large language models (LLMs) to understand the concept of temporal knowledge.
We propose a complex temporal question-answering dataset Complex-TR that focuses on multi-answer and multi-hop temporal reasoning.
arXiv Detail & Related papers (2023-11-16T11:49:29Z)
- Time-Aware Representation Learning for Time-Sensitive Question Answering [19.822549681087107]
We propose a Time-Context aware Question Answering (TCQA) framework.
We build a time-context dependent data generation framework for model training.
We present a metric to evaluate the time awareness of the QA model.
arXiv Detail & Related papers (2023-10-19T08:48:45Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models [44.670550143705746]
We introduce a comprehensive probing dataset, TempReason, to evaluate the temporal reasoning capability of large language models.
Our dataset includes questions of three temporal reasoning levels.
We also propose a novel learning framework to improve the temporal reasoning capability of large language models.
arXiv Detail & Related papers (2023-06-15T08:44:41Z)
- Mitigating Temporal Misalignment by Discarding Outdated Facts [58.620269228776294]
Large language models are often used under temporal misalignment, tasked with answering questions about the present.
We propose fact duration prediction: the task of predicting how long a given fact will remain true.
Our data and code are released publicly at https://github.com/mikejqzhang/mitigating_misalignment.
arXiv Detail & Related papers (2023-05-24T07:30:08Z)
- How Well Do Multi-hop Reading Comprehension Models Understand Date Information? [31.243088887839257]
The ability of multi-hop models to perform step-by-step reasoning when finding an answer to a comparison question remains unclear.
It is also unclear how questions about the internal reasoning process are useful for training and evaluating question-answering (QA) systems.
arXiv Detail & Related papers (2022-10-11T07:24:07Z)
- Time-Varying Propensity Score to Bridge the Gap between the Past and Present [104.46387765330142]
We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data.
We demonstrate different ways of implementing it and evaluate it on a variety of problems.
arXiv Detail & Related papers (2022-10-04T07:21:49Z)
- ForecastTKGQuestions: A Benchmark for Temporal Question Answering and Forecasting over Temporal Knowledge Graphs [28.434829347176233]
Question answering over temporal knowledge graphs (TKGQA) has recently found increasing interest.
TKGQA requires temporal reasoning techniques to extract the relevant information from temporal knowledge bases.
We propose a novel task: forecasting question answering over temporal knowledge graphs.
arXiv Detail & Related papers (2022-08-12T21:02:35Z)
- SituatedQA: Incorporating Extra-Linguistic Contexts into QA [7.495151447459443]
We introduce SituatedQA, an open-retrieval QA dataset where systems must produce the correct answer to a question given the temporal or geographical context.
We find that a significant proportion of information seeking questions have context-dependent answers.
Our study shows that existing models struggle with producing answers that are frequently updated or from uncommon locations.
arXiv Detail & Related papers (2021-09-13T17:53:21Z)