How Well Do Multi-hop Reading Comprehension Models Understand Date Information?
- URL: http://arxiv.org/abs/2210.05208v1
- Date: Tue, 11 Oct 2022 07:24:07 GMT
- Title: How Well Do Multi-hop Reading Comprehension Models Understand Date Information?
- Authors: Xanh Ho, Saku Sugawara, and Akiko Aizawa
- Abstract summary: The ability of multi-hop models to perform step-by-step reasoning when finding an answer to a comparison question remains unclear.
It is also unclear how questions about the internal reasoning process are useful for training and evaluating question-answering (QA) systems.
- Score: 31.243088887839257
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Several multi-hop reading comprehension datasets have been proposed to
resolve the issue of reasoning shortcuts by which questions can be answered
without performing multi-hop reasoning. However, the ability of multi-hop
models to perform step-by-step reasoning when finding an answer to a comparison
question remains unclear. It is also unclear how questions about the internal
reasoning process are useful for training and evaluating question-answering
(QA) systems. To evaluate the model precisely in a hierarchical manner, we
first propose a dataset, *HieraDate*, with three probing tasks in
addition to the main question: extraction, reasoning, and robustness. Our
dataset is created by enhancing two previous multi-hop datasets, HotpotQA and
2WikiMultiHopQA, focusing on multi-hop questions on date information that
involve both comparison and numerical reasoning. We then evaluate the ability
of existing models to understand date information. Our experimental results
reveal that the multi-hop models do not have the ability to subtract two dates
even when they perform well in date comparison and number subtraction tasks.
Other results reveal that our probing questions can help to improve the
performance of the models (e.g., by +10.3 F1) on the main QA task and our
dataset can be used for data augmentation to improve the robustness of the
models.
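To make the probed skills concrete, here is a minimal Python sketch of the two sub-skills the abstract distinguishes: comparing two dates and subtracting them to obtain a duration. The question and the dates are hypothetical illustrations, not items from HieraDate.

```python
from datetime import date

# Hypothetical comparison question, in the style the abstract describes:
# "Who died first, person A or person B, and how many days apart did they die?"
# The dates below are made up for illustration only.
death_a = date(1992, 3, 14)
death_b = date(1990, 11, 2)

# Date comparison: the sub-skill the paper reports models handle well.
earlier = "A" if death_a < death_b else "B"

# Date subtraction: the sub-skill the paper finds models lack, even when
# they succeed at date comparison and plain number subtraction separately.
gap_days = abs((death_a - death_b).days)

print(f"Person {earlier} died first; the two deaths are {gap_days} days apart.")
```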
Related papers
- MoreHopQA: More Than Multi-hop Reasoning [32.94332511203639]
We propose a new multi-hop dataset, MoreHopQA, which shifts from extractive to generative answers.
Our dataset is created by utilizing three existing multi-hop datasets: HotpotQA, 2WikiMultihopQA, and MuSiQue.
Our results show that models perform well on initial multi-hop questions but struggle with our extended questions.
arXiv Detail & Related papers (2024-06-19T09:38:59Z)
- FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models [37.34801677290571]
FanOutQA is a high-quality dataset of fan-out question-answer pairs and human-annotated decompositions, with English Wikipedia as the knowledge base.
We formulate three benchmark settings across our dataset and benchmark 7 LLMs, including GPT-4, LLaMA 2, Claude-2.1, and Mixtral-8x7B.
arXiv Detail & Related papers (2024-02-21T20:30:45Z)
- Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering [28.809665884372183]
Experimental results on the 2WikiMultiHopQA and HotpotQA-small datasets reveal that the underlying reasoning (UR) tasks can improve QA performance, but do not contribute to improving the robustness of the model on adversarial questions, such as sub-questions and inverted questions.
arXiv Detail & Related papers (2023-02-12T17:32:55Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering [85.79940770146557]
We decompose multi-hop questions into multiple corresponding single-hop questions.
We find marked inconsistency in QA models' answers on these pairs of ostensibly identical question chains.
When trained only on single-hop questions, models generalize poorly to multi-hop questions.
arXiv Detail & Related papers (2022-10-09T11:48:07Z)
- Modeling Multi-hop Question Answering as Single Sequence Prediction [88.72621430714985]
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation.
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z)
- A Dataset for Answering Time-Sensitive Questions [88.95075983560331]
Time is an important dimension in our physical world, and many facts can evolve with respect to time.
It is important to consider the time dimension and empower existing QA models to reason over time.
Existing QA datasets contain rather few time-sensitive questions and are hence not suitable for diagnosing or benchmarking a model's temporal reasoning capability.
arXiv Detail & Related papers (2021-08-13T16:42:25Z)
- Question-Aware Memory Network for Multi-hop Question Answering in Human-Robot Interaction [5.49601869466872]
We propose a question-aware memory network for multi-hop question answering, named QA2MN, which dynamically updates the attention on the question during the reasoning process.
We evaluate QA2MN on PathQuestion and WorldCup2014, two representative datasets for complex multi-hop question answering.
arXiv Detail & Related papers (2021-04-27T13:32:41Z)
- Generative Context Pair Selection for Multi-hop Question Answering [60.74354009152721]
We propose a generative context selection model for multi-hop question answering.
Our proposed generative passage selection model achieves better performance (4.9% higher than the baseline) on the adversarial held-out set.
arXiv Detail & Related papers (2021-04-18T07:00:48Z)
- Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps [31.472490306390977]
A multi-hop question answering dataset aims to test reasoning and inference skills by requiring a model to read multiple paragraphs to answer a given question.
Previous studies revealed that many examples in existing multi-hop datasets do not require multi-hop reasoning to answer a question.
We present a new multi-hop QA dataset, called 2WikiMultiHopQA, which uses structured and unstructured data.
arXiv Detail & Related papers (2020-11-02T15:42:40Z)
- Unsupervised Multi-hop Question Answering by Question Generation [108.61653629883753]
MQA-QG is an unsupervised framework that can generate human-like multi-hop training data.
Using only the generated training data, we can train a competent multi-hop QA model that achieves 61% and 83% of the supervised learning performance.
arXiv Detail & Related papers (2020-10-23T19:13:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.