Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering
- URL: http://arxiv.org/abs/2302.05963v1
- Date: Sun, 12 Feb 2023 17:32:55 GMT
- Title: Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering
- Authors: Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, and Akiko Aizawa
- Abstract summary: Experimental results on 2WikiMultiHopQA and HotpotQA-small datasets reveal that (1) UR tasks can improve QA performance and (2) UR tasks help prevent reasoning shortcuts, but (3) UR tasks do not contribute to improving the robustness of the model on adversarial questions, such as sub-questions and inverted questions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To explain the predicted answers and evaluate the reasoning abilities of
models, several studies have utilized underlying reasoning (UR) tasks in
multi-hop question answering (QA) datasets. However, it remains an open
question as to how effective UR tasks are for the QA task when training models
on both tasks in an end-to-end manner. In this study, we address this question
by analyzing the effectiveness of UR tasks (including both sentence-level and
entity-level tasks) in three aspects: (1) QA performance, (2) reasoning
shortcuts, and (3) robustness. Whereas previous models have not been explicitly trained on an entity-level reasoning prediction task, we build a multi-task model that jointly performs three tasks: sentence-level supporting facts prediction, entity-level reasoning prediction, and answer prediction.
Experimental results on 2WikiMultiHopQA and HotpotQA-small datasets reveal that
(1) UR tasks can improve QA performance. Using four newly created debiased datasets, we demonstrate that (2) UR tasks are helpful in preventing
reasoning shortcuts in the multi-hop QA task. However, we find that (3) UR
tasks do not contribute to improving the robustness of the model on adversarial
questions, such as sub-questions and inverted questions. We encourage future
studies to investigate the effectiveness of entity-level reasoning in the form
of natural language questions (e.g., sub-question forms).
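The abstract describes training answer prediction and the two UR tasks end-to-end. Below is a minimal sketch of how such a multi-task setup could look, assuming a shared transformer encoder with one lightweight head per task and a simple summed loss; the head designs, marker-token pooling, and equal loss weights are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a three-task model (answer prediction, sentence-level
# supporting facts, entity-level reasoning). NOT the authors' code: the
# shared-encoder design, marker-token pooling, and equal loss weights are
# assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel

class MultiTaskQAModel(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", num_entity_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.answer_head = nn.Linear(hidden, 2)        # start/end logits per token
        self.sentence_head = nn.Linear(hidden, 1)      # supporting-fact logit per sentence
        self.entity_head = nn.Linear(hidden, num_entity_labels)  # reasoning label per entity

    def forward(self, input_ids, attention_mask, sentence_positions, entity_positions):
        # token_states: [batch, seq_len, hidden]
        token_states = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        span_logits = self.answer_head(token_states)               # [batch, seq_len, 2]
        start_logits, end_logits = span_logits[..., 0], span_logits[..., 1]
        # Each sentence/entity is represented by the hidden state of a marker token
        # at a precomputed position (one common choice; positions are assumed
        # identical across the batch for brevity).
        sent_logits = self.sentence_head(token_states[:, sentence_positions, :]).squeeze(-1)
        ent_logits = self.entity_head(token_states[:, entity_positions, :])
        return start_logits, end_logits, sent_logits, ent_logits

def multitask_loss(outputs, targets, weights=(1.0, 1.0, 1.0)):
    """Sum the three task losses; equal weights are an assumption."""
    start_logits, end_logits, sent_logits, ent_logits = outputs
    answer_loss = F.cross_entropy(start_logits, targets["start"]) + \
                  F.cross_entropy(end_logits, targets["end"])
    sentence_loss = F.binary_cross_entropy_with_logits(sent_logits, targets["supporting_facts"])
    entity_loss = F.cross_entropy(ent_logits.flatten(0, 1), targets["entity_labels"].flatten())
    return weights[0] * answer_loss + weights[1] * sentence_loss + weights[2] * entity_loss
```

In this sketch, the summed loss backpropagates through the shared encoder, so the UR tasks and answer prediction are learned together end-to-end, which is the training setting the abstract evaluates.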
Related papers
- Syn-QA2: Evaluating False Assumptions in Long-tail Questions with Synthetic QA Datasets [7.52684798377727]
We introduce Syn-(QA)$^2$, a set of two synthetically generated question-answering (QA) datasets.
We find that false assumptions in QA are challenging, echoing the findings of prior work.
The detection task is more challenging with long-tail questions compared to naturally occurring questions.
arXiv Detail & Related papers (2024-03-18T18:01:26Z)
- Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions.
How to select new tasks to improve the performance and generalizability of IT models remains an open question.
We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
arXiv Detail & Related papers (2023-11-01T04:40:05Z)
- Gotta: Generative Few-shot Question Answering by Prompt-based Cloze Data Augmentation [18.531941086922256]
Few-shot question answering (QA) aims at precisely discovering answers to a set of questions from context passages.
We develop Gotta, a Generative prOmpT-based daTa Augmentation framework.
Inspired by the human reasoning process, we propose to integrate the cloze task to enhance few-shot QA learning.
arXiv Detail & Related papers (2023-06-07T01:44:43Z)
- Object-Centric Multi-Task Learning for Human Instances [8.035105819936808]
We explore a compact multi-task network architecture that maximally shares the parameters of the multiple tasks via object-centric learning.
We propose a novel query design, called the human-centric query (HCQ), to encode human instance information effectively.
Experimental results show that the proposed multi-task network achieves comparable accuracy to state-of-the-art task-specific models.
arXiv Detail & Related papers (2023-03-13T01:10:50Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- How Well Do Multi-hop Reading Comprehension Models Understand Date Information? [31.243088887839257]
The ability of multi-hop models to perform step-by-step reasoning when finding an answer to a comparison question remains unclear.
It is also unclear how questions about the internal reasoning process are useful for training and evaluating question-answering (QA) systems.
arXiv Detail & Related papers (2022-10-11T07:24:07Z)
- FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue [70.65782786401257]
This work explores conversational task transfer by introducing FETA: a benchmark for few-sample task transfer in open-domain dialogue.
FETA contains two underlying sets of conversations annotated with 10 and 7 tasks, respectively, enabling the study of intra-dataset task transfer.
We utilize three popular language models and three learning algorithms to analyze the transferability between 132 source-target task pairs.
arXiv Detail & Related papers (2022-05-12T17:59:00Z)
- Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation [75.1682163844354]
We address the issue of missing modalities in the Visual Question Answer-Difference prediction task.
We introduce a model, the "Big" Teacher, that takes the image/question/answer triplet as its input and outperforms the baseline.
arXiv Detail & Related papers (2021-04-13T06:41:11Z)
- Understanding Unnatural Questions Improves Reasoning over Text [54.235828149899625]
Complex question answering (CQA) over raw text is a challenging task.
Learning an effective CQA model requires large amounts of human-annotated data.
We address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions.
arXiv Detail & Related papers (2020-10-19T10:22:16Z)
- Exploring and Predicting Transferability across NLP Tasks [115.6278033699853]
We study the transferability between 33 NLP tasks across three broad classes of problems.
Our results show that transfer learning is more beneficial than previously thought.
We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task.
arXiv Detail & Related papers (2020-05-02T09:39:36Z)
- Reinforced Multi-task Approach for Multi-hop Question Generation [47.15108724294234]
We take up multi-hop question generation, which aims to generate relevant questions based on supporting facts in the context.
We employ multitask learning with the auxiliary task of answer-aware supporting fact prediction to guide the question generator.
We demonstrate the effectiveness of our approach through experiments on the multi-hop question answering dataset HotpotQA.
arXiv Detail & Related papers (2020-04-05T10:16:59Z)