Challenges in Generalization in Open Domain Question Answering
- URL: http://arxiv.org/abs/2109.01156v1
- Date: Thu, 2 Sep 2021 18:04:10 GMT
- Title: Challenges in Generalization in Open Domain Question Answering
- Authors: Linqing Liu, Patrick Lewis, Sebastian Riedel, Pontus Stenetorp
- Abstract summary: We introduce and annotate questions according to three categories that measure different levels and kinds of generalization.
Key question difficulty factors are cascading errors from the retrieval component, frequency of question pattern, and frequency of the entity.
- Score: 16.63912089965166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work on Open Domain Question Answering has shown that there is a large
discrepancy in model performance between novel test questions and those that
largely overlap with training questions. However, it remains unclear which
aspects of novel questions make them challenging. Drawing upon studies on
systematic generalization, we introduce and annotate questions according to
three categories that measure different levels and kinds of generalization:
training set overlap, compositional generalization (comp-gen), and novel entity
generalization (novel-entity). When evaluating six popular parametric and
non-parametric models, we find that for the established Natural Questions and
TriviaQA datasets, even the strongest model performance for
comp-gen/novel-entity is 13.1/5.4% and 9.6/1.5% lower compared to that for the
full test set -- indicating the challenge posed by these types of questions.
Furthermore, we show that whilst non-parametric models can handle questions
containing novel entities, they struggle with those requiring compositional
generalization. Through thorough analysis we find that key question difficulty
factors are: cascading errors from the retrieval component, frequency of
question pattern, and frequency of the entity.
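As an illustration of the three generalization categories above, the following is a minimal sketch of how test questions could be bucketed heuristically. The word-overlap threshold, the entity sets, and the `categorize` helper are assumptions made for this example; they are not the paper's annotation procedure.

```python
# Minimal illustrative sketch (not the paper's annotation procedure):
# bucket a test question into one of the three generalization categories
# from the abstract. The overlap threshold and entity sets are assumptions
# made for this example.

def word_set(text: str) -> set:
    """Lowercased bag of words used for a crude overlap check."""
    return set(text.lower().split())

def categorize(question: str,
               question_entities: set,
               train_questions: list,
               train_entities: set,
               overlap_threshold: float = 0.8) -> str:
    q_words = word_set(question)

    # 1) Training-set overlap: the test question closely matches a training question.
    for train_q in train_questions:
        t_words = word_set(train_q)
        union = q_words | t_words
        if union and len(q_words & t_words) / len(union) >= overlap_threshold:
            return "train-overlap"

    # 2) Novel-entity generalization: the question mentions an entity unseen in training.
    if any(entity not in train_entities for entity in question_entities):
        return "novel-entity"

    # 3) Compositional generalization: known entities combined in an unseen question pattern.
    return "comp-gen"

if __name__ == "__main__":
    train_qs = ["who wrote the novel war and peace"]         # hypothetical training question
    train_ents = {"war and peace", "leo tolstoy"}             # hypothetical known entities
    print(categorize("who directed the film war and peace",   # unseen pattern, known entity
                     question_entities={"war and peace"},
                     train_questions=train_qs,
                     train_entities=train_ents))              # -> comp-gen
```

A Jaccard-style word overlap stands in here for the more careful overlap and paraphrase checks one would use in practice.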
Related papers
- Multi-Faceted Question Complexity Estimation Targeting Topic Domain-Specificity [0.0]
This paper presents a novel framework for domain-specific question difficulty estimation, leveraging a suite of NLP techniques and knowledge graph analysis.
We introduce four key parameters: Topic Retrieval Cost, Topic Salience, Topic Coherence, and Topic Superficiality.
A model trained on these features demonstrates the efficacy of our approach in predicting question difficulty.
arXiv Detail & Related papers (2024-08-23T05:40:35Z) - Qsnail: A Questionnaire Dataset for Sequential Question Generation [76.616068047362]
We present the first dataset specifically constructed for the questionnaire generation task, which comprises 13,168 human-written questionnaires.
We conduct experiments on Qsnail, and the results reveal that retrieval models and traditional generative models do not fully align with the given research topic and intents.
Despite enhancements through chain-of-thought prompting and fine-tuning, questionnaires generated by language models still fall short of human-written questionnaires.
arXiv Detail & Related papers (2024-02-22T04:14:10Z) - Improving Visual Question Answering Models through Robustness Analysis
and In-Context Learning with a Chain of Basic Questions [70.70725223310401]
This work proposes a new method that uses semantically related questions, referred to as basic questions, as noise to evaluate the robustness of VQA models.
The experimental results demonstrate that the proposed evaluation method effectively analyzes the robustness of VQA models.
arXiv Detail & Related papers (2023-04-06T15:32:35Z) - Towards a Holistic Understanding of Mathematical Questions with
Contrastive Pre-training [65.10741459705739]
We propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo.
We first design two-level question augmentations, including content-level and structure-level, which generate literally diverse question pairs with similar purposes.
Then, to fully exploit the hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware ranking strategy.
arXiv Detail & Related papers (2023-01-18T14:23:29Z) - "What makes a question inquisitive?" A Study on Type-Controlled
Inquisitive Question Generation [35.87102025753666]
We propose a type-controlled framework for inquisitive question generation.
We generate a variety of questions that adhere to specific types while drawing from the source texts.
We also investigate strategies for selecting a single question from a generated set.
arXiv Detail & Related papers (2022-05-17T02:05:50Z) - Mixture of Experts for Biomedical Question Answering [34.92691831878302]
We propose a Mixture-of-Experts (MoE) based question answering method called MoEBQA.
MoEBQA decouples the computation for different types of questions by sparse routing.
We evaluate MoEBQA on three Biomedical Question Answering (BQA) datasets constructed based on real examinations.
arXiv Detail & Related papers (2022-04-15T14:11:40Z) - Controllable Open-ended Question Generation with A New Question Type
Ontology [6.017006996402699]
We investigate the less-explored task of generating open-ended questions that are typically answered by multiple sentences.
We first define a new question type ontology which differentiates the nuanced nature of questions better than widely used question words.
We then propose a novel question type-aware question generation framework, augmented by a semantic graph representation.
arXiv Detail & Related papers (2021-07-01T00:02:03Z) - Learning with Instance Bundles for Reading Comprehension [61.823444215188296]
We introduce new supervision techniques that compare question-answer scores across multiple related instances.
Specifically, we normalize these scores across various neighborhoods of closely contrasting questions and/or answers.
We empirically demonstrate the effectiveness of training with instance bundles on two datasets.
arXiv Detail & Related papers (2021-04-18T06:17:54Z) - Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z) - SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT)
We show that SQuINT improves model consistency by 5%, marginally improving performance on the Reasoning questions in VQA while also displaying better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.