Robust Question Answering Through Sub-part Alignment
- URL: http://arxiv.org/abs/2004.14648v3
- Date: Mon, 19 Apr 2021 20:43:55 GMT
- Title: Robust Question Answering Through Sub-part Alignment
- Authors: Jifan Chen and Greg Durrett
- Abstract summary: We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current textual question answering models achieve strong performance on
in-domain test sets, but often do so by fitting surface-level patterns in the
data, so they fail to generalize to out-of-distribution settings. To make a
more robust and understandable QA system, we model question answering as an
alignment problem. We decompose both the question and context into smaller
units based on off-the-shelf semantic representations (here, semantic roles),
and align the question to a subgraph of the context in order to find the
answer. We formulate our model as a structured SVM, with alignment scores
computed via BERT, and it can be trained end-to-end despite the use of beam
search for approximate inference. Our explicit use of alignments allows us to explore a
set of constraints with which we can prohibit certain types of bad model
behavior arising in cross-domain settings. Furthermore, by investigating
differences in scores across different potential answers, we can seek to
understand what particular aspects of the input lead the model to choose the
answer without relying on post-hoc explanation techniques. We train our model
on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
The results show that our model is more robust cross-domain than the standard
BERT QA model, and constraints derived from alignment scores allow us to
effectively trade off coverage and accuracy.
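To make the alignment formulation concrete, here is a minimal sketch of the scoring and training objective. All names here (SubPart, score_pair, best_alignment) are illustrative assumptions, not the authors' code: the paper derives sub-parts from an off-the-shelf SRL parser, scores pairs with BERT, and searches over alignments with a beam rather than brute force.

```python
# Minimal sketch of sub-part alignment, assuming pre-extracted SRL units.
# Not the authors' code: `score_pair` stands in for a BERT alignment scorer,
# and brute-force permutation search stands in for the paper's beam search.
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class SubPart:
    """One predicate-argument unit from an SRL parse, e.g. ("born", "ARG1", "Tesla")."""
    predicate: str
    role: str
    argument: str

def best_alignment(question_parts, context_parts, score_pair):
    """Align each question sub-part to a distinct context sub-part,
    maximizing the summed pairwise scores (assumes len(q) <= len(c))."""
    best, best_score = None, float("-inf")
    for assignment in permutations(range(len(context_parts)), len(question_parts)):
        total = sum(score_pair(q, context_parts[j])
                    for q, j in zip(question_parts, assignment))
        if total > best_score:
            best, best_score = assignment, total
    return best, best_score

def structured_hinge_loss(gold_score, best_wrong_score, margin=1.0):
    """Structured-SVM objective: the gold alignment's score must beat the
    best incorrect alignment's score by at least `margin`."""
    return max(0.0, margin - gold_score + best_wrong_score)
```

Under this view, the constraints mentioned in the abstract can be read as conditions on the per-pair scores (for example, rejecting any alignment that contains a very low-scoring edge), which is what provides a knob for trading coverage against accuracy.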
Related papers
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Chain-of-Skills: A Configurable Model for Open-domain Question Answering [79.8644260578301]
The retrieval model is an indispensable component for real-world knowledge-intensive tasks.
Recent work focuses on customized methods, which limits model transferability and scalability.
We propose a modular retriever where individual modules correspond to key skills that can be reused across datasets.
arXiv Detail & Related papers (2023-05-04T20:19:39Z)
- Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking [30.155625852894797]
We propose a browser-based benchmarking tool for researchers and challenge organizers.
Our tool helps test generalization capabilities of models across multiple datasets.
Interactive filtering facilitates discovery of problematic behavior.
arXiv Detail & Related papers (2021-10-11T11:08:35Z)
- Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction [46.38201136570501]
We present a model that aggregates and combines evidence from multiple passages to adaptively predict a single answer or a set of question-answer pairs for ambiguous questions.
Our model, named Refuel, achieves a new state-of-the-art performance on the AmbigQA dataset, and shows competitive performance on NQ-Open and TriviaQA.
arXiv Detail & Related papers (2020-11-26T05:48:55Z)
- Counterfactual Variable Control for Robust and Interpretable Question Answering [57.25261576239862]
Deep neural network-based question answering (QA) models are often neither robust nor explainable.
In this paper, we use causal inference to inspect the shortcut correlations that QA models spuriously exploit.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates these shortcut correlations.
arXiv Detail & Related papers (2020-10-12T10:09:05Z)
- Selective Question Answering under Domain Shift [90.021577320085]
Abstention policies based solely on the model's softmax probabilities fare poorly, since models are overconfident on out-of-domain inputs.
We train a calibrator to identify inputs on which the QA model errs, and abstain when it predicts an error is likely.
Our method answers 56% of questions while maintaining 80% accuracy; in contrast, thresholding the model's own probabilities answers only 48% at the same accuracy (a sketch of this abstain-or-answer policy appears after this list).
arXiv Detail & Related papers (2020-06-16T19:13:21Z)
- ManyModalQA: Modality Disambiguation and QA over Diverse Inputs [73.93607719921945]
We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then gather question-answer pairs through crowdsourcing.
arXiv Detail & Related papers (2020-01-22T14:39:28Z)
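Both the Sub-part Alignment paper and the Selective Question Answering entry above turn on abstention: answer only when a confidence signal clears a threshold. The following is a minimal sketch of that idea under assumed names (qa_model, calibrator, threshold); it illustrates the general recipe, not either paper's implementation.

```python
# Hedged sketch of selective QA: answer only when a trained calibrator
# predicts the QA model is likely correct. `qa_model` and `calibrator`
# are assumed callables, not an API from the cited papers.
def selective_answer(question, qa_model, calibrator, threshold=0.5):
    answer = qa_model(question)               # candidate answer
    p_correct = calibrator(question, answer)  # calibrated P(answer is correct)
    return answer if p_correct >= threshold else None  # None = abstain

def coverage_at_accuracy(records, target_acc=0.80):
    """Given (p_correct, is_correct) pairs from a dev set, find the largest
    coverage whose answered subset still meets the target accuracy."""
    records = sorted(records, key=lambda r: r[0], reverse=True)
    best_coverage, num_correct = 0.0, 0
    for answered, (_, is_correct) in enumerate(records, start=1):
        num_correct += is_correct
        if num_correct / answered >= target_acc:
            best_coverage = answered / len(records)
    return best_coverage
```

Sweeping the threshold this way is how figures like "56% coverage at 80% accuracy" are read off: answer the most confident questions first and stop at the point where accuracy on the answered subset would drop below the target.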