ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
- URL: http://arxiv.org/abs/2001.08034v1
- Date: Wed, 22 Jan 2020 14:39:28 GMT
- Title: ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
- Authors: Darryl Hannan, Akshay Jain, and Mohit Bansal
- Abstract summary: We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then utilize crowdsourcing to collect question-answer pairs.
- Score: 73.93607719921945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new multimodal question answering challenge, ManyModalQA, in
which an agent must answer a question by considering three distinct modalities:
text, images, and tables. We collect our data by scraping Wikipedia and then
utilize crowdsourcing to collect question-answer pairs. Our questions are
ambiguous, in that the modality that contains the answer is not easily
determined based solely upon the question. To demonstrate this ambiguity, we
construct a modality selector (or disambiguator) network, and this model gets
substantially lower accuracy on our challenge set, compared to existing
datasets, indicating that our questions are more ambiguous. By analyzing this
model, we investigate which words in the question are indicative of the
modality. Next, we construct a simple baseline ManyModalQA model, which, based
on the prediction from the modality selector, fires a corresponding pre-trained
state-of-the-art unimodal QA model. We focus on providing the community with a
new manymodal evaluation set and only provide a fine-tuning set, with the
expectation that existing datasets and approaches will be transferred for most
of the training, to encourage low-resource generalization without large,
monolithic training sets for each new task. There is a significant gap between
our baseline models and human performance; therefore, we hope that this
challenge encourages research in end-to-end modality disambiguation and
multimodal QA models, as well as transfer learning. Code and data available at:
https://github.com/hannandarryl/ManyModalQA
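To make the baseline concrete, the pipeline described in the abstract (a modality selector that routes each question to one of three pre-trained unimodal QA models) can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' released code: every name here (Context, select_modality, answer, the keyword heuristic, the placeholder experts) is a hypothetical stand-in; in the paper the selector is a trained network and the experts are state-of-the-art text, table, and image QA models.

# Hedged sketch of a modality-selector-then-dispatch QA pipeline.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Context:
    text: str          # article text scraped from the Wikipedia page
    table: list        # table rows from the same page
    image_path: str    # path to the page's image

def select_modality(question: str) -> str:
    """Stand-in for the trained disambiguator network: a toy keyword
    heuristic over the question words (assumption, not the paper's model)."""
    q = question.lower()
    if any(w in q for w in ("shown", "picture", "color", "wearing")):
        return "image"
    if any(w in q for w in ("how many", "total", "list", "year")):
        return "table"
    return "text"

def answer(question: str, ctx: Context,
           experts: Dict[str, Callable[[str, Context], str]]) -> str:
    """Fire the unimodal expert matching the predicted modality."""
    modality = select_modality(question)
    return experts[modality](question, ctx)

# Placeholder experts; the paper uses pre-trained unimodal QA models instead.
experts = {
    "text":  lambda q, c: f"[text-QA answer for: {q}]",
    "table": lambda q, c: f"[table-QA answer for: {q}]",
    "image": lambda q, c: f"[image-QA answer for: {q}]",
}

ctx = Context(text="...", table=[["Year", "Title"]], image_path="page.jpg")
print(answer("How many albums are listed?", ctx, experts))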
Related papers
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Read, Look or Listen? What's Needed for Solving a Multimodal Dataset [7.0430001782867]
We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it.
We apply our approach to TVQA, a video question-answering dataset, and discover that most questions can be answered using a single modality, without a substantial bias towards any specific modality.
We analyze the MERLOT Reserve model, finding that it struggles with image-based questions compared to text and audio, but also with auditory speaker identification.
arXiv Detail & Related papers (2023-07-06T08:02:45Z)
- MetaQA: Combining Expert Agents for Multi-Skill Question Answering [49.35261724460689]
We argue that despite the promising results of multi-dataset models, some domains or QA formats might require specific architectures.
We propose to combine expert agents with a novel, flexible, and training-efficient architecture that considers questions, answer predictions, and answer-prediction confidence scores.
arXiv Detail & Related papers (2021-12-03T14:05:52Z)
- Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models [61.480085460269514]
We propose a framework for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.
We use this framework to build ModularQA, a system that can answer multi-hop reasoning questions by decomposing them into sub-questions answerable by a neural factoid single-span QA model and a symbolic calculator.
arXiv Detail & Related papers (2020-09-01T23:45:42Z)
- Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)