Select, Substitute, Search: A New Benchmark for Knowledge-Augmented
Visual Question Answering
- URL: http://arxiv.org/abs/2103.05568v1
- Date: Tue, 9 Mar 2021 17:19:50 GMT
- Title: Select, Substitute, Search: A New Benchmark for Knowledge-Augmented
Visual Question Answering
- Authors: Aman Jain, Mayank Kothyari, Vishwajeet Kumar, Preethi Jyothi, Ganesh
Ramakrishnan, Soumen Chakrabarti
- Abstract summary: Multimodal IR, spanning text corpus, knowledge graph and images, is of much recent interest.
A surprisingly large fraction of queries do not assess the ability to integrate cross-modal information.
We build a new data set and challenge around a key structural idiom in OKVQA, viz., S3 (select, substitute, search).
- Score: 35.855792706139525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal IR, spanning text corpus, knowledge graph and images, called
outside knowledge visual question answering (OKVQA), is of much recent
interest. However, the popular data set has serious limitations. A surprisingly
large fraction of queries do not assess the ability to integrate cross-modal
information. Instead, some are independent of the image, some depend on
speculation, some require OCR or are otherwise answerable from the image alone.
To add to the above limitations, frequency-based guessing is very effective
because of (unintended) widespread answer overlaps between the train and test
folds. Overall, it is hard to determine when state-of-the-art systems exploit
these weaknesses rather than really infer the answers, because they are opaque
and their 'reasoning' process is uninterpretable. An equally important
limitation is that the dataset is designed only for quantitative assessment of
the end-to-end answer retrieval task, with no provision for assessing the
correct (semantic) interpretation of the input query. In response, we identify a
key structural idiom in OKVQA, viz., S3 (select, substitute and search), and
build a new data set and challenge around it. Specifically, the questioner
identifies an entity in the image and asks a question involving that entity
which can be answered only by consulting a knowledge graph or corpus passage
mentioning the entity. Our challenge consists of (i) OKVQAS3, a subset of OKVQA
annotated based on the structural idiom, and (ii) S3VQA, a new dataset built from
scratch. We also present a neural but structurally transparent OKVQA system,
S3, that explicitly addresses our challenge dataset, and outperforms recent
competitive baselines.
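The train/test answer overlap criticized above can be probed with a deliberately naive guesser. Below is a minimal sketch, assuming the training answers are available as a plain list of strings; the function and variable names are illustrative stand-ins, not the paper's code.

```python
from collections import Counter

def frequency_baseline(train_answers, test_questions):
    """Answer every test question with the single most frequent
    training answer, ignoring both the image and the question text.

    Because many answers leak across the OKVQA train/test folds, even
    this prior-only guesser can be surprisingly competitive. The
    argument is illustrative; the field names are hypothetical.
    """
    prior = Counter(train_answers)
    top_answer, _ = prior.most_common(1)[0]
    return [top_answer for _ in test_questions]
```

Running such a guesser against a split gives a quick lower bound: a benchmark on which it scores well is measuring answer priors rather than cross-modal reasoning.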
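The S3 idiom itself decomposes into three stages, which the sketch below traces as plain control flow. Here select_span, detect_entity, and search are hypothetical placeholders for the learned components the paper describes (question-span selection, grounding the span to a detected object, and retrieval over a knowledge graph or corpus).

```python
def s3_answer(question, image, select_span, detect_entity, search):
    """Minimal sketch of the select-substitute-search (S3) idiom;
    the three callables are assumed, not the paper's implementation."""
    # Select: the question span that refers to something in the image,
    # e.g. "this animal" in "Where does this animal live?".
    span = select_span(question, image)
    # Substitute: replace the span with the detected entity's label,
    # turning the query into pure text, e.g. "Where does a zebra live?".
    entity = detect_entity(image, span)
    rewritten = question.replace(span, entity)
    # Search: answer the rewritten, now image-free question against
    # external knowledge (a KG or a text corpus).
    return search(rewritten)
```

Because the rewritten question is an explicit intermediate artifact, a grader can score the system's interpretation of the query directly, which is exactly the provision the abstract says the original OKVQA evaluation lacks.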
Related papers
- Convincing Rationales for Visual Question Answering Reasoning [14.490692389105947]
Visual Question Answering (VQA) is a challenging task of predicting the answer to a question about the content of an image.
To generate both visual and textual rationales alongside the predicted answer for a given image/question pair, we propose Convincing Rationales for VQA (CRVQA).
CRVQA achieves competitive performance on generic VQA datasets in the zero-shot evaluation setting.
arXiv Detail & Related papers (2024-02-06T11:07:05Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it comprises graph construction, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
arXiv Detail & Related papers (2023-10-12T09:12:50Z)
- OpenCQA: Open-ended Question Answering with Charts [6.7038829115674945]
We introduce a new task called OpenCQA, where the goal is to answer an open-ended question about a chart with texts.
We implement and evaluate a set of baselines under three practical settings.
Our analysis of the results shows that the top-performing models generally produce fluent and coherent text.
arXiv Detail & Related papers (2022-10-12T23:37:30Z)
- A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge [39.788346536244504]
A-OKVQA is a crowdsourced dataset composed of about 25K questions.
We demonstrate the potential of this new dataset through a detailed analysis of its contents.
arXiv Detail & Related papers (2022-06-03T17:52:27Z)
- Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z)
- Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering [26.21870452615222]
FVQA requires external knowledge beyond visible content to answer questions about an image.
How to capture the question-oriented and information-complementary evidence remains a key challenge to solve the problem.
We propose a modality-aware heterogeneous graph convolutional network to capture evidence from different layers that is most relevant to the given question.
arXiv Detail & Related papers (2020-06-16T11:03:37Z)
- ClarQ: A large-scale and diverse dataset for Clarification Question Generation [67.1162903046619]
We devise a novel bootstrapping framework that assists in the creation of a diverse, large-scale dataset of clarification questions based on post comments extracted from StackExchange.
We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering.
We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
arXiv Detail & Related papers (2020-06-10T17:56:50Z)
- Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z)