On the General Value of Evidence, and Bilingual Scene-Text Visual
Question Answering
- URL: http://arxiv.org/abs/2002.10215v2
- Date: Wed, 26 Feb 2020 04:59:18 GMT
- Title: On the General Value of Evidence, and Bilingual Scene-Text Visual
Question Answering
- Authors: Xinyu Wang, Yuliang Liu, Chunhua Shen, Chun Chet Ng, Canjie Luo,
Lianwen Jin, Chee Seng Chan, Anton van den Hengel, Liangwei Wang
- Abstract summary: We present a dataset that takes a step towards addressing this problem in that it contains questions expressed in two languages.
Measuring reasoning directly encourages generalization by penalizing answers that are coincidentally correct.
The dataset reflects the scene-text version of the VQA problem, and the reasoning evaluation can be seen as a text-based version of a referring expression challenge.
- Score: 120.64104995052189
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visual Question Answering (VQA) methods have made incredible progress, but
suffer from a failure to generalize. This is visible in the fact that they are
vulnerable to learning coincidental correlations in the data rather than deeper
relations between image content and ideas expressed in language. We present a
dataset that takes a step towards addressing this problem in that it contains
questions expressed in two languages, and an evaluation process that co-opts a
well understood image-based metric to reflect the method's ability to reason.
Measuring reasoning directly encourages generalization by penalizing answers
that are coincidentally correct. The dataset reflects the scene-text version of
the VQA problem, and the reasoning evaluation can be seen as a text-based
version of a referring expression challenge. Experiments and analysis are
provided that show the value of the dataset.
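As a rough illustration of the evaluation idea, the sketch below scores an answer as correct only when it matches the ground truth and the model's predicted evidence region overlaps the annotated scene-text region under an image-based IoU test, so coincidentally correct answers without supporting evidence earn no credit. The field names, exact-match comparison, and 0.5 threshold are assumptions for illustration, not the paper's exact protocol.

```python
# Minimal sketch of an evidence-aware VQA metric: an answer counts only if the
# predicted answer matches AND the predicted evidence box overlaps the annotated
# text region. Field names, IoU threshold, and exact-match scoring are
# illustrative assumptions, not the paper's exact protocol.

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def evidence_based_accuracy(predictions, ground_truth, iou_threshold=0.5):
    """predictions / ground_truth: dicts mapping question id to
    {"answer": str, "evidence_box": (x1, y1, x2, y2)}."""
    correct = 0
    for qid, gt in ground_truth.items():
        pred = predictions.get(qid)
        if pred is None:
            continue
        answer_ok = pred["answer"].strip().lower() == gt["answer"].strip().lower()
        evidence_ok = iou(pred["evidence_box"], gt["evidence_box"]) >= iou_threshold
        if answer_ok and evidence_ok:   # answers without supporting evidence
            correct += 1                # are not rewarded
    return correct / len(ground_truth)
```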
Related papers
- The curse of language biases in remote sensing VQA: the role of spatial
attributes, language diversity, and the need for clear evaluation [32.7348470366509]
The goal of RSVQA is to answer a question formulated in natural language about a remote sensing image.
The problem of language biases is often overlooked in the remote sensing community.
The present work aims at highlighting the problem of language biases in RSVQA with a threefold analysis strategy.
arXiv Detail & Related papers (2023-11-28T13:45:15Z)
- Making the V in Text-VQA Matter [1.2962828085662563]
Text-based VQA aims at answering questions by reading the text present in the images.
Recent studies have shown that the question-answer pairs in the dataset are more focused on the text present in the image.
The models trained on this dataset predict biased answers due to the lack of understanding of visual context.
arXiv Detail & Related papers (2023-08-01T05:28:13Z)
- Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA [111.41719652451701]
We first model a confounding effect that causes language and vision bias simultaneously.
We then propose a counterfactual inference to remove the influence of this effect.
The proposed method outperforms state-of-the-art methods on the VQA-CP v2 dataset.
arXiv Detail & Related papers (2023-05-31T09:02:58Z)
- A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge [39.788346536244504]
A-OKVQA is a crowdsourced dataset composed of about 25K questions.
We demonstrate the potential of this new dataset through a detailed analysis of its contents.
arXiv Detail & Related papers (2022-06-03T17:52:27Z)
- Language bias in Visual Question Answering: A Survey and Taxonomy [0.0]
We conduct a comprehensive review and analysis of this field for the first time.
We classify the existing methods into three categories, including methods that enhance visual information.
The causes of language bias are revealed and classified.
arXiv Detail & Related papers (2021-11-16T15:01:24Z)
- Overcoming Language Priors with Self-supervised Learning for Visual Question Answering [62.88124382512111]
Most Visual Question Answering (VQA) models suffer from the language prior problem.
We introduce a self-supervised learning framework to solve this problem.
Our method can significantly outperform the state-of-the-art.
arXiv Detail & Related papers (2020-12-17T12:30:12Z)
- Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View [129.392671317356]
We propose to interpret the language prior problem in VQA from a class-imbalance view.
It explicitly reveals why the VQA model tends to produce a frequent yet obviously wrong answer.
We also justify the validity of the class imbalance interpretation scheme on other computer vision tasks, such as face recognition and image classification.
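A rough sketch of the class-imbalance reading: the per-example VQA loss is re-scaled by a weight derived from how frequent the target answer is in training, so common answers stop dominating the gradient. The inverse-frequency weighting below is a generic illustration, not necessarily the paper's exact re-scaling scheme.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of re-scaling the VQA answer-classification loss from a
# class-imbalance view: frequent answers are down-weighted, rare answers
# up-weighted. The inverse-frequency scheme is a generic illustration only.

def answer_class_weights(answer_counts, smoothing=1.0):
    """answer_counts: 1-D tensor of training-set frequencies per answer class."""
    weights = 1.0 / (answer_counts.float() + smoothing)
    return weights * (len(answer_counts) / weights.sum())  # normalize to mean 1

def rescaled_vqa_loss(logits, target_answers, class_weights):
    """logits: (batch, num_answers); target_answers: (batch,) class indices."""
    per_example = F.cross_entropy(logits, target_answers, reduction="none")
    return (per_example * class_weights[target_answers]).mean()
```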
arXiv Detail & Related papers (2020-10-30T00:57:17Z)
- Counterfactual VQA: A Cause-Effect Look at Language Bias [117.84189187160005]
VQA models tend to rely on language bias as a shortcut and fail to sufficiently learn the multi-modal knowledge from both vision and language.
We propose a novel counterfactual inference framework, which enables us to capture the language bias as the direct causal effect of questions on answers.
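A rough sketch of the idea: a question-only branch estimates the direct effect of the question on the answer, and that language-prior component is subtracted from the fused vision-and-language prediction at inference time. The simple log-softmax subtraction and branch names are illustrative assumptions rather than the paper's precise causal-effect formulation.

```python
import torch

# Minimal sketch of counterfactual debiasing at inference time: the language
# bias estimated by a question-only branch is subtracted from the fused
# vision+language prediction, leaving the part of the prediction that actually
# depends on the image. Illustrative only.

def debiased_logits(fused_logits, question_only_logits, bias_weight=1.0):
    """fused_logits: scores from the full vision+language model.
    question_only_logits: scores from a branch that sees only the question."""
    total_effect = torch.log_softmax(fused_logits, dim=-1)
    language_effect = torch.log_softmax(question_only_logits, dim=-1)
    return total_effect - bias_weight * language_effect
```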
arXiv Detail & Related papers (2020-06-08T01:49:27Z)
- A Revised Generative Evaluation of Visual Dialogue [80.17353102854405]
We propose a revised evaluation scheme for the VisDial dataset.
We measure consensus between answers generated by the model and a set of relevant answers.
We release these sets and code for the revised evaluation scheme as DenseVisDial.
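A rough sketch of consensus scoring: the generated answer is compared against every answer annotated as relevant for the question and the agreement is averaged. The token-level F1 below is a simple stand-in for the generation metrics used in the released evaluation.

```python
# Minimal sketch of consensus scoring between a generated answer and a set of
# relevant reference answers. Token-level F1 is a simple stand-in metric.

def token_f1(candidate, reference):
    cand, ref = candidate.lower().split(), reference.lower().split()
    common = sum(min(cand.count(t), ref.count(t)) for t in set(cand))
    if common == 0:
        return 0.0
    precision, recall = common / len(cand), common / len(ref)
    return 2 * precision * recall / (precision + recall)

def consensus_score(generated_answer, relevant_answers):
    """Average agreement with every answer judged relevant for the question."""
    scores = [token_f1(generated_answer, ref) for ref in relevant_answers]
    return sum(scores) / len(scores)
```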
arXiv Detail & Related papers (2020-04-20T13:26:45Z)
- Robust Explanations for Visual Question Answering [24.685231217726194]
We propose a method to obtain robust explanations for visual question answering (VQA) that correlate well with the answers.
Our model explains the answers obtained through a VQA model by providing visual and textual explanations.
We showcase the robustness of the model against a noise-based perturbation attack using corresponding visual and textual explanations.
arXiv Detail & Related papers (2020-01-23T18:43:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.