Language bias in Visual Question Answering: A Survey and Taxonomy
- URL: http://arxiv.org/abs/2111.08531v1
- Date: Tue, 16 Nov 2021 15:01:24 GMT
- Title: Language bias in Visual Question Answering: A Survey and Taxonomy
- Authors: Desen Yuan
- Abstract summary: We conduct the first comprehensive review and analysis of this field.
We classify the existing methods into three categories: enhancing visual information, weakening language priors, and data augmentation and training strategies.
The causes of language bias are identified and classified.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual question answering (VQA) is a challenging task that has attracted
growing attention in computer vision and natural language processing. However,
current VQA models suffer from language bias, which reduces their robustness and
hinders the practical application of VQA. In this paper, we conduct the first
comprehensive review and analysis of this field and classify the existing methods
into three categories: enhancing visual information, weakening language priors,
and data augmentation and training strategies. Representative methods in each
category are introduced, summarized, and analyzed in turn, and the causes of
language bias are identified and classified. We then introduce the datasets mainly
used for evaluation and report the experimental results of existing methods.
Finally, we discuss possible future research directions in this field.
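
To make the "weakening language priors" category concrete, below is a minimal
PyTorch-style sketch of one representative idea: a question-only branch that
captures the language prior and masks the main predictions during training
(a RUBi-style scheme). All module and function names here are illustrative
assumptions, not code from any surveyed paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionOnlyBiasBranch(nn.Module):
    """Illustrative RUBi-style debiasing head (names are hypothetical).

    A question-only classifier captures the language prior; its sigmoid
    output masks the fused (image + question) logits, so the main branch
    receives little gradient for answers that are predictable from the
    question alone.
    """

    def __init__(self, q_dim: int, num_answers: int):
        super().__init__()
        self.q_classifier = nn.Sequential(
            nn.Linear(q_dim, q_dim),
            nn.ReLU(),
            nn.Linear(q_dim, num_answers),
        )

    def forward(self, fused_logits: torch.Tensor, q_feat: torch.Tensor):
        q_logits = self.q_classifier(q_feat)            # language-prior branch
        masked_logits = fused_logits * torch.sigmoid(q_logits)
        return masked_logits, q_logits

def debiasing_loss(masked_logits, q_logits, answers):
    # Both branches are supervised during training; at test time the
    # question-only branch is discarded and the plain fused logits are used.
    return (F.cross_entropy(masked_logits, answers)
            + F.cross_entropy(q_logits, answers))
```

At test time the mask is dropped, which is what lets diagnostic benchmarks such
as VQA-CP expose whether the main branch actually learned to use the image.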
Related papers
- Natural Language Generation for Visualizations: State of the Art, Challenges and Future Directions [7.064953237013352]
We focus on research works on text generation for visualizations.
To characterize the NLG problem and the design space of proposed solutions, we pose five Wh-questions.
We categorize the solutions used in the surveyed papers based on these "five Wh-questions".
arXiv Detail & Related papers (2024-09-29T15:53:18Z) - Robust Visual Question Answering: Datasets, Methods, and Future Challenges [23.59923999144776]
Visual question answering requires a system to provide an accurate natural language answer given an image and a natural language question.
Previous generic VQA methods often exhibit a tendency to memorize biases present in the training data rather than learning proper behaviors, such as grounding images before predicting answers.
Various datasets and debiasing methods have been proposed to evaluate and enhance VQA robustness, respectively.
arXiv Detail & Related papers (2023-07-21T10:12:09Z) - Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA [111.41719652451701]
We first model a confounding effect that causes language and vision bias simultaneously.
We then propose a counterfactual inference to remove the influence of this effect (see the sketch after this list).
The proposed method outperforms state-of-the-art methods on the VQA-CP v2 dataset.
arXiv Detail & Related papers (2023-05-31T09:02:58Z) - Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem [60.0878532426877]
We propose a novel collaborative learning scheme from the viewpoint of visual perturbation calibration.
Specifically, we devise a visual controller to construct two sorts of curated images with different perturbation extents.
The experimental results on two diagnostic VQA-CP benchmark datasets clearly demonstrate its effectiveness.
arXiv Detail & Related papers (2022-07-24T23:50:52Z) - Delving Deeper into Cross-lingual Visual Question Answering [115.16614806717341]
We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance.
We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers.
arXiv Detail & Related papers (2022-02-15T18:22:18Z) - Visually grounded models of spoken language: A survey of datasets, architectures and evaluation techniques [15.906959137350247]
This survey provides an overview of the evolution of visually grounded models of spoken language over the last 20 years.
We discuss the central research questions addressed, the timeline of developments, and the datasets which enabled much of this work.
arXiv Detail & Related papers (2021-04-27T14:32:22Z) - Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View [129.392671317356]
We propose to interpret the language prior problem in VQA from a class-imbalance view (one plausible loss re-scaling is sketched after this list).
This view explicitly reveals why a VQA model tends to produce a frequent yet obviously wrong answer.
We also justify the validity of the class-imbalance interpretation on other computer vision tasks, such as face recognition and image classification.
arXiv Detail & Related papers (2020-10-30T00:57:17Z) - Survey on Visual Sentiment Analysis [87.20223213370004]
This paper reviews pertinent publications and tries to present an exhaustive overview of the field of Visual Sentiment Analysis.
The paper also describes principles of design of general Visual Sentiment Analysis systems from three main points of view.
A formalization of the problem is discussed, considering different levels of granularity, as well as the components that can affect the sentiment toward an image in different ways.
arXiv Detail & Related papers (2020-04-24T10:15:22Z) - On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering [120.64104995052189]
We present a dataset that takes a step towards addressing this problem in that it contains questions expressed in two languages.
Measuring reasoning directly encourages generalization by penalizing answers that are coincidentally correct.
The dataset reflects the scene-text version of the VQA problem, and the reasoning evaluation can be seen as a text-based version of a referring expression challenge.
arXiv Detail & Related papers (2020-02-24T13:02:31Z)
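
As referenced in the "Possible Worlds VQA" entry above, the following is a
minimal sketch of counterfactual debiasing at inference time: the effect the
question alone would have in an imageless, counterfactual world is subtracted
from the total effect of the full model. The function and the lam trade-off
are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def counterfactual_inference(fused_logits: torch.Tensor,
                             q_only_logits: torch.Tensor,
                             lam: float = 1.0) -> torch.Tensor:
    """Illustrative counterfactual debiasing (simplified, hypothetical).

    fused_logits:  scores from the full (image, question) model, i.e. the
                   total effect of both modalities.
    q_only_logits: scores from a question-only pass, i.e. the effect the
                   language shortcut alone would produce.
    Subtracting the latter keeps only the part of the prediction that
    depends on the visual input.
    """
    return fused_logits - lam * q_only_logits
```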
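
For the "Loss re-scaling VQA" entry, one plausible instantiation of the
class-imbalance reading is to weight the loss by inverse answer frequency;
the scheme below is assumed for illustration and is not taken from the paper.

```python
import torch
import torch.nn.functional as F

def rescaled_vqa_loss(logits: torch.Tensor,
                      answers: torch.Tensor,
                      answer_counts: torch.Tensor) -> torch.Tensor:
    """Illustrative inverse-frequency loss re-scaling (assumed scheme).

    answer_counts[i] is how often answer class i occurs in the training
    set; rare answers receive larger weights, discouraging the model from
    always emitting the most frequent answer for a question type.
    """
    weights = answer_counts.sum() / (answer_counts.float() + 1.0)
    weights = weights / weights.mean()   # normalize weights around 1.0
    return F.cross_entropy(logits, answers, weight=weights)
```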
This list is automatically generated from the titles and abstracts of the papers on this site.