Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
- URL: http://arxiv.org/abs/2106.00245v1
- Date: Tue, 1 Jun 2021 05:54:41 GMT
- Title: Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
- Authors: Linjie Li, Jie Lei, Zhe Gan, Jingjing Liu
- Abstract summary: We introduce Adversarial VQA, a new large-scale VQA benchmark, collected iteratively via an adversarial human-and-model-in-the-loop procedure.
We find that non-expert annotators can successfully attack SOTA VQA models with relative ease.
Both large-scale pre-trained models and adversarial training methods achieve far lower performance on the new benchmark than on the standard VQA v2 dataset.
- Score: 45.777326168922635
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With large-scale pre-training, the past two years have witnessed a significant performance boost on the Visual Question Answering (VQA) task. Though rapid progress has been made, it remains unclear whether these state-of-the-art (SOTA) VQA models are robust when encountering test examples in the wild. To
study this, we introduce Adversarial VQA, a new large-scale VQA benchmark,
collected iteratively via an adversarial human-and-model-in-the-loop procedure.
Through this new benchmark, we present several interesting findings. (i)
Surprisingly, during dataset collection, we find that non-expert annotators can
successfully attack SOTA VQA models with relative ease. (ii) We test a variety
of SOTA VQA models on our new dataset to highlight their fragility, and find that both large-scale pre-trained models and adversarial training methods achieve far lower performance on our dataset than on the standard VQA v2 dataset. (iii) When used for data augmentation, our dataset can be
used to improve the performance on other robust VQA benchmarks. (iv) We present
a detailed analysis of the dataset, providing valuable insights on the
challenges it brings to the community. We hope Adversarial VQA can serve as a valuable benchmark that future work will use to test the robustness of newly developed VQA models. Our dataset is publicly available at https://adversarialvqa.github.io/.
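To make the collection procedure concrete, here is a minimal sketch of one adversarial human-and-model-in-the-loop round: an annotator writes a question about an image, a target VQA model answers it, and the example is kept (for later verification) only when the model is fooled. The stub model, function names, and acceptance rule are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of an adversarial human-and-model-in-the-loop collection round.
# The stub model and acceptance rule are illustrative assumptions.

def model_answer(image_path: str, question: str) -> str:
    """Stub standing in for a SOTA VQA model; replace with a real model call."""
    return "yes"  # placeholder prediction


def collect_adversarial_examples(image_paths, attempts_per_image=3):
    dataset = []
    for image_path in image_paths:
        for _ in range(attempts_per_image):
            question = input(f"[{image_path}] Write a question the model may get wrong: ")
            gold = input("What is the correct answer? ")
            prediction = model_answer(image_path, question)
            # Keep only questions that fool the model; other annotators verify them later.
            if prediction.strip().lower() != gold.strip().lower():
                dataset.append({"image": image_path, "question": question,
                                "answer": gold, "model_prediction": prediction})
                print("Model fooled; example kept for verification.")
            else:
                print(f"Model answered correctly ('{prediction}'); try another question.")
    return dataset


if __name__ == "__main__":
    examples = collect_adversarial_examples(["coco_000000123456.jpg"])
    print(f"Collected {len(examples)} candidate adversarial examples.")
```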
Related papers
- Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models [71.06007696593704]
Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in real-world video-enabled media applications.
As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets.
We conduct a first-of-its-kind computational analysis of VQA datasets via minimalistic BVQA models.
arXiv Detail & Related papers (2023-07-26T06:38:33Z) - Generative Visual Question Answering [0.0]
This paper discusses a viable approach to creating an advanced Visual Question Answering (VQA) model that performs well under temporal generalization.
We propose a new dataset, GenVQA, utilizing images and captions from the VQAv2 and MS-COCO datasets to generate new images through stable diffusion.
Performance evaluation focuses on questions mirroring the original VQAv2 dataset, with the answers having been adjusted to the new images.
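As a rough illustration of the image-generation step, the sketch below uses the open-source diffusers library to synthesize a new image from an MS-COCO-style caption; the checkpoint name, prompt, and generation settings are assumptions for illustration, not the GenVQA authors' exact pipeline.

```python
# Sketch: generating a new image from an existing caption with Stable Diffusion.
# The checkpoint name and generation settings are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

caption = "a man riding a wave on top of a surfboard"  # e.g. an MS-COCO caption
image = pipe(caption, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("genvqa_candidate.png")
# The original VQAv2 question for the source image can then be paired with this
# new image, with the answer adjusted where the generated content differs.
```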
arXiv Detail & Related papers (2023-07-18T05:30:23Z) - Multilingual Augmentation for Robust Visual Question Answering in Remote
Sensing Images [19.99615698375829]
We propose a contrastive learning strategy for training remote sensing VQA (RSVQA) models that are robust to diverse question templates and wordings.
Experimental results demonstrate that the proposed augmented dataset is effective in improving the robustness of the RSVQA model.
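A generic sketch of such a contrastive objective is given below: embeddings of a question and one of its paraphrases (e.g., a different template or wording) are pulled together with an InfoNCE-style loss. The encoder, batch construction, and temperature are assumptions, not the paper's exact formulation.

```python
# Sketch of an InfoNCE-style contrastive loss over question paraphrases.
# The encoder and temperature value are illustrative assumptions.
import torch
import torch.nn.functional as F

def paraphrase_contrastive_loss(z_orig, z_para, temperature=0.07):
    """z_orig, z_para: (batch, dim) embeddings of each question and its paraphrase."""
    z_orig = F.normalize(z_orig, dim=-1)
    z_para = F.normalize(z_para, dim=-1)
    logits = z_orig @ z_para.t() / temperature      # (batch, batch) similarity matrix
    targets = torch.arange(z_orig.size(0), device=z_orig.device)
    # Each question should be closest to its own paraphrase, not to other questions.
    return F.cross_entropy(logits, targets)

# Usage with random embeddings standing in for a question encoder:
z1, z2 = torch.randn(8, 256), torch.randn(8, 256)
print(paraphrase_contrastive_loss(z1, z2).item())
```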
arXiv Detail & Related papers (2023-04-07T21:06:58Z) - All You May Need for VQA are Image Captions [24.634567673906666]
We propose a method that automatically derives VQA examples at volume from image captions.
We show that the resulting data is of high quality.
VQA models trained on our data improve state-of-the-art zero-shot accuracy by double digits.
arXiv Detail & Related papers (2022-05-04T04:09:23Z) - Human-Adversarial Visual Question Answering [62.30715496829321]
We benchmark state-of-the-art VQA models against human-adversarial examples.
We find that a wide range of state-of-the-art models perform poorly when evaluated on these examples.
arXiv Detail & Related papers (2021-06-04T06:25:32Z) - UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated
Content [59.13821614689478]
Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of content are unpredictable, complicated, and often commingled.
Here we contribute to advancing the problem by conducting a comprehensive evaluation of leading VQA models.
By employing a feature selection strategy on top of leading VQA model features, we extract 60 of the 763 statistical features used by those models to build a new model, VIDEVAL.
Our experimental results show that VIDEVAL achieves state-of-the-art performance at considerably lower computational cost than other leading models.
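The feature-selection idea can be sketched with scikit-learn: keep a small subset of the 763 statistical features and fit a regressor on human quality scores. The univariate selection criterion and SVR head below are assumptions for illustration, not the actual VIDEVAL procedure.

```python
# Sketch: selecting 60 of 763 features and regressing human quality scores.
# The selection criterion (univariate F-test) and SVR head are illustrative assumptions.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 763))   # per-video statistical features (placeholder data)
y = rng.uniform(1, 5, size=200)   # human-rated quality scores (placeholder data)

model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_regression, k=60),  # keep 60 of the 763 features
    SVR(kernel="rbf"),
)
model.fit(X, y)
print("Predicted quality:", model.predict(X[:3]))
```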
arXiv Detail & Related papers (2020-05-29T00:39:20Z) - Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, iteratively refining the data in RefQA.
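A minimal sketch of the refinement step is shown below, using a generic extractive QA pipeline from Hugging Face transformers; the checkpoint, confidence threshold, and replacement rule are assumptions, not the paper's exact procedure.

```python
# Sketch: refining harvested answers with a trained extractive QA model.
# The checkpoint, threshold, and replacement rule are illustrative assumptions.
from transformers import pipeline

qa_model = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

def refine(examples, confidence_threshold=0.8):
    """examples: list of dicts with 'question', 'context', and a harvested 'answer'."""
    refined = []
    for ex in examples:
        pred = qa_model(question=ex["question"], context=ex["context"])
        # If the QA model confidently extracts a different span, adopt it as the new answer.
        if pred["score"] >= confidence_threshold and pred["answer"] != ex["answer"]:
            ex = {**ex, "answer": pred["answer"]}
        refined.append(ex)
    return refined
```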
arXiv Detail & Related papers (2020-05-06T15:56:06Z) - Template-Based Question Generation from Retrieved Sentences for Improved
Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template to a related, retrieved sentence, rather than to the original context sentence, improves downstream QA performance.
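One way such a template can work is to replace the answer span in a retrieved sentence with a wh-word, yielding a pseudo-question paired with the known answer. The single "what" template below is a deliberately simple assumption; the paper's template set is richer.

```python
# Sketch: turning a retrieved sentence plus an answer span into a pseudo-question.
# Using a single "what" template is an illustrative assumption.
def template_question(sentence: str, answer: str) -> str:
    """Replace the answer span with a wh-word to form a cloze-style pseudo-question."""
    q = sentence.replace(answer, "what").rstrip(".")
    return q[0].upper() + q[1:] + "?"

sentence = "Marie Curie discovered polonium in 1898."
answer = "Marie Curie"
print({"question": template_question(sentence, answer), "answer": answer})
# -> {'question': 'What discovered polonium in 1898?', 'answer': 'Marie Curie'}
```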
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.