Exploring Weaknesses of VQA Models through Attribution Driven Insights
- URL: http://arxiv.org/abs/2006.06637v2
- Date: Tue, 16 Jun 2020 12:01:03 GMT
- Title: Exploring Weaknesses of VQA Models through Attribution Driven Insights
- Authors: Shaunak Halbe
- Abstract summary: Recent research effectively applies these VQA models for answering visual questions for the blind.
We analyze popular VQA models through the lens of attribution (input's influence on predictions) to gain valuable insights.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks have been successfully used for the task of Visual
Question Answering for the past few years, owing to the availability of relevant
large-scale datasets. However, these datasets are created in artificial settings
and rarely reflect real-world scenarios. Recent research effectively applies
these VQA models to answer visual questions for the blind. Despite achieving
high accuracy, these models appear to be susceptible to variations in the input
questions. We analyze popular VQA models through the lens of attribution (an
input's influence on predictions) to gain valuable insights. Further, we use
these insights to craft adversarial attacks that inflict significant damage on
these systems with a negligible change in the meaning of the input questions. We
believe this will aid the development of systems that are more robust to possible
variations in inputs when deployed to assist the visually impaired.
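The core tool here is attribution: scoring how much each question token influences the predicted answer. Below is a minimal sketch of token-level Integrated Gradients, assuming a toy `TinyVQA` model with made-up dimensions, vocabulary ids, and image features; the paper's actual models and attribution setup may differ.

```python
# Minimal sketch: token-level Integrated Gradients for a toy VQA model.
# TinyVQA, its dimensions, and the inputs below are illustrative assumptions
# only, not the authors' implementation.
import torch
import torch.nn as nn

class TinyVQA(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=32, img_dim=64, n_answers=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim + img_dim, n_answers)

    def forward_from_embeddings(self, q_embeds, img_feat):
        # Mean-pool question-token embeddings, fuse with image features, classify.
        q_vec = q_embeds.mean(dim=1)
        return self.fc(torch.cat([q_vec, img_feat], dim=-1))

def integrated_gradients(model, q_tokens, img_feat, target, steps=50):
    """Attribute the target answer logit to each question token."""
    embeds = model.embed(q_tokens)          # (1, seq_len, embed_dim)
    baseline = torch.zeros_like(embeds)     # all-zero embedding as the baseline
    total_grads = torch.zeros_like(embeds)
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Point on the straight-line path from the baseline to the real input.
        point = (baseline + alpha * (embeds - baseline)).detach().requires_grad_(True)
        logit = model.forward_from_embeddings(point, img_feat)[0, target]
        grad, = torch.autograd.grad(logit, point)
        total_grads += grad
    # Average the path gradients, scale by (input - baseline), then sum over
    # the embedding dimension to get one influence score per token.
    attributions = ((embeds - baseline) * total_grads / steps).sum(dim=-1)
    return attributions.squeeze(0).detach()

model = TinyVQA()
q_tokens = torch.tensor([[4, 17, 23, 8]])   # a 4-token question; ids are arbitrary
img_feat = torch.randn(1, 64)               # stand-in for precomputed image features
with torch.no_grad():
    answer = model.forward_from_embeddings(model.embed(q_tokens), img_feat).argmax(dim=-1)
scores = integrated_gradients(model, q_tokens, img_feat, answer.item())
print(scores)                               # one influence score per question token
```

Tokens with the largest scores are the ones the model relies on most; per the abstract above, such insights are used to craft small changes to a question that barely alter its meaning yet damage the model's prediction.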
Related papers
- QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems [3.486120902611884]
This paper explores the significance of different question types for VQA systems and their impact on performance.
We propose QTG-VQA, a novel architecture that incorporates question-type-guided attention and adaptive learning mechanism.
arXiv Detail & Related papers (2024-09-14T07:42:41Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the transferability of adversarial examples.
arXiv Detail & Related papers (2023-10-26T17:45:26Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Generative Visual Question Answering [0.0]
This paper discusses a viable approach to creating an advanced Visual Question Answering (VQA) model which can produce successful results on temporal generalization.
We propose a new dataset, GenVQA, which uses images and captions from the VQAv2 and MS-COCO datasets to generate new images through stable diffusion.
Performance evaluation focuses on questions mirroring the original VQAv2 dataset, with answers adjusted to the new images.
arXiv Detail & Related papers (2023-07-18T05:30:23Z)
- Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions [70.70725223310401]
This work proposes a new method that utilizes semantically related questions, referred to as basic questions, acting as noise to evaluate the robustness of VQA models.
The experimental results demonstrate that the proposed evaluation method effectively analyzes the robustness of VQA models.
arXiv Detail & Related papers (2023-04-06T15:32:35Z)
- Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions [18.545193011418654]
We look at the generalization capabilities of visual question answering (VQA) systems.
We propose a new robustness measure, Robustness to Augmented Data (RAD), which measures the consistency of model predictions.
We find substantial failure cases which reveal that current VQA systems are still brittle.
arXiv Detail & Related papers (2021-06-08T16:09:47Z)
- Human-Adversarial Visual Question Answering [62.30715496829321]
We benchmark state-of-the-art VQA models against human-adversarial examples.
We find that a wide range of state-of-the-art models perform poorly when evaluated on these examples.
arXiv Detail & Related papers (2021-06-04T06:25:32Z)
- A New Score for Adaptive Tests in Bayesian and Credal Networks [64.80185026979883]
A test is adaptive when its sequence and number of questions are dynamically tuned on the basis of the taker's estimated skills.
We present an alternative family of scores, based on the mode of the posterior probabilities, and hence easier to explain.
arXiv Detail & Related papers (2021-05-25T20:35:42Z)
- Latent Variable Models for Visual Question Answering [34.9601948665926]
We propose latent variable models for Visual Question Answering.
Extra information (e.g., captions and answer categories) is incorporated as latent variables to improve inference.
Experiments on the VQA v2.0 benchmarking dataset demonstrate the effectiveness of our proposed models.
arXiv Detail & Related papers (2021-01-16T08:21:43Z)