Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused
Interventions
- URL: http://arxiv.org/abs/2106.04484v1
- Date: Tue, 8 Jun 2021 16:09:47 GMT
- Title: Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused
Interventions
- Authors: Daniel Rosenberg, Itai Gat, Amir Feder, Roi Reichart
- Abstract summary: We look at the generalization capabilities of visual question answering (VQA) systems.
We propose a new robustness measure, Robustness to Augmented Data (RAD), which measures the consistency of model predictions.
We find substantial failure cases which reveal that current VQA systems are still brittle.
- Score: 18.545193011418654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning algorithms have shown promising results in visual question
answering (VQA) tasks, but a more careful look reveals that they often do not
understand the rich signal they are being fed with. To understand and better
measure the generalization capabilities of VQA systems, we look at their
robustness to counterfactually augmented data. Our proposed augmentations are
designed to make a focused intervention on a specific property of the question
such that the answer changes. Using these augmentations, we propose a new
robustness measure, Robustness to Augmented Data (RAD), which measures the
consistency of model predictions between original and augmented examples.
Through extensive experimentation, we show that RAD, unlike classical accuracy
measures, can quantify when state-of-the-art systems are not robust to
counterfactuals. We find substantial failure cases which reveal that current
VQA systems are still brittle. Finally, we connect between robustness and
generalization, demonstrating the predictive power of RAD for performance on
unseen augmentations.
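The abstract defines RAD as a consistency measure between predictions on original and augmented examples. A minimal sketch of one such consistency ratio, assuming RAD is computed as the fraction of originally-correct examples whose augmented counterparts are also answered correctly (the paper's exact definition may differ; all names here are illustrative):

```python
def rad(orig_preds, orig_labels, aug_preds, aug_labels):
    """Consistency ratio in the spirit of RAD: among examples the model
    answers correctly in their original form, the fraction whose
    counterfactually augmented counterpart is also answered correctly."""
    # Indices of examples the model gets right before augmentation.
    correct_orig = [i for i in range(len(orig_preds))
                    if orig_preds[i] == orig_labels[i]]
    if not correct_orig:
        return 0.0
    # Of those, count the ones still answered correctly after augmentation.
    consistent = sum(1 for i in correct_orig
                     if aug_preds[i] == aug_labels[i])
    return consistent / len(correct_orig)
```

A score of 1.0 means the model is fully consistent under augmentation; lower values expose the brittleness the abstract describes, even when plain accuracy looks high.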
Related papers
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z) - On the Onset of Robust Overfitting in Adversarial Training [66.27055915739331]
Adversarial Training (AT) is a widely-used algorithm for building robust neural networks.
AT suffers from the issue of robust overfitting, the fundamental mechanism of which remains unclear.
arXiv Detail & Related papers (2023-10-01T07:57:03Z) - Few-shot Weakly-supervised Cybersecurity Anomaly Detection [1.179179628317559]
We propose an enhancement to an existing few-shot weakly-supervised deep learning anomaly detection framework.
This framework incorporates data augmentation, representation learning and ordinal regression.
We then evaluated and showed the performance of our implemented framework on three benchmark datasets.
arXiv Detail & Related papers (2023-04-15T04:37:54Z) - Free Lunch for Generating Effective Outlier Supervision [46.37464572099351]
We propose an ultra-effective method to generate near-realistic outlier supervision.
Our proposed BayesAug significantly reduces the false positive rate by over 12.50% compared with previous schemes.
arXiv Detail & Related papers (2023-01-17T01:46:45Z) - Be Your Own Neighborhood: Detecting Adversarial Example by the
Neighborhood Relations Built on Self-Supervised Learning [64.78972193105443]
This paper presents a novel adversarial example (AE) detection framework that aims to make predictions trustworthy.
It performs detection by distinguishing an AE's abnormal relations with its augmented versions.
An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label.
arXiv Detail & Related papers (2022-08-31T08:18:44Z) - VisFIS: Visual Feature Importance Supervision with
Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z) - Efficient Attention Branch Network with Combined Loss Function for
Automatic Speaker Verification Spoof Detection [7.219077740523682]
Models currently deployed for the task of Automatic Speaker Verification generalize poorly, at best, to unseen attacks.
The present study proposes the Efficient Attention Branch Network (EABN) modular architecture with a combined loss function to address the generalization problem.
arXiv Detail & Related papers (2021-09-05T12:10:16Z) - Contrast and Classify: Training Robust VQA Models [60.80627814762071]
We propose a novel training paradigm (ConClaT) that optimizes both cross-entropy and contrastive losses.
We find that optimizing both losses -- either alternately or jointly -- is key to effective training.
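The ConClaT summary above pairs a cross-entropy term with a contrastive term. A minimal sketch of one such combined objective, using an NT-Xent-style contrastive loss as the second term (the paper's exact formulation and weighting may differ; all names here are illustrative):

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])

def combined_loss(probs, label, anchor, positive, negatives, lam=0.5, tau=0.1):
    """Cross-entropy plus a weighted NT-Xent-style contrastive term that
    pulls the anchor embedding toward its positive and away from negatives."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    pos = math.exp(cos(anchor, positive) / tau)
    neg = sum(math.exp(cos(anchor, n) / tau) for n in negatives)
    contrastive = -math.log(pos / (pos + neg))
    return cross_entropy(probs, label) + lam * contrastive
```

Optimizing the two terms jointly, as here, or alternating between them per batch are the two schedules the summary says both work; `lam` trades off classification accuracy against representation consistency.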
arXiv Detail & Related papers (2020-10-13T00:23:59Z) - Unifying Model Explainability and Robustness via Machine-Checkable
Concepts [33.88198813484126]
We propose a robustness-assessment framework, at the core of which is the idea of using machine-checkable concepts.
Our framework defines a large number of concepts that the explanations could be based on and performs the explanation-conformity check at test time to assess prediction robustness.
Experiments on real-world datasets and human surveys show that our framework is able to enhance prediction robustness significantly.
arXiv Detail & Related papers (2020-07-01T05:21:16Z) - Exploring Weaknesses of VQA Models through Attribution Driven Insights [0.0]
Recent research effectively applies VQA models to answering visual questions for the blind.
We analyze popular VQA models through the lens of attribution (input's influence on predictions) to gain valuable insights.
arXiv Detail & Related papers (2020-06-11T17:30:07Z) - Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, iteratively refining the data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.