Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions
- URL: http://arxiv.org/abs/2106.04484v1
- Date: Tue, 8 Jun 2021 16:09:47 GMT
- Title: Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions
- Authors: Daniel Rosenberg, Itai Gat, Amir Feder, Roi Reichart
- Abstract summary: We look at the generalization capabilities of visual question answering (VQA) systems.
We propose a new robustness measure, Robustness to Augmented Data (RAD), which measures the consistency of model predictions.
We find substantial failure cases which reveal that current VQA systems are still brittle.
- Score: 18.545193011418654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning algorithms have shown promising results in visual question
answering (VQA) tasks, but a more careful look reveals that they often do not
understand the rich signal they are being fed with. To understand and better
measure the generalization capabilities of VQA systems, we look at their
robustness to counterfactually augmented data. Our proposed augmentations are
designed to make a focused intervention on a specific property of the question
such that the answer changes. Using these augmentations, we propose a new
robustness measure, Robustness to Augmented Data (RAD), which measures the
consistency of model predictions between original and augmented examples.
Through extensive experimentation, we show that RAD, unlike classical accuracy
measures, can quantify when state-of-the-art systems are not robust to
counterfactuals. We find substantial failure cases which reveal that current
VQA systems are still brittle. Finally, we connect robustness and
generalization, demonstrating the predictive power of RAD for performance on
unseen augmentations.
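The abstract defines RAD only informally, as the consistency of model predictions between original and augmented examples. The sketch below shows one way such a score could be computed; the `predict` interface, the (image, question, answer) triple format, and the normalization over originally-correct examples are illustrative assumptions, not the paper's exact definition.

```python
# A minimal, illustrative sketch of a RAD-style consistency score.
# Assumption (not spelled out in the abstract): the score is computed over
# (original, augmented) example pairs as the fraction of originally-correct
# predictions that remain correct on the augmented counterpart.
from typing import Callable, Sequence, Tuple

Example = Tuple[object, str, str]  # (image, question, ground-truth answer)

def rad_consistency(
    predict: Callable[[object, str], str],       # hypothetical VQA model interface
    pairs: Sequence[Tuple[Example, Example]],    # (original, augmented) pairs
) -> float:
    """Consistency of predictions between original and augmented examples."""
    correct_orig = 0
    still_correct = 0
    for (img, q, ans), (aug_img, aug_q, aug_ans) in pairs:
        if predict(img, q) == ans:  # model answers the original correctly
            correct_orig += 1
            # the focused intervention changes the ground truth, so the
            # augmented prediction is checked against the *new* answer
            if predict(aug_img, aug_q) == aug_ans:
                still_correct += 1
    return still_correct / correct_orig if correct_orig else 0.0
```

Under this reading, a score near 1 means the model stays correct after the focused intervention, while a low score surfaces the brittleness the abstract reports.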
Related papers
- CAVE: Classifying Abnormalities in Video Capsule Endoscopy [0.1937002985471497]
In this study, we explore an ensemble-based approach to improve classification accuracy in complex image datasets.
We leverage the unique feature-extraction capabilities of each model to enhance the overall accuracy.
Experimental evaluations demonstrate that the ensemble achieves higher accuracy and robustness across challenging and imbalanced classes.
arXiv Detail & Related papers (2024-10-26T17:25:08Z)
- Anomaly Resilient Temporal QoS Prediction using Hypergraph Convoluted Transformer Network [0.47248250311484113]
Quality-of-Service (QoS) prediction is a critical task in the service lifecycle.
Traditional methods often encounter data sparsity and cold-start issues.
We introduce a real-time, trust-aware framework for temporal QoS prediction.
arXiv Detail & Related papers (2024-10-23T11:01:39Z)
- Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding [49.973156959947346]
Existing Video Temporal Grounding (VTG) models excel in accuracy but often overlook open-world challenges posed by open-vocabulary queries and untrimmed videos.
We introduce a robust network module that benefits from a two-stage cross-modal alignment task.
It integrates Deep Evidential Regression (DER) to explicitly and thoroughly quantify uncertainty during training.
We further develop a simple yet effective Geom-regularizer that enhances the uncertainty learning framework from the ground up.
arXiv Detail & Related papers (2024-08-29T05:32:03Z)
- Causal Interventional Prediction System for Robust and Explainable Effect Forecasting [14.104665282086339]
We explore the robustness and explainability of AI-based forecasting systems.
We design a causal interventional prediction system (CIPS) based on a variational autoencoder and fully conditional specification for multiple imputation.
arXiv Detail & Related papers (2024-07-29T04:16:45Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning [64.78972193105443]
This paper presents a novel adversarial example (AE) detection framework for trustworthy predictions.
It performs detection by distinguishing an AE's abnormal relations with its augmented versions.
An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label.
arXiv Detail & Related papers (2022-08-31T08:18:44Z)
- VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that feature importance (FI) supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z)
- Contrast and Classify: Training Robust VQA Models [60.80627814762071]
We propose a novel training paradigm (ConClaT) that optimizes both cross-entropy and contrastive losses.
We find that optimizing both losses -- either alternately or jointly -- is key to effective training.
arXiv Detail & Related papers (2020-10-13T00:23:59Z)
- Unifying Model Explainability and Robustness via Machine-Checkable Concepts [33.88198813484126]
We propose a robustness-assessment framework, at the core of which is the idea of using machine-checkable concepts.
Our framework defines a large number of concepts that the explanations could be based on and performs the explanation-conformity check at test time to assess prediction robustness.
Experiments on real-world datasets and human surveys show that our framework is able to enhance prediction robustness significantly.
arXiv Detail & Related papers (2020-07-01T05:21:16Z)
- Exploring Weaknesses of VQA Models through Attribution Driven Insights [0.0]
Recent research effectively applies VQA models to answer visual questions for the blind.
We analyze popular VQA models through the lens of attribution (input's influence on predictions) to gain valuable insights.
arXiv Detail & Related papers (2020-06-11T17:30:07Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.