Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused
Interventions
- URL: http://arxiv.org/abs/2106.04484v1
- Date: Tue, 8 Jun 2021 16:09:47 GMT
- Title: Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused
Interventions
- Authors: Daniel Rosenberg, Itai Gat, Amir Feder, Roi Reichart
- Abstract summary: We look at the generalization capabilities of visual question answering (VQA) systems.
We propose a new robustness measure, Robustness to Augmented Data (RAD), which measures the consistency of model predictions.
We find substantial failure cases which reveal that current VQA systems are still brittle.
- Score: 18.545193011418654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning algorithms have shown promising results in visual question
answering (VQA) tasks, but a more careful look reveals that they often do not
understand the rich signal they are being fed with. To understand and better
measure the generalization capabilities of VQA systems, we look at their
robustness to counterfactually augmented data. Our proposed augmentations are
designed to make a focused intervention on a specific property of the question
such that the answer changes. Using these augmentations, we propose a new
robustness measure, Robustness to Augmented Data (RAD), which measures the
consistency of model predictions between original and augmented examples.
Through extensive experimentation, we show that RAD, unlike classical accuracy
measures, can quantify when state-of-the-art systems are not robust to
counterfactuals. We find substantial failure cases which reveal that current
VQA systems are still brittle. Finally, we connect between robustness and
generalization, demonstrating the predictive power of RAD for performance on
unseen augmentations.
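The abstract defines RAD as a consistency measure between predictions on original and augmented examples. A minimal sketch of one such consistency ratio, assuming RAD is computed as the fraction of originally-correct examples whose augmented counterparts are also answered correctly (the paper's exact definition may differ; all names here are illustrative):

```python
def rad(orig_preds, orig_labels, aug_preds, aug_labels):
    """Consistency ratio in the spirit of RAD: among examples the model
    answers correctly in their original form, the fraction whose
    counterfactually augmented counterpart is also answered correctly."""
    # Indices of examples the model gets right before augmentation.
    correct_orig = [i for i in range(len(orig_preds))
                    if orig_preds[i] == orig_labels[i]]
    if not correct_orig:
        return 0.0
    # Of those, count the ones still answered correctly after augmentation.
    consistent = sum(1 for i in correct_orig
                     if aug_preds[i] == aug_labels[i])
    return consistent / len(correct_orig)
```

A score of 1.0 means the model is fully consistent under augmentation; lower values expose the brittleness the abstract describes, even when plain accuracy looks high.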
Related papers
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z) - On the Onset of Robust Overfitting in Adversarial Training [66.27055915739331]
Adversarial Training (AT) is a widely-used algorithm for building robust neural networks.
AT suffers from the issue of robust overfitting, the fundamental mechanism of which remains unclear.
arXiv Detail & Related papers (2023-10-01T07:57:03Z) - Few-shot Weakly-supervised Cybersecurity Anomaly Detection [1.179179628317559]
We propose an enhancement to an existing few-shot weakly-supervised deep learning anomaly detection framework.
This framework incorporates data augmentation, representation learning and ordinal regression.
We then evaluated and showed the performance of our implemented framework on three benchmark datasets.
arXiv Detail & Related papers (2023-04-15T04:37:54Z) - Free Lunch for Generating Effective Outlier Supervision [46.37464572099351]
We propose an ultra-effective method to generate near-realistic outlier supervision.
Our proposed BayesAug significantly reduces the false positive rate by over 12.50% compared with previous schemes.
arXiv Detail & Related papers (2023-01-17T01:46:45Z) - Be Your Own Neighborhood: Detecting Adversarial Example by the
Neighborhood Relations Built on Self-Supervised Learning [64.78972193105443]
This paper presents a novel adversarial example (AE) detection framework that aims to make predictions trustworthy.
It performs detection by distinguishing an AE's abnormal relations with its augmented versions.
An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label.
arXiv Detail & Related papers (2022-08-31T08:18:44Z) - VisFIS: Visual Feature Importance Supervision with
Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z) - Efficient Attention Branch Network with Combined Loss Function for
Automatic Speaker Verification Spoof Detection [7.219077740523682]
Models currently deployed for the task of Automatic Speaker Verification generalize poorly, at best, to unseen attacks.
The present study proposes the Efficient Attention Branch Network (EABN) modular architecture with a combined loss function to address the generalization problem.
arXiv Detail & Related papers (2021-09-05T12:10:16Z) - Contrast and Classify: Training Robust VQA Models [60.80627814762071]
We propose a novel training paradigm (ConClaT) that optimizes both cross-entropy and contrastive losses.
We find that optimizing both losses -- either alternately or jointly -- is key to effective training.
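The ConClaT summary above pairs a cross-entropy term with a contrastive term. A minimal sketch of one such combined objective, using an NT-Xent-style contrastive loss as the second term (the paper's exact formulation and weighting may differ; all names here are illustrative):

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])

def combined_loss(probs, label, anchor, positive, negatives, lam=0.5, tau=0.1):
    """Cross-entropy plus a weighted NT-Xent-style contrastive term that
    pulls the anchor embedding toward its positive and away from negatives."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    pos = math.exp(cos(anchor, positive) / tau)
    neg = sum(math.exp(cos(anchor, n) / tau) for n in negatives)
    contrastive = -math.log(pos / (pos + neg))
    return cross_entropy(probs, label) + lam * contrastive
```

Optimizing the two terms jointly, as here, or alternating between them per batch are the two schedules the summary says both work; `lam` trades off classification accuracy against representation consistency.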
arXiv Detail & Related papers (2020-10-13T00:23:59Z) - Unifying Model Explainability and Robustness via Machine-Checkable
Concepts [33.88198813484126]
We propose a robustness-assessment framework, at the core of which is the idea of using machine-checkable concepts.
Our framework defines a large number of concepts that the explanations could be based on and performs the explanation-conformity check at test time to assess prediction robustness.
Experiments on real-world datasets and human surveys show that our framework is able to enhance prediction robustness significantly.
arXiv Detail & Related papers (2020-07-01T05:21:16Z) - Exploring Weaknesses of VQA Models through Attribution Driven Insights [0.0]
Recent research effectively applies VQA models to answering visual questions for the blind.
We analyze popular VQA models through the lens of attribution (input's influence on predictions) to gain valuable insights.
arXiv Detail & Related papers (2020-06-11T17:30:07Z) - Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, iteratively refining the data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.