VQA-Aid: Visual Question Answering for Post-Disaster Damage Assessment and Analysis
- URL: http://arxiv.org/abs/2106.10548v1
- Date: Sat, 19 Jun 2021 18:28:16 GMT
- Title: VQA-Aid: Visual Question Answering for Post-Disaster Damage Assessment and Analysis
- Authors: Argho Sarkar, Maryam Rahnemoonfar
- Abstract summary: A Visual Question Answering system integrated with an Unmanned Aerial Vehicle (UAV) has great potential to advance post-disaster damage assessment.
We present our recently developed VQA dataset, HurMic-VQA, collected during Hurricane Michael.
- Score: 0.7614628596146599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A Visual Question Answering (VQA) system integrated with an Unmanned Aerial Vehicle (UAV) has great potential to advance post-disaster damage assessment. Providing assistance to affected areas depends heavily on real-time data assessment and analysis. The scope of Visual Question Answering is to understand the scene and provide query-related answers, which can certainly speed up the recovery process after a disaster. In this work, we address the importance of the visual question answering (VQA) task for post-disaster damage assessment by presenting our recently developed VQA dataset, HurMic-VQA, collected during Hurricane Michael, and by comparing the performance of baseline VQA models.
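The abstract does not spell out the baseline architectures it compares; a common point of reference for such baselines is a CNN image encoder plus an LSTM question encoder feeding a classifier over a fixed answer set. The following is a minimal sketch under that assumption; the class name, feature dimensions, fusion scheme, and answer-vocabulary size are illustrative and are not the authors' implementation.

```python
# Hypothetical baseline VQA classifier: ResNet image features fused with
# LSTM question features by element-wise product, then an answer classifier.
# All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class BaselineVQA(nn.Module):
    def __init__(self, vocab_size, num_answers, embed_dim=300, hidden_dim=1024):
        super().__init__()
        # Image encoder: ResNet-18 with the classification head removed.
        resnet = models.resnet18(weights=None)  # weights=None: no pretrained download
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])  # -> (B, 512, 1, 1)
        self.img_proj = nn.Linear(512, hidden_dim)
        # Question encoder: word embeddings followed by an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Fused features -> answer logits.
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, images, questions):
        img_feat = self.cnn(images).flatten(1)          # (B, 512)
        img_feat = torch.tanh(self.img_proj(img_feat))  # (B, H)
        _, (h_n, _) = self.lstm(self.embed(questions))  # h_n: (1, B, H)
        q_feat = torch.tanh(h_n[-1])                    # (B, H)
        fused = img_feat * q_feat                       # element-wise fusion
        return self.classifier(fused)                   # answer logits

# Example: score two aerial images against tokenized damage-related questions.
model = BaselineVQA(vocab_size=10000, num_answers=50)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(1, 10000, (2, 12)))
```

Element-wise fusion is the simplest common choice for such baselines; attention-based fusion is a natural next step for aerial imagery, where the damaged region may occupy a small part of the scene.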
Related papers
- Reducing Hallucinations: Enhancing VQA for Flood Disaster Damage Assessment with Visual Contexts [6.820160182829294]
We propose a zero-shot VQA model named Flood Disaster VQA with Two-Stage Prompt (VQA-TSP).
The model generates the thought process in the first stage and then uses the thought process to generate the final answer in the second stage.
Our method exceeds the overall performance of state-of-the-art zero-shot VQA models for flood disaster scenarios.
arXiv Detail & Related papers (2023-12-21T13:45:02Z)
- Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario [6.820160182829294]
We propose a zero-shot VQA model named Zero-shot VQA for Flood Disaster Damage Assessment (ZFDDA).
With flood disaster as the main research object, we build a Freestyle Flood Disaster Image Question Answering dataset (FFD-IQA).
This new dataset expands the question types to include free-form, multiple-choice, and yes-no questions.
Our model uses well-designed chain of thought (CoT) demonstrations to unlock the potential of the large language model.
arXiv Detail & Related papers (2023-12-04T13:25:16Z)
- Exploring Question Decomposition for Zero-Shot VQA [99.32466439254821]
We investigate a question decomposition strategy for visual question answering.
We show that naive application of model-written decompositions can hurt performance.
We introduce a model-driven selective decomposition approach for second-guessing predictions and correcting errors.
arXiv Detail & Related papers (2023-10-25T23:23:57Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Measuring Faithful and Plausible Visual Grounding in VQA [23.717744098159717]
Metrics for Visual Grounding (VG) in Visual Question Answering (VQA) systems aim to measure a system's reliance on relevant parts of the image when inferring an answer to the given question.
Lack of VG has been a common problem among state-of-the-art VQA systems and can manifest in over-reliance on irrelevant image parts or a disregard for the visual modality entirely.
We propose a new VG metric that captures whether a model (a) identifies question-relevant objects in the scene and (b) actually relies on the information contained in those objects when producing its answer.
arXiv Detail & Related papers (2023-05-24T10:58:02Z)
- What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility? [8.373151777137792]
In visual question answering (VQA), a machine must answer a question given an associated image.
We examine discrepancies between a machine "understanding" dataset (VQA-v2) and an accessibility dataset (VizWiz) by evaluating a variety of VQA models.
Based on our findings, we discuss opportunities and challenges in VQA for accessibility and suggest directions for future work.
arXiv Detail & Related papers (2022-10-26T18:23:53Z)
- Continual VQA for Disaster Response Systems [0.0]
Visual Question Answering (VQA) is a multi-modal task that involves answering questions from an input image.
The main challenge is the delay caused by generating labels for the assessment of the affected areas.
We deploy a pre-trained CLIP model, which is trained on image-text pairs.
We surpass previous state-of-the-art results on the FloodNet dataset.
arXiv Detail & Related papers (2022-09-21T12:45:51Z)
- Human-Adversarial Visual Question Answering [62.30715496829321]
We benchmark state-of-the-art VQA models against human-adversarial examples.
We find that a wide range of state-of-the-art models perform poorly when evaluated on these examples.
arXiv Detail & Related papers (2021-06-04T06:25:32Z)
- Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules [85.98177341704675]
The problem of grounding VQA tasks has seen increased attention in the research community recently.
We propose a visual capsule module with a query-based selection mechanism of capsule features.
We show that integrating the proposed capsule module in existing VQA systems significantly improves their performance on the weakly supervised grounding task.
arXiv Detail & Related papers (2021-05-11T07:45:32Z)
- Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View [129.392671317356]
We propose to interpret the language prior problem in VQA from a class-imbalance view.
It explicitly reveals why the VQA model tends to produce a frequent yet obviously wrong answer.
We also justify the validity of the class imbalance interpretation scheme on other computer vision tasks, such as face recognition and image classification.
arXiv Detail & Related papers (2020-10-30T00:57:17Z)
- SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT)
We show that SQuINT improves model consistency by 5% and marginally improves performance on the Reasoning questions in VQA, while also displaying better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)
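The consistency notion in the SQuINT entry above can be illustrated with a small scoring helper: a reasoning question only counts as consistently answered if its perception sub-questions are also answered correctly. This is a hedged sketch; the record format and field names (main_correct, sub_correct) are assumptions for illustration, not the paper's evaluation code.

```python
# Illustrative consistency score: among correctly answered reasoning questions,
# what fraction also had all of their perception sub-questions answered correctly?
from typing import Dict, List

def consistency_rate(records: List[Dict]) -> float:
    """Fraction of correct reasoning answers whose sub-questions were all correct."""
    correct_main = [r for r in records if r["main_correct"]]
    if not correct_main:
        return 0.0
    consistent = sum(1 for r in correct_main if all(r["sub_correct"]))
    return consistent / len(correct_main)

# Example: two reasoning questions answered correctly, one of which has a failed
# perception sub-question, so the consistency rate is 0.5.
records = [
    {"main_correct": True, "sub_correct": [True, True]},
    {"main_correct": True, "sub_correct": [True, False]},
    {"main_correct": False, "sub_correct": [True]},
]
print(consistency_rate(records))  # 0.5
```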