Reducing Hallucinations: Enhancing VQA for Flood Disaster Damage
Assessment with Visual Contexts
- URL: http://arxiv.org/abs/2312.13848v1
- Date: Thu, 21 Dec 2023 13:45:02 GMT
- Title: Reducing Hallucinations: Enhancing VQA for Flood Disaster Damage
Assessment with Visual Contexts
- Authors: Yimin Sun, Chao Wang and Yan Peng
- Abstract summary: We propose a zero-shot VQA method named Flood Disaster VQA with Two-Stage Prompt (VQA-TSP).
The model generates the thought process in the first stage and then uses the thought process to generate the final answer in the second stage.
Our method outperforms state-of-the-art zero-shot VQA models for flood disaster scenarios overall.
- Score: 6.820160182829294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The zero-shot performance of visual question answering (VQA) models relies
heavily on prompts. For example, a zero-shot VQA model for disaster scenarios can
leverage well-designed Chain of Thought (CoT) prompts to stimulate the model's
potential. However, CoT prompts have drawbacks, such as hallucinations in the
thought process that lead to an incorrect final answer. In this paper, we propose
a zero-shot VQA method named Flood Disaster VQA with Two-Stage Prompt (VQA-TSP).
The model generates the thought process in the first stage and then uses that
thought process to generate the final answer in the second stage. In particular,
visual context is added in the second stage to mitigate the hallucination problem
that arises in the thought process. Experimental results show that our method
outperforms state-of-the-art zero-shot VQA models for flood disaster scenarios
overall. Our study provides a research basis for improving the performance of
CoT-based zero-shot VQA.
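As a rough illustration of the two-stage prompting described in the abstract, the sketch below assumes a generic vision-language model wrapped in a hypothetical `vlm_generate(image, prompt)` callable; the prompt wording and the form of the visual context are illustrative assumptions, not the paper's exact prompts.

```python
# Minimal sketch of a two-stage prompt for zero-shot VQA.
# Assumption: `vlm_generate(image, prompt)` is any callable that returns the
# model's text output for an image plus a text prompt (hypothetical API).

def two_stage_vqa(image, question, visual_context, vlm_generate):
    """Answer `question` about `image` with a two-stage prompt.

    Stage 1 elicits a CoT-style thought process; stage 2 feeds that reasoning
    back together with an externally supplied visual context, which is meant
    to ground the final answer and reduce hallucinations from stage 1.
    """
    # Stage 1: generate the thought process.
    stage1_prompt = (
        f"Question: {question}\n"
        "Describe what the image shows and think step by step before answering."
    )
    thought_process = vlm_generate(image, stage1_prompt)

    # Stage 2: combine the thought process with visual context to get the answer.
    stage2_prompt = (
        f"Question: {question}\n"
        f"Visual context: {visual_context}\n"
        f"Thought process: {thought_process}\n"
        "Use the visual context to correct any mistakes in the thought process, "
        "then give the final answer only."
    )
    return vlm_generate(image, stage2_prompt)
```

The design point illustrated here is that the second-stage prompt sees both the first-stage reasoning and an independent description of the image, so the final answer is not forced to trust a possibly hallucinated thought process.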
Related papers
- Unleashing the Potential of Large Language Model: Zero-shot VQA for
Flood Disaster Scenario [6.820160182829294]
We propose a zero-shot VQA model named Zero-shot VQA for Flood Disaster Damage Assessment (ZFDDA).
Taking flood disaster as the main research object, we build the Freestyle Flood Disaster Image Question Answering (FFD-IQA) dataset.
This new dataset expands the question types to include free-form, multiple-choice, and yes-no questions.
Our model uses well-designed chain of thought (CoT) demonstrations to unlock the potential of the large language model.
arXiv Detail & Related papers (2023-12-04T13:25:16Z) - Exploring Question Decomposition for Zero-Shot VQA [99.32466439254821]
We investigate a question decomposition strategy for visual question answering.
We show that naive application of model-written decompositions can hurt performance.
We introduce a model-driven selective decomposition approach for second-guessing predictions and correcting errors.
arXiv Detail & Related papers (2023-10-25T23:23:57Z) - Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering [58.64831511644917]
We introduce an interpretable-by-design model that factors model decisions into intermediate human-legible explanations.
We show that our inherently interpretable system can improve by 4.64% over a comparable black-box system on reasoning-focused questions.
arXiv Detail & Related papers (2023-05-24T08:33:15Z) - Toward Unsupervised Realistic Visual Question Answering [70.67698100148414]
We study the problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs).
We first point out 2 drawbacks in current RVQA research, where (1) datasets contain too many unchallenging UQs and (2) a large number of annotated UQs are required for training.
We propose a new testing dataset, RGQA, which combines AQs from an existing VQA dataset with around 29K human-annotated UQs.
This combines pseudo UQs, obtained by randomly pairing images and questions, with an ...
arXiv Detail & Related papers (2023-03-09T06:58:29Z) - Continual VQA for Disaster Response Systems [0.0]
Visual Question Answering (VQA) is a multi-modal task that involves answering questions from an input image.
The main challenge is the delay caused by generating labels for the assessment of the affected areas.
We deploy a pre-trained CLIP model, which is trained on image-text pairs.
We surpass previous state-of-the-art results on the FloodNet dataset.
arXiv Detail & Related papers (2022-09-21T12:45:51Z) - VQA-Aid: Visual Question Answering for Post-Disaster Damage Assessment
and Analysis [0.7614628596146599]
A Visual Question Answering system integrated with an Unmanned Aerial Vehicle (UAV) has great potential to advance post-disaster damage assessment.
We present our recently developed VQA dataset, HurMic-VQA, collected during Hurricane Michael.
arXiv Detail & Related papers (2021-06-19T18:28:16Z) - Loss re-scaling VQA: Revisiting the Language Prior Problem from a
Class-imbalance View [129.392671317356]
We propose to interpret the language prior problem in VQA from a class-imbalance view.
It explicitly reveals why the VQA model tends to produce a frequent yet obviously wrong answer.
We also justify the validity of the class imbalance interpretation scheme on other computer vision tasks, such as face recognition and image classification.
arXiv Detail & Related papers (2020-10-30T00:57:17Z) - Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning" [49.76230210108583]
We propose a framework to isolate and evaluate the reasoning aspect of visual question answering (VQA) separately from its perception.
We also propose a novel top-down calibration technique that allows the model to answer reasoning questions even with imperfect perception.
On the challenging GQA dataset, this framework is used to perform in-depth, disentangled comparisons between well-known VQA models.
arXiv Detail & Related papers (2020-06-20T08:48:29Z) - Generating Rationales in Visual Question Answering [28.45552957339557]
We propose a new task of rationale generation for Visual Question Answering (VQA).
We use data from the Visual Commonsense Reasoning (VCR) task, as it contains ground truths along with visual questions and answers.
We train ViLBERT with GPT-2 in an end-to-end fashion with the dual task of predicting the answer in VQA and generating rationales.
arXiv Detail & Related papers (2020-04-04T22:15:35Z) - Counterfactual Samples Synthesizing for Robust Visual Question Answering [104.72828511083519]
We propose a model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme.
CSS generates numerous counterfactual training samples by masking critical objects in images or critical words in questions (a rough sketch of this masking idea appears after this list).
We achieve a record-breaking performance of 58.95% on VQA-CP v2, with 6.5% gains.
arXiv Detail & Related papers (2020-03-14T08:34:31Z)
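As a loose illustration of the counterfactual masking idea mentioned in the CSS entry above, the sketch below assumes the critical objects and words have already been identified upstream; the `VQASample` structure, the placeholder counterfactual label, and the helper names are illustrative assumptions rather than the paper's implementation.

```python
# Rough sketch of counterfactual sample synthesis in the spirit of CSS.
# Assumptions (not from the paper): critical regions/words are already known,
# samples are simple dataclasses, and the counterfactual answer label is a
# placeholder, since the actual label-assignment scheme is not described above.
from dataclasses import dataclass, replace
from typing import List

MASK_TOKEN = "[MASK]"

@dataclass
class VQASample:
    region_ids: List[int]        # ids of detected object regions in the image
    question_tokens: List[str]   # tokenized question
    answer: str                  # ground-truth answer

def visual_counterfactual(sample: VQASample, critical_regions: List[int]) -> VQASample:
    """Drop the critical image regions so the original answer is no longer supported."""
    kept = [r for r in sample.region_ids if r not in critical_regions]
    return replace(sample, region_ids=kept, answer="<counterfactual>")  # placeholder label

def question_counterfactual(sample: VQASample, critical_words: List[str]) -> VQASample:
    """Mask the critical question words, yielding a question the original answer no longer fits."""
    masked = [MASK_TOKEN if t in critical_words else t for t in sample.question_tokens]
    return replace(sample, question_tokens=masked, answer="<counterfactual>")  # placeholder label
```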
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.