DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes
- URL: http://arxiv.org/abs/2601.13839v1
- Date: Tue, 20 Jan 2026 10:50:46 GMT
- Title: DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes
- Authors: Aisha Al-Mohannadi, Ayisha Firoz, Yin Yang, Muhammad Imran, Ferda Ofli
- Abstract summary: DisasterVQA consists of 1,395 real-world images and 4,405 expert-curated question-answer pairs spanning diverse events such as floods, wildfires, and earthquakes. We benchmark seven state-of-the-art vision-language models and find performance variability across question types, disaster categories, regions, and humanitarian tasks. DisasterVQA provides a challenging and practical benchmark to guide the development of more robust and operationally meaningful vision-language models for disaster response.
- Score: 10.776782815521686
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Social media imagery provides a low-latency source of situational information during natural and human-induced disasters, enabling rapid damage assessment and response. While Visual Question Answering (VQA) has shown strong performance in general-purpose domains, its suitability for the complex and safety-critical reasoning required in disaster response remains unclear. We introduce DisasterVQA, a benchmark dataset designed for perception and reasoning in crisis contexts. DisasterVQA consists of 1,395 real-world images and 4,405 expert-curated question-answer pairs spanning diverse events such as floods, wildfires, and earthquakes. Grounded in humanitarian frameworks including FEMA ESF and OCHA MIRA, the dataset includes binary, multiple-choice, and open-ended questions covering situational awareness and operational decision-making tasks. We benchmark seven state-of-the-art vision-language models and find performance variability across question types, disaster categories, regions, and humanitarian tasks. Although models achieve high accuracy on binary questions, they struggle with fine-grained quantitative reasoning, object counting, and context-sensitive interpretation, particularly for underrepresented disaster scenarios. DisasterVQA provides a challenging and practical benchmark to guide the development of more robust and operationally meaningful vision-language models for disaster response. The dataset is publicly available at https://zenodo.org/records/18267770.
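A minimal sketch of how a benchmark like this could be scored, assuming the released annotations are a JSON list of QA records with fields such as "id", "question_type", and "answer"; these field names are assumptions for illustration, not the actual Zenodo schema.

```python
# Hypothetical evaluation sketch: per-question-type accuracy on DisasterVQA-style data.
import json
from collections import defaultdict

def accuracy_by_question_type(annotation_path, predictions):
    """predictions: dict mapping question id -> predicted answer string."""
    with open(annotation_path) as f:
        records = json.load(f)  # assumed: list of QA dicts

    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        qtype = r["question_type"]  # e.g. "binary", "multiple_choice", "open_ended"
        total[qtype] += 1
        pred = predictions.get(r["id"], "").strip().lower()
        if pred == r["answer"].strip().lower():
            correct[qtype] += 1
    return {t: correct[t] / total[t] for t in total}
```

Breaking accuracy out by question type is what surfaces the gap the abstract reports between easy binary questions and harder counting or open-ended ones.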
Related papers
- Disaster Question Answering with LoRA Efficiency and Accurate End Position [0.0]
This work introduces a disaster-focused question answering system based on Japanese disaster situations and response experiences. We achieved 70.4% End Position accuracy with only 5.7% of the total parameters (6.7M/117M). Future challenges include: establishing natural disaster Q&A benchmark datasets, fine-tuning foundation models with disaster knowledge, and developing lightweight and power-efficient edge AI disaster Q&A applications.
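A back-of-the-envelope sketch of why LoRA trains only a small fraction of parameters, in the spirit of the 6.7M/117M (~5.7%) figure above; the layer size and rank below are illustrative assumptions, not the paper's actual configuration.

```python
# Illustrative LoRA parameter count: a frozen weight W (d_out x d_in) gets a
# trainable low-rank update B @ A, with A (rank x d_in) and B (d_out x rank).
def lora_params(d_in, d_out, rank):
    return rank * (d_in + d_out)

hidden = 768          # assumed hidden size, roughly BERT-base scale
rank = 8              # assumed adapter rank
full_per_layer = hidden * hidden
lora_per_layer = lora_params(hidden, hidden, rank)
print(f"trainable fraction per adapted matrix: {lora_per_layer / full_per_layer:.3%}")
```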
arXiv Detail & Related papers (2026-01-28T01:53:16Z) - DisasterInsight: A Multimodal Benchmark for Function-Aware and Grounded Disaster Assessment [19.434058305975167]
DisasterInsight is a benchmark designed to evaluate vision-language models (VLMs) on realistic disaster analysis tasks. It restructures the xBD dataset into approximately 112K building-centered instances. It supports instruction-diverse evaluation across multiple tasks, including building-function classification, damage-level and disaster-type classification, counting, and structured report generation aligned with humanitarian assessment guidelines.
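A hypothetical sketch of what one building-centered instance in such a restructured benchmark might look like; the field names and example values are assumptions for illustration, not DisasterInsight's released schema.

```python
# Hypothetical building-centered record, one per instance.
from dataclasses import dataclass

@dataclass
class BuildingInstance:
    image_path: str          # crop centered on a single building
    building_function: str   # e.g. "residential", "commercial"
    damage_level: str        # e.g. "no-damage", "minor", "major", "destroyed"
    disaster_type: str       # e.g. "flood", "wildfire", "earthquake"
    instruction: str         # task prompt shown to the VLM
    target: str              # expected answer or report text

example = BuildingInstance(
    image_path="crops/event_0123/building_045_post.png",
    building_function="residential",
    damage_level="major",
    disaster_type="hurricane",
    instruction="Classify the damage level of the highlighted building.",
    target="major",
)
```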
arXiv Detail & Related papers (2026-01-26T13:48:11Z) - AIFloodSense: A Global Aerial Imagery Dataset for Semantic Segmentation and Understanding of Flooded Environments [1.381010753883328]
We introduce AIFloodSense, a comprehensive, publicly available aerial imagery dataset comprising 470 high-resolution images from 230 distinct flood events across 64 countries and six continents. Unlike prior benchmarks, AIFloodSense ensures global diversity and temporal relevance (2022-2024), supporting three complementary tasks. We establish baseline benchmarks for all tasks using state-of-the-art architectures, demonstrating the dataset's complexity and its value.
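For the semantic segmentation task, per-class intersection-over-union is the standard metric such a benchmark would report; a minimal sketch follows, with the class indexing left as an assumption.

```python
# Per-class IoU between predicted and ground-truth label maps.
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """pred, gt: integer label maps of identical shape."""
    ious = {}
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious[c] = inter / union if union else float("nan")
    return ious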
arXiv Detail & Related papers (2025-12-19T10:34:45Z) - DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response [36.84140335631884]
DisasterM3 is a vision-language dataset for global-scale disaster assessment and response. DisasterM3 includes 26,988 bi-temporal satellite images and 123k instruction pairs across 5 continents. Based on real-world scenarios, DisasterM3 includes 9 disaster-related visual perception and reasoning tasks.
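A sketch of how a bi-temporal instruction pair could be packaged for a vision-language model; the message layout below mirrors common VLM chat conventions and is an illustrative assumption, not DisasterM3's released format.

```python
# Hypothetical packaging of a pre/post image pair with an instruction.
def build_bitemporal_message(pre_image, post_image, question):
    return {
        "images": [pre_image, post_image],  # pre- and post-disaster acquisitions
        "prompt": (
            "Image 1 was captured before the disaster and image 2 after it. "
            + question
        ),
    }

msg = build_bitemporal_message(
    "scene_001_pre.tif", "scene_001_post.tif",
    "How many buildings appear newly damaged in the post-disaster image?",
)
```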
arXiv Detail & Related papers (2025-05-27T12:16:07Z) - BRIGHT: A globally distributed multimodal building damage assessment dataset with very-high-resolution for all-weather disaster response [50.76124284445902]
Building damage assessment (BDA) is an essential capability in the aftermath of a disaster to reduce human casualties. Recent research focuses on the development of AI models to achieve accurate mapping of unseen disaster events. We present a BDA dataset using veRy-hIGH-resoluTion optical and SAR imagery (BRIGHT) to support AI-based all-weather disaster response.
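One common way to consume both modalities is channel-level early fusion of the optical and SAR inputs; the sketch below is an illustrative assumption about how a baseline could ingest BRIGHT-style pairs, not an architecture prescribed by the dataset.

```python
# Minimal early-fusion stem: concatenate optical and SAR channels, then encode.
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    def __init__(self, optical_channels=3, sar_channels=1, features=64):
        super().__init__()
        self.stem = nn.Conv2d(optical_channels + sar_channels, features,
                              kernel_size=3, padding=1)

    def forward(self, optical, sar):
        # Fuse modalities along the channel axis before feature extraction.
        return self.stem(torch.cat([optical, sar], dim=1))
```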
arXiv Detail & Related papers (2025-01-10T14:57:18Z) - CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-label Social Media Text Classification in Disaster Informatics [50.122541222825156]
This study introduces a novel approach to disaster text classification by enhancing a pre-trained Large Language Model (LLM). Our methodology involves creating a comprehensive instruction dataset from disaster-related tweets, which is then used to fine-tune an open-source LLM. This fine-tuned model can classify multiple aspects of disaster-related information simultaneously, such as the type of event, informativeness, and involvement of human aid.
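A sketch of turning a disaster tweet into a multi-label instruction example, in the spirit of the instruction dataset described above; the label names and prompt wording are assumptions for illustration.

```python
# Hypothetical instruction/output pair for multi-label disaster tweet classification.
def make_instruction_example(tweet, event_type, informative, human_aid):
    prompt = (
        "Classify the following disaster-related tweet.\n"
        f"Tweet: {tweet}\n"
        "Answer with event_type, informativeness, and human_aid_involvement."
    )
    target = (
        f"event_type: {event_type}; "
        f"informativeness: {'informative' if informative else 'not informative'}; "
        f"human_aid_involvement: {'yes' if human_aid else 'no'}"
    )
    return {"instruction": prompt, "output": target}
```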
arXiv Detail & Related papers (2024-06-16T23:01:10Z) - Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
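A sketch of the "crashes as language" framing: serializing structured crash features into a text prompt that an LLM can reason over; the feature names are hypothetical, not CrashEvent's actual fields.

```python
# Hypothetical serialization of a structured crash record into a prediction prompt.
def crash_to_prompt(record):
    return (
        f"A crash occurred at {record['time']} on a {record['road_type']} "
        f"in {record['weather']} weather involving {record['num_vehicles']} vehicles. "
        "Predict the crash severity and likely injury outcome."
    )

print(crash_to_prompt({
    "time": "18:40",
    "road_type": "two-lane rural highway",
    "weather": "rainy",
    "num_vehicles": 2,
}))
```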
arXiv Detail & Related papers (2024-06-16T03:10:16Z) - Generalizable Disaster Damage Assessment via Change Detection with Vision Foundation Model [17.016411785224317]
We introduce DAVI (Disaster Assessment with VIsion foundation model), a novel approach that addresses domain disparities and detects structural damage at the building level without requiring ground-truth labels for target regions. DAVI combines task-specific knowledge from a model trained on source regions with task-agnostic knowledge from an image segmentation model to generate pseudo labels indicating potential damage in target regions. It then utilizes a two-stage refinement process, which operates at both the pixel and image levels, to accurately identify changes in disaster-affected areas.
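A minimal sketch of combining two sources of evidence into pseudo labels, loosely following the idea of mixing task-specific and task-agnostic predictions; the thresholds and agreement rule are assumptions, not DAVI's exact procedure.

```python
# Hypothetical pseudo-label rule: trust damage predictions only where a generic
# segmentation model also sees a building; ignore uncertain pixels.
import numpy as np

def pseudo_labels(damage_prob, segment_mask, hi=0.7, lo=0.3):
    """damage_prob: per-pixel damage probability from a source-trained model.
    segment_mask: binary building mask from a generic segmentation model."""
    labels = np.full(damage_prob.shape, -1, dtype=np.int8)   # -1 = ignore
    labels[(damage_prob >= hi) & (segment_mask == 1)] = 1    # confident damage on buildings
    labels[damage_prob <= lo] = 0                            # confident no-damage
    return labels
```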
arXiv Detail & Related papers (2024-06-12T09:21:28Z) - CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
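A sketch of a FixMatch-style pseudo-labeling step on unlabeled tweets, the general mechanism behind semi-supervised methods of this kind; the confidence threshold and the weak/strong augmentation setup are illustrative assumptions, not CrisisMatch's exact training recipe.

```python
# Pseudo-label loss on unlabeled examples: keep only confident predictions from
# the weakly augmented view and use them as targets for the strongly augmented view.
import torch.nn.functional as F

def unlabeled_loss(logits_weak, logits_strong, threshold=0.95):
    """logits_weak/strong: model outputs (batch, num_classes) for two augmented copies."""
    probs = F.softmax(logits_weak.detach(), dim=-1)
    conf, pseudo = probs.max(dim=-1)
    mask = (conf >= threshold).float()                       # confident pseudo labels only
    loss = F.cross_entropy(logits_strong, pseudo, reduction="none")
    return (loss * mask).mean()
```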
arXiv Detail & Related papers (2023-10-23T07:01:09Z) - Assessing out-of-domain generalization for robust building damage detection [78.6363825307044]
Building damage detection can be automated by applying computer vision techniques to satellite imagery.
Models must be robust to a shift in distribution between disaster imagery available for training and the images of the new event.
We argue that future work should focus on the out-of-domain (OOD) regime instead.
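A sketch of the kind of leave-one-disaster-out split used to measure out-of-domain generalization across events; the tuple layout and event names are placeholders, not the paper's exact protocol.

```python
# Hold out one entire disaster event for testing; train on all others.
def leave_one_event_out(samples, held_out_event):
    """samples: iterable of (image_path, label, event_name) tuples."""
    train = [s for s in samples if s[2] != held_out_event]
    test = [s for s in samples if s[2] == held_out_event]
    return train, test
```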
arXiv Detail & Related papers (2020-11-20T10:30:43Z) - RescueNet: Joint Building Segmentation and Damage Assessment from Satellite Imagery [83.49145695899388]
RescueNet is a unified model that simultaneously segments buildings and assesses damage levels for individual buildings, and can be trained end-to-end.
RescueNet is tested on the large-scale and diverse xBD dataset and achieves significantly better building segmentation and damage classification performance than previous methods.
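A sketch of the kind of joint objective used when segmentation and damage classification are trained end-to-end in one model; the equal weighting below is an illustrative assumption rather than RescueNet's published loss.

```python
# Combined loss: pixel-wise building segmentation plus per-building damage classification.
import torch.nn.functional as F

def joint_loss(seg_logits, seg_target, dmg_logits, dmg_target,
               seg_weight=1.0, dmg_weight=1.0):
    seg = F.cross_entropy(seg_logits, seg_target)   # building segmentation term
    dmg = F.cross_entropy(dmg_logits, dmg_target)   # damage-level term
    return seg_weight * seg + dmg_weight * dmg
```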
arXiv Detail & Related papers (2020-04-15T19:52:09Z)