DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes
- URL: http://arxiv.org/abs/2601.13839v1
- Date: Tue, 20 Jan 2026 10:50:46 GMT
- Title: DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes
- Authors: Aisha Al-Mohannadi, Ayisha Firoz, Yin Yang, Muhammad Imran, Ferda Ofli
- Abstract summary: DisasterVQA consists of 1,395 real-world images and 4,405 expert-curated question-answer pairs spanning diverse events such as floods, wildfires, and earthquakes. We benchmark seven state-of-the-art vision-language models and find performance variability across question types, disaster categories, regions, and humanitarian tasks. DisasterVQA provides a challenging and practical benchmark to guide the development of more robust and operationally meaningful vision-language models for disaster response.
- Score: 10.776782815521686
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Social media imagery provides a low-latency source of situational information during natural and human-induced disasters, enabling rapid damage assessment and response. While Visual Question Answering (VQA) has shown strong performance in general-purpose domains, its suitability for the complex and safety-critical reasoning required in disaster response remains unclear. We introduce DisasterVQA, a benchmark dataset designed for perception and reasoning in crisis contexts. DisasterVQA consists of 1,395 real-world images and 4,405 expert-curated question-answer pairs spanning diverse events such as floods, wildfires, and earthquakes. Grounded in humanitarian frameworks including FEMA ESF and OCHA MIRA, the dataset includes binary, multiple-choice, and open-ended questions covering situational awareness and operational decision-making tasks. We benchmark seven state-of-the-art vision-language models and find performance variability across question types, disaster categories, regions, and humanitarian tasks. Although models achieve high accuracy on binary questions, they struggle with fine-grained quantitative reasoning, object counting, and context-sensitive interpretation, particularly for underrepresented disaster scenarios. DisasterVQA provides a challenging and practical benchmark to guide the development of more robust and operationally meaningful vision-language models for disaster response. The dataset is publicly available at https://zenodo.org/records/18267770.
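A minimal sketch of how a benchmark like this could be scored, assuming the released annotations are a JSON list of QA records with fields such as "id", "question_type", and "answer"; these field names are assumptions for illustration, not the actual Zenodo schema.

```python
# Hypothetical evaluation sketch: per-question-type accuracy on DisasterVQA-style data.
import json
from collections import defaultdict

def accuracy_by_question_type(annotation_path, predictions):
    """predictions: dict mapping question id -> predicted answer string."""
    with open(annotation_path) as f:
        records = json.load(f)  # assumed: list of QA dicts

    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        qtype = r["question_type"]  # e.g. "binary", "multiple_choice", "open_ended"
        total[qtype] += 1
        pred = predictions.get(r["id"], "").strip().lower()
        if pred == r["answer"].strip().lower():
            correct[qtype] += 1
    return {t: correct[t] / total[t] for t in total}
```

Breaking accuracy out by question type is what surfaces the gap the abstract reports between easy binary questions and harder counting or open-ended ones.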
Related papers
- Disaster Question Answering with LoRA Efficiency and Accurate End Position [0.0]
This work introduces a disaster-focused question answering system based on Japanese disaster situations and response experiences. We achieved 70.4% End Position accuracy with only 5.7% of the total parameters (6.7M/117M). Future challenges include: establishing natural disaster Q&A benchmark datasets, fine-tuning foundation models with disaster knowledge, and developing lightweight and power-efficient edge AI disaster Q&A applications.
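A back-of-the-envelope sketch of why LoRA trains only a small fraction of parameters, in the spirit of the 6.7M/117M (~5.7%) figure above; the layer size and rank below are illustrative assumptions, not the paper's actual configuration.

```python
# Illustrative LoRA parameter count: a frozen weight W (d_out x d_in) gets a
# trainable low-rank update B @ A, with A (rank x d_in) and B (d_out x rank).
def lora_params(d_in, d_out, rank):
    return rank * (d_in + d_out)

hidden = 768          # assumed hidden size, roughly BERT-base scale
rank = 8              # assumed adapter rank
full_per_layer = hidden * hidden
lora_per_layer = lora_params(hidden, hidden, rank)
print(f"trainable fraction per adapted matrix: {lora_per_layer / full_per_layer:.3%}")
```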
arXiv Detail & Related papers (2026-01-28T01:53:16Z) - DisasterInsight: A Multimodal Benchmark for Function-Aware and Grounded Disaster Assessment [19.434058305975167]
DisasterInsight is a benchmark designed to evaluate vision-language models (VLMs) on realistic disaster analysis tasks. It restructures the xBD dataset into approximately 112K building-centered instances. It supports instruction-diverse evaluation across multiple tasks, including building-function classification, damage-level and disaster-type classification, counting, and structured report generation aligned with humanitarian assessment guidelines.
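A hypothetical sketch of what one building-centered instance in such a restructured benchmark might look like; the field names and example values are assumptions for illustration, not DisasterInsight's released schema.

```python
# Hypothetical building-centered record, one per instance.
from dataclasses import dataclass

@dataclass
class BuildingInstance:
    image_path: str          # crop centered on a single building
    building_function: str   # e.g. "residential", "commercial"
    damage_level: str        # e.g. "no-damage", "minor", "major", "destroyed"
    disaster_type: str       # e.g. "flood", "wildfire", "earthquake"
    instruction: str         # task prompt shown to the VLM
    target: str              # expected answer or report text

example = BuildingInstance(
    image_path="crops/event_0123/building_045_post.png",
    building_function="residential",
    damage_level="major",
    disaster_type="hurricane",
    instruction="Classify the damage level of the highlighted building.",
    target="major",
)
```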
arXiv Detail & Related papers (2026-01-26T13:48:11Z) - AIFloodSense: A Global Aerial Imagery Dataset for Semantic Segmentation and Understanding of Flooded Environments [1.381010753883328]
We introduce AIFloodSense, a comprehensive, publicly available aerial imagery dataset comprising 470 high-resolution images from 230 distinct flood events across 64 countries and six continents. Unlike prior benchmarks, AIFloodSense ensures global diversity and temporal relevance (2022-2024), supporting three complementary tasks. We establish baseline benchmarks for all tasks using state-of-the-art architectures, demonstrating the dataset's complexity and its value.
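For the semantic segmentation task, per-class intersection-over-union is the standard metric such a benchmark would report; a minimal sketch follows, with the class indexing left as an assumption.

```python
# Per-class IoU between predicted and ground-truth label maps.
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """pred, gt: integer label maps of identical shape."""
    ious = {}
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious[c] = inter / union if union else float("nan")
    return ious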
arXiv Detail & Related papers (2025-12-19T10:34:45Z) - DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response [36.84140335631884]
DisasterM3 is a vision-language dataset for global-scale disaster assessment and response. DisasterM3 includes 26,988 bi-temporal satellite images and 123k instruction pairs across 5 continents. Based on real-world scenarios, DisasterM3 includes 9 disaster-related visual perception and reasoning tasks.
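A sketch of how a bi-temporal instruction pair could be packaged for a vision-language model; the message layout below mirrors common VLM chat conventions and is an illustrative assumption, not DisasterM3's released format.

```python
# Hypothetical packaging of a pre/post image pair with an instruction.
def build_bitemporal_message(pre_image, post_image, question):
    return {
        "images": [pre_image, post_image],  # pre- and post-disaster acquisitions
        "prompt": (
            "Image 1 was captured before the disaster and image 2 after it. "
            + question
        ),
    }

msg = build_bitemporal_message(
    "scene_001_pre.tif", "scene_001_post.tif",
    "How many buildings appear newly damaged in the post-disaster image?",
)
```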
arXiv Detail & Related papers (2025-05-27T12:16:07Z) - BRIGHT: A globally distributed multimodal building damage assessment dataset with very-high-resolution for all-weather disaster response [50.76124284445902]
Building damage assessment (BDA) is an essential capability in the aftermath of a disaster to reduce human casualties. Recent research focuses on the development of AI models to achieve accurate mapping of unseen disaster events. We present a BDA dataset using veRy-hIGH-resoluTion optical and SAR imagery (BRIGHT) to support AI-based all-weather disaster response.
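One common way to consume both modalities is channel-level early fusion of the optical and SAR inputs; the sketch below is an illustrative assumption about how a baseline could ingest BRIGHT-style pairs, not an architecture prescribed by the dataset.

```python
# Minimal early-fusion stem: concatenate optical and SAR channels, then encode.
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    def __init__(self, optical_channels=3, sar_channels=1, features=64):
        super().__init__()
        self.stem = nn.Conv2d(optical_channels + sar_channels, features,
                              kernel_size=3, padding=1)

    def forward(self, optical, sar):
        # Fuse modalities along the channel axis before feature extraction.
        return self.stem(torch.cat([optical, sar], dim=1))
```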
arXiv Detail & Related papers (2025-01-10T14:57:18Z) - CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-label Social Media Text Classification in Disaster Informatics [50.122541222825156]
This study introduces a novel approach to disaster text classification by enhancing a pre-trained Large Language Model (LLM). Our methodology involves creating a comprehensive instruction dataset from disaster-related tweets, which is then used to fine-tune an open-source LLM. This fine-tuned model can classify multiple aspects of disaster-related information simultaneously, such as the type of event, informativeness, and involvement of human aid.
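A sketch of turning a disaster tweet into a multi-label instruction example, in the spirit of the instruction dataset described above; the label names and prompt wording are assumptions for illustration.

```python
# Hypothetical instruction/output pair for multi-label disaster tweet classification.
def make_instruction_example(tweet, event_type, informative, human_aid):
    prompt = (
        "Classify the following disaster-related tweet.\n"
        f"Tweet: {tweet}\n"
        "Answer with event_type, informativeness, and human_aid_involvement."
    )
    target = (
        f"event_type: {event_type}; "
        f"informativeness: {'informative' if informative else 'not informative'}; "
        f"human_aid_involvement: {'yes' if human_aid else 'no'}"
    )
    return {"instruction": prompt, "output": target}
```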
arXiv Detail & Related papers (2024-06-16T23:01:10Z) - Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
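A sketch of the "crashes as language" framing: serializing structured crash features into a text prompt that an LLM can reason over; the feature names are hypothetical, not CrashEvent's actual fields.

```python
# Hypothetical serialization of a structured crash record into a prediction prompt.
def crash_to_prompt(record):
    return (
        f"A crash occurred at {record['time']} on a {record['road_type']} "
        f"in {record['weather']} weather involving {record['num_vehicles']} vehicles. "
        "Predict the crash severity and likely injury outcome."
    )

print(crash_to_prompt({
    "time": "18:40",
    "road_type": "two-lane rural highway",
    "weather": "rainy",
    "num_vehicles": 2,
}))
```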
arXiv Detail & Related papers (2024-06-16T03:10:16Z) - Generalizable Disaster Damage Assessment via Change Detection with Vision Foundation Model [17.016411785224317]
We introduce DAVI (Disaster Assessment with VIsion foundation model), a novel approach that addresses domain disparities and detects structural damage at the building level without requiring ground-truth labels for target regions. DAVI combines task-specific knowledge from a model trained on source regions with task-agnostic knowledge from an image segmentation model to generate pseudo labels indicating potential damage in target regions. It then utilizes a two-stage refinement process, which operates at both the pixel and image levels, to accurately identify changes in disaster-affected areas.
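A minimal sketch of combining two sources of evidence into pseudo labels, loosely following the idea of mixing task-specific and task-agnostic predictions; the thresholds and agreement rule are assumptions, not DAVI's exact procedure.

```python
# Hypothetical pseudo-label rule: trust damage predictions only where a generic
# segmentation model also sees a building; ignore uncertain pixels.
import numpy as np

def pseudo_labels(damage_prob, segment_mask, hi=0.7, lo=0.3):
    """damage_prob: per-pixel damage probability from a source-trained model.
    segment_mask: binary building mask from a generic segmentation model."""
    labels = np.full(damage_prob.shape, -1, dtype=np.int8)   # -1 = ignore
    labels[(damage_prob >= hi) & (segment_mask == 1)] = 1    # confident damage on buildings
    labels[damage_prob <= lo] = 0                            # confident no-damage
    return labels
```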
arXiv Detail & Related papers (2024-06-12T09:21:28Z) - CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
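A sketch of a FixMatch-style pseudo-labeling step on unlabeled tweets, the general mechanism behind semi-supervised methods of this kind; the confidence threshold and the weak/strong augmentation setup are illustrative assumptions, not CrisisMatch's exact training recipe.

```python
# Pseudo-label loss on unlabeled examples: keep only confident predictions from
# the weakly augmented view and use them as targets for the strongly augmented view.
import torch.nn.functional as F

def unlabeled_loss(logits_weak, logits_strong, threshold=0.95):
    """logits_weak/strong: model outputs (batch, num_classes) for two augmented copies."""
    probs = F.softmax(logits_weak.detach(), dim=-1)
    conf, pseudo = probs.max(dim=-1)
    mask = (conf >= threshold).float()                       # confident pseudo labels only
    loss = F.cross_entropy(logits_strong, pseudo, reduction="none")
    return (loss * mask).mean()
```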
arXiv Detail & Related papers (2023-10-23T07:01:09Z) - Assessing out-of-domain generalization for robust building damage detection [78.6363825307044]
Building damage detection can be automated by applying computer vision techniques to satellite imagery.
Models must be robust to a shift in distribution between disaster imagery available for training and the images of the new event.
We argue that future work should focus on the out-of-domain (OOD) regime instead.
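A sketch of the kind of leave-one-disaster-out split used to measure out-of-domain generalization across events; the tuple layout and event names are placeholders, not the paper's exact protocol.

```python
# Hold out one entire disaster event for testing; train on all others.
def leave_one_event_out(samples, held_out_event):
    """samples: iterable of (image_path, label, event_name) tuples."""
    train = [s for s in samples if s[2] != held_out_event]
    test = [s for s in samples if s[2] == held_out_event]
    return train, test
```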
arXiv Detail & Related papers (2020-11-20T10:30:43Z) - RescueNet: Joint Building Segmentation and Damage Assessment from Satellite Imagery [83.49145695899388]
RescueNet is a unified model that simultaneously segments buildings and assesses damage levels for individual buildings, and can be trained end-to-end.
RescueNet is tested on the large-scale and diverse xBD dataset and achieves significantly better building segmentation and damage classification performance than previous methods.
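A sketch of the kind of joint objective used when segmentation and damage classification are trained end-to-end in one model; the equal weighting below is an illustrative assumption rather than RescueNet's published loss.

```python
# Combined loss: pixel-wise building segmentation plus per-building damage classification.
import torch.nn.functional as F

def joint_loss(seg_logits, seg_target, dmg_logits, dmg_target,
               seg_weight=1.0, dmg_weight=1.0):
    seg = F.cross_entropy(seg_logits, seg_target)   # building segmentation term
    dmg = F.cross_entropy(dmg_logits, dmg_target)   # damage-level term
    return seg_weight * seg + dmg_weight * dmg
```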
arXiv Detail & Related papers (2020-04-15T19:52:09Z)