DisasterInsight: A Multimodal Benchmark for Function-Aware and Grounded Disaster Assessment
- URL: http://arxiv.org/abs/2601.18493v1
- Date: Mon, 26 Jan 2026 13:48:11 GMT
- Title: DisasterInsight: A Multimodal Benchmark for Function-Aware and Grounded Disaster Assessment
- Authors: Sara Tehrani, Yonghao Xu, Leif Haglund, Amanda Berg, Michael Felsberg
- Abstract summary: DisasterInsight is a benchmark designed to evaluate vision-language models (VLMs) on realistic disaster analysis tasks. It restructures the xBD dataset into approximately 112K building-centered instances. It supports instruction-diverse evaluation across multiple tasks, including building-function classification, damage-level and disaster-type classification, counting, and structured report generation aligned with humanitarian assessment guidelines.
- Score: 19.434058305975167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Timely interpretation of satellite imagery is critical for disaster response, yet existing vision-language benchmarks for remote sensing largely focus on coarse labels and image-level recognition, overlooking the functional understanding and instruction robustness required in real humanitarian workflows. We introduce DisasterInsight, a multimodal benchmark designed to evaluate vision-language models (VLMs) on realistic disaster analysis tasks. DisasterInsight restructures the xBD dataset into approximately 112K building-centered instances and supports instruction-diverse evaluation across multiple tasks, including building-function classification, damage-level and disaster-type classification, counting, and structured report generation aligned with humanitarian assessment guidelines. To establish domain-adapted baselines, we propose DI-Chat, obtained by fine-tuning existing VLM backbones on disaster-specific instruction data using parameter-efficient Low-Rank Adaptation (LoRA). Extensive experiments on state-of-the-art generic and remote-sensing VLMs reveal substantial performance gaps across tasks, particularly in damage understanding and structured report generation. DI-Chat achieves significant improvements on damage-level and disaster-type classification as well as report generation quality, while building-function classification remains challenging for all evaluated models. DisasterInsight provides a unified benchmark for studying grounded multimodal reasoning in disaster imagery.
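The abstract specifies only that DI-Chat is obtained by LoRA fine-tuning of existing VLM backbones. As a rough, non-authoritative illustration of the LoRA mechanism itself (not the paper's actual training setup), the sketch below wraps a frozen linear layer with a trainable low-rank update; the rank, scaling, and initialization values are common defaults, not choices taken from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen pretrained linear layer with a trainable low-rank update:
    y = W x + (alpha / r) * B(A(x)), with A: d_in -> r and B: r -> d_out."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # pretrained weights stay frozen
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.normal_(self.lora_a.weight, std=0.01)
        nn.init.zeros_(self.lora_b.weight)       # low-rank update starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Usage: wrap e.g. an attention projection and train only the adapter weights.
layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)  # torch.Size([2, 512]) 16384
```

Because only the two small adapter matrices receive gradients, this adapts a large backbone at a fraction of full fine-tuning cost, which is the property the paper exploits for DI-Chat.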
Related papers
- Open-Vocabulary vs Supervised Learning Methods for Post-Disaster Visual Scene Understanding [4.918510966192794]
We present a comparative evaluation of supervised learning and open-vocabulary vision models for post-disaster scene understanding. We focus on semantic segmentation and object detection across multiple datasets, including FloodNet+, RescueNet, DFire, and LADD. The most notable finding across all evaluated benchmarks is that supervised training remains the most reliable approach.
arXiv Detail & Related papers (2026-03-01T23:50:08Z)
- Understanding Degradation with Vision Language Model [56.09241449206817]
Understanding visual degradations is a critical yet challenging problem in computer vision. We introduce DU-VLM, a multimodal chain-of-thought model trained with supervised fine-tuning and reinforcement learning. We also introduce DU-110k, a large-scale dataset comprising 110,000 clean-degraded pairs with grounded physical annotations.
arXiv Detail & Related papers (2026-02-04T13:51:15Z)
- DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes [10.776782815521686]
DisasterVQA consists of 1,395 real-world images and 4,405 expert-curated question-answer pairs spanning diverse events such as floods, wildfires, and earthquakes. We benchmark seven state-of-the-art vision-language models and find performance variability across question types, disaster categories, regions, and humanitarian tasks. DisasterVQA provides a challenging and practical benchmark to guide the development of more robust and operationally meaningful vision-language models for disaster response.
arXiv Detail & Related papers (2026-01-20T10:50:46Z)
- Satellite to Street: Disaster Impact Estimator [0.12938914787881173]
The present work proposes Satellite-to-Street: Disaster Impact Estimator, a deep-learning framework that jointly processes pre- and post-disaster satellite images to produce fine-grained pixel-level damage maps. The model uses a modified dual-input U-Net architecture with enhanced feature fusion to capture both local structural changes and broader contextual cues (a toy sketch of such a dual-input design follows this entry).
arXiv Detail & Related papers (2025-11-24T06:20:40Z)
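The entry above describes the architecture only at a high level. Below is a minimal, hypothetical PyTorch sketch of a dual-input (siamese) U-Net-style model: a shared encoder processes the pre- and post-disaster tiles, features are fused by concatenation at the bottleneck and by addition along the skip path, and a decoder predicts per-pixel damage logits. The layer widths, depth, and five-class output are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class DualInputUNet(nn.Module):
    """Toy siamese encoder-decoder: one shared encoder for the pre- and
    post-disaster tiles, concatenation-based fusion at the bottleneck,
    additive fusion at the skip, and per-pixel damage logits out."""
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.enc1 = conv_block(3, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.fuse = conv_block(128, 64)   # concat(pre, post) bottleneck features
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = conv_block(64, 32)     # upsampled features + fused skip
        self.head = nn.Conv2d(32, n_classes, 1)

    def encode(self, x):
        f1 = self.enc1(x)                 # full-resolution features
        f2 = self.enc2(self.pool(f1))     # half-resolution features
        return f1, f2

    def forward(self, pre, post):
        p1, p2 = self.encode(pre)
        q1, q2 = self.encode(post)
        fused = self.fuse(torch.cat([p2, q2], dim=1))  # compare the two epochs
        x = self.up(fused)
        x = self.dec(torch.cat([x, p1 + q1], dim=1))   # fuse the skip path too
        return self.head(x)

# Shape check: two 3-channel 256x256 tiles -> a 5-class per-pixel damage map.
m = DualInputUNet()
logits = m(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
print(logits.shape)  # torch.Size([1, 5, 256, 256])
```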
- Rethinking Evaluation of Infrared Small Target Detection [105.59753496831739]
This paper introduces a hybrid-level metric incorporating pixel- and target-level performance, proposes a systematic error analysis method, and emphasizes the importance of cross-dataset evaluation. An open-source toolkit has been released to facilitate standardized benchmarking. A toy illustration of combining pixel- and target-level scores follows this entry.
arXiv Detail & Related papers (2025-09-21T02:45:07Z)
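The entry above does not define the hybrid metric precisely. As a rough illustration of what combining pixel- and target-level evaluation can look like for binary small-target masks, the sketch below computes pixel-level IoU alongside a target-level detection rate and false-alarm count via connected components; the any-overlap convention is an assumption, not the paper's definition.

```python
import numpy as np
from scipy import ndimage

def hybrid_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Toy hybrid evaluation for binary masks (True = target pixel).
    Pixel level: IoU over all pixels. Target level: a ground-truth
    component counts as detected if any predicted pixel overlaps it,
    and a predicted component with no overlap counts as a false alarm."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    pixel_iou = inter / union if union else 1.0

    gt_lbl, n_gt = ndimage.label(gt)      # connected components = targets
    pr_lbl, n_pr = ndimage.label(pred)
    detected = sum(1 for i in range(1, n_gt + 1) if pred[gt_lbl == i].any())
    false_alarms = sum(1 for j in range(1, n_pr + 1) if not gt[pr_lbl == j].any())
    pd = detected / n_gt if n_gt else 1.0
    return {"pixel_iou": pixel_iou, "target_pd": pd, "false_alarms": false_alarms}

# Example: one hit, one miss, one spurious blob.
gt = np.zeros((16, 16), bool); gt[2:4, 2:4] = True; gt[10:12, 10:12] = True
pred = np.zeros((16, 16), bool); pred[2:4, 2:4] = True; pred[6:8, 6:8] = True
print(hybrid_metrics(pred, gt))  # pixel_iou=0.333..., target_pd=0.5, false_alarms=1
```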
- GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks [84.86699025256705]
We present GEOBench-VLM, a benchmark specifically designed to evaluate Vision-Language Models (VLMs) on geospatial tasks. Our benchmark features over 10,000 manually verified instructions spanning diverse visual conditions, object types, and scales. We evaluate several state-of-the-art VLMs to assess performance on geospatial-specific challenges.
arXiv Detail & Related papers (2024-11-28T18:59:56Z)
- Towards Evaluating the Robustness of Visual State Space Models [63.14954591606638]
Vision State Space Models (VSSMs) have demonstrated remarkable performance in visual perception tasks.
However, their robustness under natural and adversarial perturbations remains a critical concern.
We present a comprehensive evaluation of VSSMs' robustness under various perturbation scenarios.
arXiv Detail & Related papers (2024-06-13T17:59:44Z)
- One-class Damage Detector Using Deeper Fully-Convolutional Data Descriptions for Civil Application [0.0]
The one-class damage detection approach has the advantage that normal images alone can be used to optimize model parameters. We propose a civil-purpose application that automates one-class damage detection, reproducing a fully convolutional data description (FCDD) as the baseline model (a sketch of an FCDD-style objective follows this entry).
arXiv Detail & Related papers (2023-03-03T06:27:15Z)
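For context on the baseline named in the entry above, the sketch below reconstructs the FCDD objective as described in the original FCDD paper (Liznerski et al., 2021): the output of a fully convolutional network is passed through a pseudo-Huber transform to obtain a non-negative anomaly heat map, whose per-image mean is pushed down for normal images and up for anomalous ones. The summarized paper's deeper variant may differ; this is a reference sketch, not its implementation.

```python
import torch

def fcdd_loss(fcn_output: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """FCDD-style objective in the spirit of Liznerski et al. (2021).
    fcn_output: (B, 1, H, W) map from a fully convolutional network.
    y: (B,) labels, 0 = normal, 1 = anomalous (e.g. outlier exposure)."""
    heat = torch.sqrt(fcn_output ** 2 + 1.0) - 1.0       # per-pixel anomaly score
    score = heat.flatten(1).mean(dim=1)                  # mean score per image
    loss_normal = score                                  # pull normal scores to 0
    loss_anom = -torch.log1p(-torch.exp(-score) + 1e-9)  # push anomalous scores up
    return torch.where(y.bool(), loss_anom, loss_normal).mean()

# Example: two normal maps and one anomalous map.
out = torch.randn(3, 1, 8, 8)
labels = torch.tensor([0, 0, 1])
print(fcdd_loss(out, labels))
```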
- Multi-view deep learning for reliable post-disaster damage classification [0.0]
This study aims to enable more reliable automated post-disaster building damage classification using artificial intelligence (AI) and multi-view imagery.
The proposed model is trained and validated on a reconnaissance visual dataset containing expert-labeled, geotagged images of the inspected buildings following Hurricane Harvey.
arXiv Detail & Related papers (2022-08-06T01:04:13Z)
- Assessing out-of-domain generalization for robust building damage detection [78.6363825307044]
Building damage detection can be automated by applying computer vision techniques to satellite imagery.
Models must be robust to a shift in distribution between disaster imagery available for training and the images of the new event.
We argue that future work should focus on the out-of-domain (OOD) regime instead.
arXiv Detail & Related papers (2020-11-20T10:30:43Z)
- RescueNet: Joint Building Segmentation and Damage Assessment from Satellite Imagery [83.49145695899388]
RescueNet is a unified model that simultaneously segments buildings and assesses damage levels for individual buildings, and it can be trained end-to-end (a toy multi-task sketch follows this entry).
RescueNet is tested on the large-scale and diverse xBD dataset and achieves significantly better building segmentation and damage classification performance than previous methods.
arXiv Detail & Related papers (2020-04-15T19:52:09Z)
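The entry above describes the joint design only at a high level. Below is a minimal, hypothetical PyTorch sketch of the general pattern: a shared backbone feeding separate building-segmentation and damage heads, trained end-to-end with a summed loss. The layer sizes, four damage levels, and loss weighting are assumptions; this is not RescueNet's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointSegDamageModel(nn.Module):
    """Toy joint model: a shared backbone feeding a building-segmentation
    head and a per-pixel damage-level head, trainable end-to-end."""
    def __init__(self, n_damage_levels: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.seg_head = nn.Conv2d(64, 1, 1)                # building vs background
        self.dmg_head = nn.Conv2d(64, n_damage_levels, 1)  # damage level per pixel

    def forward(self, x):
        feats = self.backbone(x)
        return self.seg_head(feats), self.dmg_head(feats)

def joint_loss(seg_logits, dmg_logits, seg_gt, dmg_gt, w: float = 1.0):
    seg = F.binary_cross_entropy_with_logits(seg_logits, seg_gt)
    dmg = F.cross_entropy(dmg_logits, dmg_gt)
    return seg + w * dmg   # one backward pass updates backbone and both heads

# Example forward/backward on random data.
model = JointSegDamageModel()
img = torch.randn(2, 3, 64, 64)
seg_gt = torch.randint(0, 2, (2, 1, 64, 64)).float()
dmg_gt = torch.randint(0, 4, (2, 64, 64))
seg_logits, dmg_logits = model(img)
joint_loss(seg_logits, dmg_logits, seg_gt, dmg_gt).backward()
```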