DisasterInsight: A Multimodal Benchmark for Function-Aware and Grounded Disaster Assessment
- URL: http://arxiv.org/abs/2601.18493v1
- Date: Mon, 26 Jan 2026 13:48:11 GMT
- Title: DisasterInsight: A Multimodal Benchmark for Function-Aware and Grounded Disaster Assessment
- Authors: Sara Tehrani, Yonghao Xu, Leif Haglund, Amanda Berg, Michael Felsberg
- Abstract summary: DisasterInsight is a benchmark designed to evaluate vision-language models (VLMs) on realistic disaster analysis tasks. It restructures the xBD dataset into approximately 112K building-centered instances. It supports instruction-diverse evaluation across multiple tasks, including building-function classification, damage-level and disaster-type classification, counting, and structured report generation aligned with humanitarian assessment guidelines.
- Score: 19.434058305975167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Timely interpretation of satellite imagery is critical for disaster response, yet existing vision-language benchmarks for remote sensing largely focus on coarse labels and image-level recognition, overlooking the functional understanding and instruction robustness required in real humanitarian workflows. We introduce DisasterInsight, a multimodal benchmark designed to evaluate vision-language models (VLMs) on realistic disaster analysis tasks. DisasterInsight restructures the xBD dataset into approximately 112K building-centered instances and supports instruction-diverse evaluation across multiple tasks, including building-function classification, damage-level and disaster-type classification, counting, and structured report generation aligned with humanitarian assessment guidelines. To establish domain-adapted baselines, we propose DI-Chat, obtained by fine-tuning existing VLM backbones on disaster-specific instruction data using parameter-efficient Low-Rank Adaptation (LoRA). Extensive experiments on state-of-the-art generic and remote-sensing VLMs reveal substantial performance gaps across tasks, particularly in damage understanding and structured report generation. DI-Chat achieves significant improvements on damage-level and disaster-type classification as well as report generation quality, while building-function classification remains challenging for all evaluated models. DisasterInsight provides a unified benchmark for studying grounded multimodal reasoning in disaster imagery.
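The abstract specifies only that DI-Chat is obtained by LoRA fine-tuning of existing VLM backbones. As a rough, non-authoritative illustration of the LoRA mechanism itself (not the paper's actual training setup), the sketch below wraps a frozen linear layer with a trainable low-rank update; the rank, scaling, and initialization values are common defaults, not choices taken from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen pretrained linear layer with a trainable low-rank update:
    y = W x + (alpha / r) * B(A(x)), with A: d_in -> r and B: r -> d_out."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # pretrained weights stay frozen
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.normal_(self.lora_a.weight, std=0.01)
        nn.init.zeros_(self.lora_b.weight)       # low-rank update starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Usage: wrap e.g. an attention projection and train only the adapter weights.
layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)  # torch.Size([2, 512]) 16384
```

Because only the two small adapter matrices receive gradients, this adapts a large backbone at a fraction of full fine-tuning cost, which is the property the paper exploits for DI-Chat.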
Related papers
- Open-Vocabulary vs Supervised Learning Methods for Post-Disaster Visual Scene Understanding [4.918510966192794]
We present a comparative evaluation of supervised learning and open-vocabulary vision models for post-disaster scene understanding. We focus on semantic segmentation and object detection across multiple datasets, including FloodNet+, RescueNet, DFire, and LADD. The most notable finding across all evaluated benchmarks is that supervised training remains the most reliable approach.
arXiv Detail & Related papers (2026-03-01T23:50:08Z)
- Understanding Degradation with Vision Language Model [56.09241449206817]
Understanding visual degradations is a critical yet challenging problem in computer vision. We introduce DU-VLM, a multimodal chain-of-thought model trained with supervised fine-tuning and reinforcement learning. We also introduce DU-110k, a large-scale dataset comprising 110,000 clean-degraded pairs with grounded physical annotations.
arXiv Detail & Related papers (2026-02-04T13:51:15Z)
- DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes [10.776782815521686]
DisasterVQA consists of 1,395 real-world images and 4,405 expert-curated question-answer pairs spanning diverse events such as floods, wildfires, and earthquakes. We benchmark seven state-of-the-art vision-language models and find performance variability across question types, disaster categories, regions, and humanitarian tasks. DisasterVQA provides a challenging and practical benchmark to guide the development of more robust and operationally meaningful vision-language models for disaster response.
arXiv Detail & Related papers (2026-01-20T10:50:46Z)
- Satellite to Street: Disaster Impact Estimator [0.12938914787881173]
The present work proposes Satellite-to-Street: Disaster Impact Estimator, a deep-learning framework that jointly processes pre- and post-disaster satellite images to produce fine-grained pixel-level damage maps. The model uses a modified dual-input U-Net architecture with enhanced feature fusion to capture both local structural changes and broader contextual cues (a toy sketch of such a dual-input design follows this entry).
arXiv Detail & Related papers (2025-11-24T06:20:40Z)
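The entry above describes the architecture only at a high level. Below is a minimal, hypothetical PyTorch sketch of a dual-input (siamese) U-Net-style model: a shared encoder processes the pre- and post-disaster tiles, features are fused by concatenation at the bottleneck and by addition along the skip path, and a decoder predicts per-pixel damage logits. The layer widths, depth, and five-class output are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class DualInputUNet(nn.Module):
    """Toy siamese encoder-decoder: one shared encoder for the pre- and
    post-disaster tiles, concatenation-based fusion at the bottleneck,
    additive fusion at the skip, and per-pixel damage logits out."""
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.enc1 = conv_block(3, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.fuse = conv_block(128, 64)   # concat(pre, post) bottleneck features
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = conv_block(64, 32)     # upsampled features + fused skip
        self.head = nn.Conv2d(32, n_classes, 1)

    def encode(self, x):
        f1 = self.enc1(x)                 # full-resolution features
        f2 = self.enc2(self.pool(f1))     # half-resolution features
        return f1, f2

    def forward(self, pre, post):
        p1, p2 = self.encode(pre)
        q1, q2 = self.encode(post)
        fused = self.fuse(torch.cat([p2, q2], dim=1))  # compare the two epochs
        x = self.up(fused)
        x = self.dec(torch.cat([x, p1 + q1], dim=1))   # fuse the skip path too
        return self.head(x)

# Shape check: two 3-channel 256x256 tiles -> a 5-class per-pixel damage map.
m = DualInputUNet()
logits = m(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
print(logits.shape)  # torch.Size([1, 5, 256, 256])
```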
- Rethinking Evaluation of Infrared Small Target Detection [105.59753496831739]
This paper introduces a hybrid-level metric incorporating pixel- and target-level performance, proposes a systematic error analysis method, and emphasizes the importance of cross-dataset evaluation. An open-source toolkit has been released to facilitate standardized benchmarking. A toy illustration of combining pixel- and target-level scores follows this entry.
arXiv Detail & Related papers (2025-09-21T02:45:07Z)
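The entry above does not define the hybrid metric precisely. As a rough illustration of what combining pixel- and target-level evaluation can look like for binary small-target masks, the sketch below computes pixel-level IoU alongside a target-level detection rate and false-alarm count via connected components; the any-overlap convention is an assumption, not the paper's definition.

```python
import numpy as np
from scipy import ndimage

def hybrid_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Toy hybrid evaluation for binary masks (True = target pixel).
    Pixel level: IoU over all pixels. Target level: a ground-truth
    component counts as detected if any predicted pixel overlaps it,
    and a predicted component with no overlap counts as a false alarm."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    pixel_iou = inter / union if union else 1.0

    gt_lbl, n_gt = ndimage.label(gt)      # connected components = targets
    pr_lbl, n_pr = ndimage.label(pred)
    detected = sum(1 for i in range(1, n_gt + 1) if pred[gt_lbl == i].any())
    false_alarms = sum(1 for j in range(1, n_pr + 1) if not gt[pr_lbl == j].any())
    pd = detected / n_gt if n_gt else 1.0
    return {"pixel_iou": pixel_iou, "target_pd": pd, "false_alarms": false_alarms}

# Example: one hit, one miss, one spurious blob.
gt = np.zeros((16, 16), bool); gt[2:4, 2:4] = True; gt[10:12, 10:12] = True
pred = np.zeros((16, 16), bool); pred[2:4, 2:4] = True; pred[6:8, 6:8] = True
print(hybrid_metrics(pred, gt))  # pixel_iou=0.333..., target_pd=0.5, false_alarms=1
```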
- GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks [84.86699025256705]
We present GEOBench-VLM, a benchmark specifically designed to evaluate Vision-Language Models (VLMs) on geospatial tasks. Our benchmark features over 10,000 manually verified instructions spanning diverse visual conditions, object types, and scales. We evaluate several state-of-the-art VLMs to assess performance on geospatial-specific challenges.
arXiv Detail & Related papers (2024-11-28T18:59:56Z)
- Towards Evaluating the Robustness of Visual State Space Models [63.14954591606638]
Vision State Space Models (VSSMs) have demonstrated remarkable performance in visual perception tasks.
However, their robustness under natural and adversarial perturbations remains a critical concern.
We present a comprehensive evaluation of VSSMs' robustness under various perturbation scenarios.
arXiv Detail & Related papers (2024-06-13T17:59:44Z)
- One-class Damage Detector Using Deeper Fully-Convolutional Data Descriptions for Civil Application [0.0]
The one-class damage detection approach has the advantage that normal images alone can be used to optimize model parameters. We propose a civil-purpose application that automates one-class damage detection, reproducing a fully convolutional data description (FCDD) as the baseline model (a sketch of an FCDD-style objective follows this entry).
arXiv Detail & Related papers (2023-03-03T06:27:15Z)
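For context on the baseline named in the entry above, the sketch below reconstructs the FCDD objective as described in the original FCDD paper (Liznerski et al., 2021): the output of a fully convolutional network is passed through a pseudo-Huber transform to obtain a non-negative anomaly heat map, whose per-image mean is pushed down for normal images and up for anomalous ones. The summarized paper's deeper variant may differ; this is a reference sketch, not its implementation.

```python
import torch

def fcdd_loss(fcn_output: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """FCDD-style objective in the spirit of Liznerski et al. (2021).
    fcn_output: (B, 1, H, W) map from a fully convolutional network.
    y: (B,) labels, 0 = normal, 1 = anomalous (e.g. outlier exposure)."""
    heat = torch.sqrt(fcn_output ** 2 + 1.0) - 1.0       # per-pixel anomaly score
    score = heat.flatten(1).mean(dim=1)                  # mean score per image
    loss_normal = score                                  # pull normal scores to 0
    loss_anom = -torch.log1p(-torch.exp(-score) + 1e-9)  # push anomalous scores up
    return torch.where(y.bool(), loss_anom, loss_normal).mean()

# Example: two normal maps and one anomalous map.
out = torch.randn(3, 1, 8, 8)
labels = torch.tensor([0, 0, 1])
print(fcdd_loss(out, labels))
```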
- Multi-view deep learning for reliable post-disaster damage classification [0.0]
This study aims to enable more reliable automated post-disaster building damage classification using artificial intelligence (AI) and multi-view imagery.
The proposed model is trained and validated on a reconnaissance visual dataset containing expert-labeled, geotagged images of the inspected buildings following Hurricane Harvey.
arXiv Detail & Related papers (2022-08-06T01:04:13Z)
- Assessing out-of-domain generalization for robust building damage detection [78.6363825307044]
Building damage detection can be automated by applying computer vision techniques to satellite imagery.
Models must be robust to a shift in distribution between disaster imagery available for training and the images of the new event.
We argue that future work should focus on the out-of-domain (OOD) regime instead.
arXiv Detail & Related papers (2020-11-20T10:30:43Z)
- RescueNet: Joint Building Segmentation and Damage Assessment from Satellite Imagery [83.49145695899388]
RescueNet is a unified model that simultaneously segments buildings and assesses damage levels for individual buildings, and it can be trained end-to-end (a toy multi-task sketch follows this entry).
RescueNet is tested on the large-scale and diverse xBD dataset and achieves significantly better building segmentation and damage classification performance than previous methods.
arXiv Detail & Related papers (2020-04-15T19:52:09Z)
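The entry above describes the joint design only at a high level. Below is a minimal, hypothetical PyTorch sketch of the general pattern: a shared backbone feeding separate building-segmentation and damage heads, trained end-to-end with a summed loss. The layer sizes, four damage levels, and loss weighting are assumptions; this is not RescueNet's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointSegDamageModel(nn.Module):
    """Toy joint model: a shared backbone feeding a building-segmentation
    head and a per-pixel damage-level head, trainable end-to-end."""
    def __init__(self, n_damage_levels: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.seg_head = nn.Conv2d(64, 1, 1)                # building vs background
        self.dmg_head = nn.Conv2d(64, n_damage_levels, 1)  # damage level per pixel

    def forward(self, x):
        feats = self.backbone(x)
        return self.seg_head(feats), self.dmg_head(feats)

def joint_loss(seg_logits, dmg_logits, seg_gt, dmg_gt, w: float = 1.0):
    seg = F.binary_cross_entropy_with_logits(seg_logits, seg_gt)
    dmg = F.cross_entropy(dmg_logits, dmg_gt)
    return seg + w * dmg   # one backward pass updates backbone and both heads

# Example forward/backward on random data.
model = JointSegDamageModel()
img = torch.randn(2, 3, 64, 64)
seg_gt = torch.randint(0, 2, (2, 1, 64, 64)).float()
dmg_gt = torch.randint(0, 4, (2, 64, 64))
seg_logits, dmg_logits = model(img)
joint_loss(seg_logits, dmg_logits, seg_gt, dmg_gt).backward()
```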