DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response
- URL: http://arxiv.org/abs/2505.21089v1
- Date: Tue, 27 May 2025 12:16:07 GMT
- Title: DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response
- Authors: Junjue Wang, Weihao Xuan, Heli Qi, Zhihao Liu, Kunyi Liu, Yuhan Wu, Hongruixuan Chen, Jian Song, Junshi Xia, Zhuo Zheng, Naoto Yokoya
- Abstract summary: DisasterM3 is a vision-language dataset for global-scale disaster assessment and response. It includes 26,988 bi-temporal satellite images and 123k instruction pairs across 5 continents. Based on real-world scenarios, it covers 9 disaster-related visual perception and reasoning tasks.
- Score: 20.208384252534657
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large vision-language models (VLMs) have achieved great success in Earth vision. However, complex disaster scenes with diverse disaster types, geographic regions, and satellite sensors pose new challenges for VLM applications. To fill this gap, we curate a remote sensing vision-language dataset (DisasterM3) for global-scale disaster assessment and response. DisasterM3 includes 26,988 bi-temporal satellite images and 123k instruction pairs across 5 continents, with three characteristics: 1) Multi-hazard: DisasterM3 involves 36 historical disaster events with significant impacts, categorized into 10 common natural and man-made disasters. 2) Multi-sensor: Extreme weather during disasters often hinders optical sensor imaging, making it necessary to combine Synthetic Aperture Radar (SAR) imagery for post-disaster scenes. 3) Multi-task: Based on real-world scenarios, DisasterM3 includes 9 disaster-related visual perception and reasoning tasks, harnessing the full potential of VLMs' reasoning abilities by progressing from disaster-bearing body recognition to structural damage assessment and object relational reasoning, culminating in the generation of long-form disaster reports. We extensively evaluated 14 generic and remote sensing VLMs on our benchmark, revealing that state-of-the-art models struggle with the disaster tasks, largely due to the lack of a disaster-specific corpus, the cross-sensor gap, and insensitivity to counting damaged objects. Focusing on these issues, we fine-tune four VLMs on our dataset and achieve stable improvements across all tasks, with robust cross-sensor and cross-disaster generalization capabilities.
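As a concrete illustration of the benchmark setup, the sketch below shows one plausible way to represent and score a bi-temporal instruction pair with exact-match accuracy; the `InstructionPair` fields and the `model` callable are hypothetical stand-ins, not the published DisasterM3 schema.
```python
# Minimal sketch of a bi-temporal instruction pair and an exact-match
# evaluation loop. All field names and the `model` callable are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class InstructionPair:
    pre_image: str    # path to pre-disaster image (optical)
    post_image: str   # path to post-disaster image (optical or SAR)
    instruction: str  # e.g. "How many buildings are destroyed?"
    answer: str       # reference answer

def evaluate(model: Callable[[str, str, str], str],
             samples: list[InstructionPair]) -> float:
    """Exact-match accuracy of a VLM over bi-temporal instruction pairs."""
    correct = 0
    for s in samples:
        pred = model(s.pre_image, s.post_image, s.instruction)
        correct += int(pred.strip().lower() == s.answer.strip().lower())
    return correct / max(len(samples), 1)

if __name__ == "__main__":
    # Dummy model that always answers "3"; stands in for a real VLM call.
    toy = [InstructionPair("pre.png", "post.png",
                           "How many buildings are destroyed?", "3")]
    print(evaluate(lambda pre, post, q: "3", toy))  # -> 1.0
```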
Related papers
- MONITRS: Multimodal Observations of Natural Incidents Through Remote Sensing [39.47126465689941]
We present MONITRS, a novel dataset of more than 10,000 FEMA disaster events with temporal satellite imagery and natural language annotations from news articles. We demonstrate that fine-tuning existing MLLMs on our dataset yields significant performance improvements for disaster monitoring tasks.
arXiv Detail & Related papers (2025-07-22T04:59:09Z)
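The sketch below illustrates one plausible way to turn a MONITRS-style event record (time-ordered satellite frames plus news-derived text) into a chat-format fine-tuning example; all field names are assumptions rather than the published schema.
```python
# Hypothetical conversion of an event record (temporal satellite frames plus
# news-article text) into a chat-format fine-tuning example.
import json

def to_chat_example(event: dict) -> dict:
    # Order imagery chronologically before building the example.
    frames = [f["path"] for f in sorted(event["frames"], key=lambda f: f["date"])]
    return {
        "images": frames,
        "messages": [
            {"role": "user",
             "content": "Describe how this disaster evolved over time."},
            {"role": "assistant",
             "content": event["news_summary"]},  # grounded in news annotations
        ],
    }

event = {
    "frames": [{"path": "t1.tif", "date": "2023-08-09"},
               {"path": "t0.tif", "date": "2023-08-08"}],
    "news_summary": "Wildfire spread rapidly between the two acquisitions.",
}
print(json.dumps(to_chat_example(event), indent=2))
```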
- A Deep Learning framework for building damage assessment using VHR SAR and geospatial data: demonstration on the 2023 Turkiye Earthquake [1.6070833439280312]
Building damage identification shortly after a disaster is crucial for guiding emergency response and recovery efforts. We introduce a novel multimodal deep learning (DL) framework for detecting building damage using single-date very high resolution (VHR) Synthetic Aperture Radar (SAR) imagery. Our method integrates SAR image patches, OpenStreetMap (OSM) building footprints, digital surface model (DSM) data, and structural and exposure attributes from the Global Earthquake Model (GEM). Results highlight that incorporating geospatial features significantly enhances detection performance and generalizability to previously unseen areas.
arXiv Detail & Related papers (2025-06-27T15:49:58Z)
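As an illustration of the fusion idea, this minimal PyTorch sketch combines a CNN branch over a SAR patch with an MLP branch over tabular geospatial attributes (footprint area, DSM height, GEM-style vulnerability features) for a per-building damage prediction; the architecture details are assumed, not the paper's.
```python
# Illustrative two-branch fusion: a CNN encodes the SAR patch, an MLP encodes
# tabular geospatial attributes, and a linear head classifies damage.
import torch
import torch.nn as nn

class SarGeoFusion(nn.Module):
    def __init__(self, n_geo_features: int = 8):
        super().__init__()
        self.cnn = nn.Sequential(                   # encodes a 1-channel SAR patch
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 32)
        )
        self.mlp = nn.Sequential(                   # encodes tabular attributes
            nn.Linear(n_geo_features, 32), nn.ReLU(),
        )
        self.head = nn.Linear(32 + 32, 2)           # damaged vs. undamaged

    def forward(self, sar_patch, geo_features):
        fused = torch.cat([self.cnn(sar_patch), self.mlp(geo_features)], dim=1)
        return self.head(fused)

model = SarGeoFusion()
logits = model(torch.randn(4, 1, 64, 64), torch.randn(4, 8))
print(logits.shape)  # torch.Size([4, 2])
```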
- BRIGHT: A globally distributed multimodal building damage assessment dataset with very-high-resolution for all-weather disaster response [37.37991234180912]
Building damage assessment (BDA) is an essential capability in the aftermath of a disaster to reduce human casualties. Recent research focuses on the development of AI models to achieve accurate mapping of unseen disaster events. We present a BDA dataset using veRy-hIGH-resoluTion optical and SAR imagery (BRIGHT) to support AI-based all-weather disaster response.
arXiv Detail & Related papers (2025-01-10T14:57:18Z)
- DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments [60.69159598130235]
We present a new dataset, DAVE, designed for evaluating perception methods with high representation of Vulnerable Road Users (VRUs). DAVE is a manually annotated dataset encompassing 16 diverse actor categories (spanning animals, humans, vehicles, etc.) and 16 action types (complex and rare cases like cut-ins, zigzag movement, U-turns, etc.). Our experiments show that existing methods suffer performance degradation when evaluated on DAVE, highlighting its benefit for future video recognition research.
arXiv Detail & Related papers (2024-12-28T06:13:44Z)
- Generalizable Disaster Damage Assessment via Change Detection with Vision Foundation Model [17.016411785224317]
We introduce DAVI (Disaster Assessment with VIsion foundation model), a novel approach that addresses domain disparities and detects structural damage at the building level without requiring ground-truth labels for target regions. DAVI combines task-specific knowledge from a model trained on source regions with task-agnostic knowledge from an image segmentation model to generate pseudo labels indicating potential damage in target regions. It then employs a two-stage refinement process, operating at both the pixel and image levels, to accurately identify changes in disaster-affected areas.
arXiv Detail & Related papers (2024-06-12T09:21:28Z)
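A minimal NumPy sketch of the pseudo-labeling idea: average the damage probabilities from the source-trained model and the task-agnostic segmentation model, then refine at the pixel level (thresholding) and the image level (discarding near-empty masks). The fusion rule and thresholds are assumptions, not DAVI's published configuration.
```python
# Fuse two probability maps, then apply two-stage (pixel, image) refinement.
import numpy as np

def pseudo_labels(p_source: np.ndarray, p_segment: np.ndarray,
                  pixel_thr: float = 0.5, image_thr: float = 0.01):
    fused = 0.5 * (p_source + p_segment)          # combine knowledge sources
    mask = (fused > pixel_thr).astype(np.uint8)   # stage 1: pixel level
    if mask.mean() < image_thr:                   # stage 2: image level
        return None                               # too little evidence of damage
    return mask

rng = np.random.default_rng(0)
m = pseudo_labels(rng.random((64, 64)), rng.random((64, 64)))
print(None if m is None else int(m.sum()))
```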
- MonoTDP: Twin Depth Perception for Monocular 3D Object Detection in Adverse Scenes [49.21187418886508]
This paper proposes a monocular 3D detection model designed to perceive twin depth in adverse scenes, termed MonoTDP.
We first introduce an adaptive learning strategy that helps the model handle uncontrollable weather conditions and resist the degradation caused by various adverse factors.
Then, to address the depth/content loss in adverse regions, we propose a novel twin depth perception module that simultaneously estimates scene and object depth.
arXiv Detail & Related papers (2023-05-18T13:42:02Z)
- Classification of structural building damage grades from multi-temporal photogrammetric point clouds using a machine learning model trained on virtual laser scanning data [58.720142291102135]
We present a novel approach to automatically assess multi-class building damage from real-world point clouds.
We use a machine learning model trained on virtual laser scanning (VLS) data.
The model yields high multi-target classification accuracies (overall accuracy: 92.0%-95.1%).
arXiv Detail & Related papers (2023-02-24T12:04:46Z)
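The sketch below illustrates the classification step under stated assumptions: per-building change features derived from bi-temporal point clouds (e.g. mean height change, roughness change) are fed to a random-forest classifier trained on simulated (VLS) data. The feature set and model choice are illustrative; the paper's actual setup may differ.
```python
# Train a classifier on synthetic stand-ins for VLS-derived features, then
# predict damage grades for features extracted from real point clouds.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for VLS training data: 4 damage grades, 5 features.
X_train = rng.normal(size=(200, 5))
y_train = rng.integers(0, 4, size=200)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)          # train on simulated (VLS) data

X_real = rng.normal(size=(10, 5))  # features from real bi-temporal clouds
print(clf.predict(X_real))         # predicted damage grades 0-3
```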
- Physics-informed GANs for Coastal Flood Visualization [65.54626149826066]
We create a deep learning pipeline that generates visual satellite images of current and future coastal flooding.
By evaluating the imagery against physics-based flood maps, we find that our proposed framework outperforms baseline models in both physical consistency and photorealism.
While this work focused on the visualization of coastal floods, we envision the creation of a global visualization of how climate change will shape our earth.
arXiv Detail & Related papers (2020-10-16T02:15:34Z)
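One plausible way to quantify the physical consistency the paper evaluates is intersection-over-union between the flood extent in a generated image and a physics-based flood map, as sketched below; the binarization rule is an assumption.
```python
# Score agreement between a generated flood extent and a physics-based map.
import numpy as np

def flood_iou(generated_mask: np.ndarray, physics_mask: np.ndarray) -> float:
    """IoU between two binary flood-extent masks."""
    inter = np.logical_and(generated_mask, physics_mask).sum()
    union = np.logical_or(generated_mask, physics_mask).sum()
    return float(inter) / union if union else 1.0

rng = np.random.default_rng(0)
gen = rng.random((128, 128)) > 0.5   # stand-in for a generated flood extent
phys = rng.random((128, 128)) > 0.5  # stand-in for a physics-based flood map
print(f"physical consistency (IoU): {flood_iou(gen, phys):.3f}")
```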
- Learning from Multimodal and Multitemporal Earth Observation Data for Building Damage Mapping [17.324397643429638]
We have developed a global multisensor and multitemporal dataset for building damage mapping.
The global dataset contains high-resolution optical imagery and high-to-moderate-resolution multiband SAR data.
We defined a damage mapping framework for the semantic segmentation of damaged buildings based on a deep convolutional neural network algorithm.
arXiv Detail & Related papers (2020-09-14T05:04:19Z)
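The compact PyTorch sketch below illustrates the core idea of semantic segmentation of damaged buildings from stacked multimodal, multitemporal inputs (here, pre/post optical RGB plus one post-event SAR channel, i.e. 7 channels); the tiny encoder-decoder is illustrative, not the paper's architecture.
```python
# Segment damaged buildings from stacked optical + SAR channels.
import torch
import torch.nn as nn

class DamageSegNet(nn.Module):
    def __init__(self, in_ch: int = 7, n_classes: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, n_classes, 1),  # background / intact / damaged
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.randn(2, 7, 128, 128)  # pre-RGB + post-RGB + SAR channel
print(DamageSegNet()(x).shape)   # torch.Size([2, 3, 128, 128])
```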
- Spatiotemporal Attacks for Embodied Agents [119.43832001301041]
We take the first step toward studying adversarial attacks on embodied agents.
In particular, we generate adversarial examples, which exploit the interaction history in both the temporal and spatial dimensions.
Our perturbations exhibit strong attack effectiveness and generalization ability.
arXiv Detail & Related papers (2020-05-19T01:38:47Z)
- RescueNet: Joint Building Segmentation and Damage Assessment from Satellite Imagery [83.49145695899388]
RescueNet is a unified model that simultaneously segments buildings and assesses the damage level of each individual building, and it can be trained end-to-end.
Tested on the large-scale and diverse xBD dataset, RescueNet achieves significantly better building segmentation and damage classification performance than previous methods.
arXiv Detail & Related papers (2020-04-15T19:52:09Z)
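The sketch below captures the joint-task idea under stated assumptions: a shared encoder with two heads, one for building footprints and one for per-pixel damage level, trainable end-to-end with a summed loss; layer sizes are illustrative, not RescueNet's published design.
```python
# Shared encoder with separate segmentation and damage-grading heads.
import torch
import torch.nn as nn

class JointSegDamage(nn.Module):
    def __init__(self, damage_levels: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, 1, 1)              # building vs. not
        self.dmg_head = nn.Conv2d(64, damage_levels, 1)  # damage grade

    def forward(self, x):
        f = self.encoder(x)
        return self.seg_head(f), self.dmg_head(f)

model = JointSegDamage()
seg, dmg = model(torch.randn(1, 3, 128, 128))
# End-to-end training would sum the two losses, e.g.:
# loss = bce(seg, seg_gt) + ce(dmg, dmg_gt)
print(seg.shape, dmg.shape)
```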
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.