DisastIR: A Comprehensive Information Retrieval Benchmark for Disaster Management
- URL: http://arxiv.org/abs/2505.15856v3
- Date: Sat, 20 Sep 2025 05:56:00 GMT
- Title: DisastIR: A Comprehensive Information Retrieval Benchmark for Disaster Management
- Authors: Kai Yin, Xiangjue Dong, Chengkai Liu, Lipai Huang, Yiming Xiao, Zhewei Liu, Ali Mostafavi, James Caverlee
- Abstract summary: We introduce DisastIR, the first comprehensive Information Retrieval evaluation benchmark specifically tailored for disaster management. DisastIR comprises 9,600 diverse user queries and more than 1.3 million labeled query-passage pairs, covering 48 distinct retrieval tasks. Our evaluations of 30 state-of-the-art retrieval models demonstrate significant performance variances across tasks, with no single model excelling universally.
- Score: 24.48064724587068
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective disaster management requires timely access to accurate and contextually relevant information. Existing Information Retrieval (IR) benchmarks, however, focus primarily on general or specialized domains, such as medicine or finance, neglecting the unique linguistic complexity and diverse information needs encountered in disaster management scenarios. To bridge this gap, we introduce DisastIR, the first comprehensive IR evaluation benchmark specifically tailored for disaster management. DisastIR comprises 9,600 diverse user queries and more than 1.3 million labeled query-passage pairs, covering 48 distinct retrieval tasks derived from six search intents and eight general disaster categories that include 301 specific event types. Our evaluations of 30 state-of-the-art retrieval models demonstrate significant performance variances across tasks, with no single model excelling universally. Furthermore, comparative analyses reveal significant performance gaps between general-domain and disaster management-specific tasks, highlighting the necessity of disaster management-specific benchmarks for guiding IR model selection to support effective decision-making in disaster management scenarios. All source codes and DisastIR are available at https://github.com/KaiYin97/Disaster_IR.
Related papers
- GISA: A Benchmark for General Information-Seeking Assistant [102.30831921333755]
GISA is a benchmark for General Information-Seeking Assistants comprising 373 human-crafted queries.
It integrates both deep reasoning and broad information aggregation within unified tasks, and includes a live subset with periodically updated answers to resist memorization.
Experiments on mainstream LLMs and commercial search products reveal that even the best-performing model achieves only 19.30% exact match score.
arXiv Detail & Related papers (2026-02-09T11:44:15Z)
- DisasterInsight: A Multimodal Benchmark for Function-Aware and Grounded Disaster Assessment [19.434058305975167]
DisasterInsight is a benchmark designed to evaluate vision-language models (VLMs) on realistic disaster analysis tasks.
It restructures the xBD dataset into approximately 112K building-centered instances.
It supports instruction-diverse evaluation across multiple tasks, including building-function classification, damage-level and disaster-type classification, counting, and structured report generation aligned with humanitarian assessment guidelines.
arXiv Detail & Related papers (2026-01-26T13:48:11Z)
- A Lightweight LLM Framework for Disaster Humanitarian Information Classification [0.0]
This paper develops a lightweight, cost-effective framework for disaster tweet classification using parameter-efficient fine-tuning.
We construct a unified experimental corpus by integrating and normalizing the HumAID dataset.
We demonstrate that LoRA achieves 79.62% humanitarian classification accuracy (+37.79% over zero-shot) while training only 2% of parameters.
arXiv Detail & Related papers (2026-01-21T02:05:25Z)
- DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes [10.776782815521686]
DisasterVQA consists of 1,395 real-world images and 4,405 expert-curated question-answer pairs spanning diverse events such as floods, wildfires, and earthquakes.
We benchmark seven state-of-the-art vision-language models and find performance variability across question types, disaster categories, regions, and humanitarian tasks.
DisasterVQA provides a challenging and practical benchmark to guide the development of more robust and operationally meaningful vision-language models for disaster response.
arXiv Detail & Related papers (2026-01-20T10:50:46Z)
- DMRetriever: A Family of Models for Improved Text Retrieval in Disaster Management [21.721973352020935]
DMRetriever is the first series of dense retrieval models tailored for disaster management scenarios.
It achieves state-of-the-art (SOTA) performance across all six search intents at every model scale.
DMRetriever is highly parameter-efficient, with the 596M model outperforming baselines over 13.3× larger and the 33M model exceeding baselines with only 7.6% of their parameters.
arXiv Detail & Related papers (2025-10-16T19:08:34Z)
- Towards Anytime Retrieval: A Benchmark for Anytime Person Re-Identification [85.78039373517021]
Anytime Person Re-identification (AT-ReID) aims to achieve effective retrieval in multiple scenarios based on variations in time.
We collect the first large-scale dataset, AT-USTC, which contains 403k images of individuals wearing multiple clothes.
We propose a unified model named Uni-AT, which comprises a multi-scenario ReID framework for scenario-specific feature learning.
arXiv Detail & Related papers (2025-09-20T11:20:22Z)
- Adapting Vision-Language Models Without Labels: A Comprehensive Survey [74.17944178027015]
Vision-Language Models (VLMs) have demonstrated remarkable generalization capabilities across a wide range of tasks.
Recent research has increasingly focused on unsupervised adaptation methods that do not rely on labeled data.
We propose a taxonomy based on the availability and nature of unlabeled visual data, categorizing existing approaches into four key paradigms.
arXiv Detail & Related papers (2025-08-07T16:27:37Z)
- RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping [101.22617426879079]
We build a large-scale grasping-oriented affordance segmentation benchmark with human-like instructions, named RAGNet.
The images cover diverse embodied data domains, such as wild, robot, ego-centric, and even simulation data.
We propose a comprehensive affordance-based grasping framework, named AffordanceNet, which consists of a VLM pre-trained on our massive affordance data and a grasping network that conditions an affordance map to grasp the target.
arXiv Detail & Related papers (2025-07-31T17:17:05Z)
- Can We Predict the Unpredictable? Leveraging DisasterNet-LLM for Multimodal Disaster Classification [0.46873264197900916]
DisasterNet-LLM is a specialized Large Language Model (LLM) designed for comprehensive disaster analysis.
By leveraging advanced pretraining, cross-modal attention mechanisms, and adaptive transformers, DisasterNet-LLM excels in disaster classification.
arXiv Detail & Related papers (2025-06-30T01:56:05Z)
- Toward Understanding Bugs in Vector Database Management Systems [11.916195480211648]
Vector database management systems (VDBMSs) play a crucial role in facilitating semantic similarity searches over high-dimensional embeddings from diverse data sources.
Traditional database reliability models cannot be directly applied to VDBMSs because of fundamental differences in data representation, query mechanisms, and system architecture.
We manually analyzed 1,671 bug-fix pull requests from 15 widely used open-source VDBMSs and developed a comprehensive taxonomy of bugs based on symptoms, root causes, and developer fix strategies.
arXiv Detail & Related papers (2025-06-03T08:34:01Z)
- Detecting Actionable Requests and Offers on Social Media During Crises Using LLMs [8.17728833322492]
We propose a fine-grained hierarchical taxonomy to organize crisis-related information about requests and offers into three critical dimensions: supplies, emergency personnel, and actions.
We introduce Query-Specific Few-shot Learning (QSF Learning) that retrieves class-specific labeled examples from an embedding database to enhance the model's performance in detecting and classifying posts.
arXiv Detail & Related papers (2025-04-22T08:34:58Z)
- Retrieval-Augmented Generation with Conflicting Evidence [57.66282463340297]
Large language model (LLM) agents are increasingly employing retrieval-augmented generation (RAG) to improve the factuality of their responses.
In practice, these systems often need to handle ambiguous user queries and potentially conflicting information from multiple sources.
We propose RAMDocs (Retrieval with Ambiguity and Misinformation in Documents), a new dataset that simulates complex and realistic scenarios for conflicting evidence for a user query.
arXiv Detail & Related papers (2025-04-17T16:46:11Z)
- RescueADI: Adaptive Disaster Interpretation in Remote Sensing Images with Autonomous Agents [11.08910129925713]
This paper introduces Adaptive Disaster Interpretation (ADI), a novel task designed to solve requests by planning and executing multiple correlative interpretation tasks.
We present a new dataset named RescueADI, which contains high-resolution RSIs with annotations for three connected aspects: planning, perception, and recognition.
We propose a new disaster interpretation method employing autonomous agents driven by large language models (LLMs) for task planning and execution.
arXiv Detail & Related papers (2024-10-17T09:36:52Z)
- Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We further formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
- AIOps Solutions for Incident Management: Technical Guidelines and A Comprehensive Literature Review [0.29998889086656577]
This study proposes an AIOps terminology and taxonomy, establishing a structured incident management procedure and providing guidelines for constructing an AIOps framework.
The goal is to provide a comprehensive review of technical and research aspects in AIOps for incident management, aiming to structure knowledge, identify gaps, and establish a foundation for future developments in the field.
arXiv Detail & Related papers (2024-04-01T17:32:22Z)
- StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving [76.5322280307861]
StrategyLLM allows LLMs to perform inductive reasoning, deriving general strategies from specific task instances, and deductive reasoning, applying these general strategies to particular task examples, for constructing generalizable and consistent few-shot prompts.
Experimental results demonstrate that StrategyLLM outperforms the competitive baseline CoT-SC that requires human-annotated solutions on 13 datasets across 4 challenging tasks without human involvement, including math reasoning (34.2% $\rightarrow$ 38.8%), commonsense reasoning (70.3% $\rightarrow$ 72.5%), algorithmic reasoning (73.7% $\rightarrow$ 85.0
arXiv Detail & Related papers (2023-11-15T09:18:09Z)
- CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
arXiv Detail & Related papers (2023-10-23T07:01:09Z)
- Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports [66.39150945184683]
We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches.
Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
arXiv Detail & Related papers (2020-10-27T19:48:23Z)
- ISSAFE: Improving Semantic Segmentation in Accidents by Fusing Event-based Data [34.36975697486129]
We present a rarely addressed task regarding semantic segmentation in accidental scenarios, along with an accident dataset DADA-seg.
We propose a novel event-based multi-modal segmentation architecture ISSAFE.
Our approach achieves a +8.2% mIoU performance gain on the proposed evaluation set, outperforming more than 10 state-of-the-art segmentation methods.
arXiv Detail & Related papers (2020-08-20T14:03:34Z)
- Event Prediction in the Big Data Era: A Systematic Survey [7.3810864598379755]
Event prediction is becoming a viable option in the big data era.
This paper aims to provide a systematic and comprehensive survey of the technologies, applications, and evaluations of event prediction.
arXiv Detail & Related papers (2020-07-19T23:24:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.