DetectiumFire: A Comprehensive Multi-modal Dataset Bridging Vision and Language for Fire Understanding
- URL: http://arxiv.org/abs/2511.02495v1
- Date: Tue, 04 Nov 2025 11:33:11 GMT
- Title: DetectiumFire: A Comprehensive Multi-modal Dataset Bridging Vision and Language for Fire Understanding
- Authors: Zixuan Liu, Siavash H. Khajavi, Guangkai Jiang
- Abstract summary: We introduce DetectiumFire, a large-scale, multi-modal dataset comprising 22.5k high-resolution fire-related images and 2.5k real-world fire-related videos. The data are annotated with both traditional computer vision labels (e.g., bounding boxes) and detailed textual prompts describing the scene. We validate the utility of DetectiumFire across multiple tasks, including object detection, diffusion-based image generation, and vision-language reasoning.
- Score: 5.257894673786823
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in multi-modal models have demonstrated strong performance in tasks such as image generation and reasoning. However, applying these models to the fire domain remains challenging due to the lack of publicly available datasets with high-quality fire domain annotations. To address this gap, we introduce DetectiumFire, a large-scale, multi-modal dataset comprising 22.5k high-resolution fire-related images and 2.5k real-world fire-related videos covering a wide range of fire types, environments, and risk levels. The data are annotated with both traditional computer vision labels (e.g., bounding boxes) and detailed textual prompts describing the scene, enabling applications such as synthetic data generation and fire risk reasoning. DetectiumFire offers clear advantages over existing benchmarks in scale, diversity, and data quality, significantly reducing redundancy and enhancing coverage of real-world scenarios. We validate the utility of DetectiumFire across multiple tasks, including object detection, diffusion-based image generation, and vision-language reasoning. Our results highlight the potential of this dataset to advance fire-related research and support the development of intelligent safety systems. We release DetectiumFire to promote broader exploration of fire understanding in the AI community. The dataset is available at https://kaggle.com/datasets/38b79c344bdfc55d1eed3d22fbaa9c31fad45e27edbbe9e3c529d6e5c4f93890
Related papers
- PyroFocus: A Deep Learning Approach to Real-Time Wildfire Detection in Multispectral Remote Sensing Imagery [0.0]
Rapid and accurate wildfire detection is crucial for emergency response and environmental management. In airborne and spaceborne missions, real-time algorithms must distinguish between no fire, active fire, and post-fire conditions. We introduce PyroFocus, a two-stage pipeline that performs fire classification followed by fire radiative power (FRP) regression or segmentation to reduce inference time and computational cost for onboard deployment.
arXiv Detail & Related papers (2025-12-02T21:59:45Z) - Exploring State-of-the-art models for Early Detection of Forest Fires [0.8127745323109788]
We propose a dataset for early identification of forest fires through visual analysis. We obtained this dataset synthetically by utilising game simulators such as Red Dead Redemption 2. We compared image classification and localisation methods on the proposed dataset.
arXiv Detail & Related papers (2025-11-25T09:13:07Z) - FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle [69.84129020970477]
Existing methods lack the causal reasoning and understanding required for reliable generalization. We introduce FireScope-Bench, a dataset and benchmark that couples Sentinel-2 imagery and climate data with expert-defined risks. When trained in the USA and tested in Europe, FireScope achieves substantial performance gains. Our findings demonstrate that reasoning can ground prediction models, improving both generalization and interpretability.
arXiv Detail & Related papers (2025-11-21T11:45:22Z) - Uint: Building Uint Detection Dataset [1.2166468091046596]
Fire scene datasets are crucial for training robust computer vision models. There is a significant shortage of annotated data specifically targeting building units. We introduce an annotated dataset of building units captured by drones, which incorporates multiple enhancement techniques.
arXiv Detail & Related papers (2025-08-05T06:36:41Z) - Eyes on the Environment: AI-Driven Analysis for Fire and Smoke Classification, Segmentation, and Detection [3.865779317336744]
Fire and smoke phenomena pose a significant threat to the natural environment, ecosystems, and global economy, as well as human lives and wildlife. There is a demand for more sophisticated and advanced technologies to implement an effective strategy for early detection, real-time monitoring, and minimizing the overall impacts of fires on ecological balance and public safety. These systems extensively rely on the availability of adequate and high-quality fire and smoke data to create proficient Machine Learning (ML) methods for various tasks, such as detection and monitoring.
arXiv Detail & Related papers (2025-03-17T22:08:02Z) - Zero-Shot Detection of AI-Generated Images [54.01282123570917]
We propose a zero-shot entropy-based detector (ZED) to detect AI-generated images.
Inspired by recent works on machine-generated text detection, our idea is to measure how surprising the image under analysis is compared to a model of real images.
ZED achieves an average improvement of more than 3% over the SoTA in terms of accuracy.
arXiv Detail & Related papers (2024-09-24T08:46:13Z) - Image-Based Fire Detection in Industrial Environments with YOLOv4 [53.180678723280145]
This work looks into the potential of AI to detect and recognize fires and reduce detection time using object detection on an image stream.
To this end, we collected and labeled appropriate data from several public sources, which have been used to train and evaluate several models based on the popular YOLOv4 object detector.
arXiv Detail & Related papers (2022-12-09T11:32:36Z) - MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 levels of difficulties and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z) - Target-aware Dual Adversarial Learning and a Multi-scenario
Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse in that common space either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z) - From Unsupervised to Few-shot Graph Anomaly Detection: A Multi-scale Contrastive Learning Approach [26.973056364587766]
Anomaly detection from graph data is an important data mining task in many applications such as social networks, finance, and e-commerce.
We propose a novel framework, a graph ANomaly dEtection framework with Multi-scale cONtrastive lEarning (ANEMONE for short).
By using a graph neural network as a backbone to encode the information from multiple graph scales (views), we learn better representations for nodes in a graph.
arXiv Detail & Related papers (2022-02-11T09:45:11Z) - Unsupervised Person Re-Identification with Wireless Positioning under Weak Scene Labeling [131.18390399368997]
We propose to explore unsupervised person re-identification with both visual data and wireless positioning trajectories under weak scene labeling.
Specifically, we propose a novel unsupervised multimodal training framework (UMTF), which models the complementarity of visual data and wireless information.
Our UMTF contains a multimodal data association strategy (MMDA) and a multimodal graph neural network (MMGN).
arXiv Detail & Related papers (2021-10-29T08:25:44Z) - Active Fire Detection in Landsat-8 Imagery: a Large-Scale Dataset and a Deep-Learning Study [1.3764085113103217]
This paper introduces a new large-scale dataset for active fire detection using deep learning techniques.
We present a study on how different convolutional neural network architectures can be used to approximate handcrafted algorithms.
The proposed dataset, source code, and trained models are available on GitHub.
arXiv Detail & Related papers (2021-01-09T19:05:03Z) - Uncertainty Aware Wildfire Management [6.997483623023005]
Recent wildfires in the United States have resulted in loss of life and billions of dollars.
There are limited resources to be deployed over a massive area and the spread of the fire is challenging to predict.
This paper proposes a decision-theoretic approach to combat wildfires.
arXiv Detail & Related papers (2020-10-15T17:47:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.