SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection
- URL: http://arxiv.org/abs/2512.11215v1
- Date: Fri, 12 Dec 2025 01:47:28 GMT
- Title: SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection
- Authors: Tianye Qi, Weihao Li, Nick Barnes
- Abstract summary: SmokeBench is a benchmark to evaluate the ability of multimodal large language models (MLLMs) to recognize and localize wildfire smoke in images. We evaluate several MLLMs, including Idefics2, Qwen2.5-VL, InternVL3, Unified-IO 2, Grounding DINO, GPT-4o, and Gemini-2.5 Pro. Smoke volume is strongly correlated with model performance, whereas contrast plays a comparatively minor role.
- Score: 19.134309978060134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Wildfire smoke is transparent, amorphous, and often visually confounded with clouds, making early-stage detection particularly challenging. In this work, we introduce a benchmark, called SmokeBench, to evaluate the ability of multimodal large language models (MLLMs) to recognize and localize wildfire smoke in images. The benchmark consists of four tasks: (1) smoke classification, (2) tile-based smoke localization, (3) grid-based smoke localization, and (4) smoke detection. We evaluate several MLLMs, including Idefics2, Qwen2.5-VL, InternVL3, Unified-IO 2, Grounding DINO, GPT-4o, and Gemini-2.5 Pro. Our results show that while some models can classify the presence of smoke when it covers a large area, all models struggle with accurate localization, especially in the early stages. Further analysis reveals that smoke volume is strongly correlated with model performance, whereas contrast plays a comparatively minor role. These findings highlight critical limitations of current MLLMs for safety-critical wildfire monitoring and underscore the need for methods that improve early-stage smoke localization.
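The abstract's tile-based localization task (task 2) amounts to comparing the set of image tiles a model flags as smoky against annotated smoke tiles. A minimal sketch of one plausible scoring function is below; the tile grid, the set-based representation, and the choice of F1 are assumptions for illustration, not the paper's actual protocol.

```python
# Minimal sketch of tile-based smoke-localization scoring, in the spirit
# of SmokeBench's task 2. The (row, col) grid representation and the use
# of F1 are assumptions; the paper defines the real evaluation protocol.
from typing import Set, Tuple

Tile = Tuple[int, int]  # (row, col) index within a fixed image grid

def tile_f1(pred: Set[Tile], gold: Set[Tile]) -> float:
    """F1 between predicted and ground-truth smoke tiles."""
    if not pred and not gold:
        return 1.0  # both empty: perfect agreement on "no smoke"
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: the model flags two tiles, one of which overlaps the annotation.
print(tile_f1({(0, 1), (1, 1)}, {(1, 1), (1, 2)}))  # 0.5
```

A set-based score like this is forgiving of small boundary errors only at tile granularity, which matches the paper's finding that models degrade sharply when early-stage smoke occupies few tiles.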
Related papers
- WildfireVLM: AI-powered Analysis for Early Wildfire Detection and Risk Assessment Using Satellite Imagery [1.0907929392898186]
Wildfires are a growing threat to ecosystems, human lives, and infrastructure. We introduce WildfireVLM, an AI framework that combines satellite-imagery wildfire detection with language-driven risk assessment.
arXiv Detail & Related papers (2026-02-09T19:40:50Z) - VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning [62.09195763860549]
Reinforcement learning with verifiable rewards (RLVR) improves reasoning in large language models (LLMs) but struggles with exploration. We introduce VOGUE (Visual Uncertainty Guided Exploration), a novel method that shifts exploration from the output (text) space to the input (visual) space. Our work shows that grounding exploration in the inherent uncertainty of visual inputs is an effective strategy for improving multimodal reasoning.
arXiv Detail & Related papers (2025-10-01T20:32:08Z) - MFGDiffusion: Mask-Guided Smoke Synthesis for Enhanced Forest Fire Detection [6.307649189539342]
Smoke is the first visible indicator of a wildfire. Current inpainting models exhibit limitations in generating high-quality smoke representations. We propose a comprehensive framework for generating forest fire smoke images.
arXiv Detail & Related papers (2025-07-15T12:25:35Z) - Backdoor Cleaning without External Guidance in MLLM Fine-tuning [76.82121084745785]
Believe Your Eyes (BYE) is a data filtering framework that leverages attention entropy patterns as self-supervised signals to identify and filter backdoor samples. It achieves near-zero attack success rates while maintaining clean-task performance.
arXiv Detail & Related papers (2025-05-22T17:11:58Z) - Adversarial Robustness for Deep Learning-based Wildfire Prediction Models [3.4528046839403905]
We introduce WARP (Wildfire Adversarial Robustness Procedure), the first model-agnostic framework for evaluating wildfire detection models' robustness. WARP addresses inherent limitations in data diversity by generating adversarial examples through image-global and image-local perturbations. Using WARP, we assessed real-time CNNs and Transformers, uncovering key vulnerabilities.
arXiv Detail & Related papers (2024-12-28T04:06:29Z) - FoSp: Focus and Separation Network for Early Smoke Segmentation [0.6165605009782557]
Early smoke segmentation (ESS) enables the accurate identification of smoke sources, facilitating the prompt extinguishing of fires and preventing large-scale gas leaks.
ESS poses greater challenges than conventional object and regular smoke segmentation due to its small scale and transparent appearance.
We introduce a high-quality real-world dataset called SmokeSeg, which contains more small and transparent smoke than the existing datasets.
arXiv Detail & Related papers (2023-06-07T14:45:24Z) - Multimodal Wildland Fire Smoke Detection [5.15911752972989]
Research has shown that climate change creates warmer temperatures and drier conditions, leading to longer wildfire seasons and increased wildfire risks in the U.S.
We present our work on integrating multiple data sources in SmokeyNet, a deep learning model using spatio-temporal information to detect smoke from wildland fires.
With a time-to-detection of only a few minutes, SmokeyNet can serve as an automated early notification system, providing a useful tool in the fight against destructive wildfires.
arXiv Detail & Related papers (2022-12-29T01:16:06Z) - Image-Based Fire Detection in Industrial Environments with YOLOv4 [53.180678723280145]
This work looks into the potential of AI to detect and recognize fires and reduce detection time using object detection on an image stream.
To this end, we collected and labeled appropriate data from several public sources, which we used to train and evaluate several models based on the popular YOLOv4 object detector.
arXiv Detail & Related papers (2022-12-09T11:32:36Z) - F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models [54.21757555804668]
We present F-VLM, a simple open-vocabulary object detection method built upon Frozen Vision and Language Models.
F-VLM simplifies the current multi-stage training pipeline by eliminating the need for knowledge distillation or detection-tailored pretraining.
arXiv Detail & Related papers (2022-09-30T17:59:52Z) - FIgLib & SmokeyNet: Dataset and Deep Learning Model for Real-Time Wildland Fire Smoke Detection [0.0]
Fire Ignition Library (FIgLib) is a publicly-available dataset of nearly 25,000 labeled wildfire smoke images.
SmokeyNet is a novel deep learning architecture using spatio-temporal information from camera imagery for real-time wildfire smoke detection.
When trained on the FIgLib dataset, SmokeyNet outperforms comparable baselines and rivals human performance.
arXiv Detail & Related papers (2021-12-16T03:49:58Z) - City-scale Scene Change Detection using Point Clouds [71.73273007900717]
We propose a method for detecting structural changes in a city using images captured from mounted cameras at two different times.
A direct comparison of the two point clouds for change detection is not ideal due to inaccurate geo-location information.
To circumvent this problem, we propose a deep learning-based non-rigid registration on the point clouds.
Experiments show that our method is able to detect scene changes effectively, even in the presence of viewpoint and illumination differences.
arXiv Detail & Related papers (2021-03-26T08:04:13Z) - STCNet: Spatio-Temporal Cross Network for Industrial Smoke Detection [52.648906951532155]
We propose a novel Spatio-Temporal Cross Network (STCNet) to recognize industrial smoke emissions.
The proposed STCNet involves a spatial pathway to extract texture features and a temporal pathway to capture smoke motion information.
We show that our STCNet outperforms the best competitors on the challenging RISE industrial smoke detection dataset by 6.2%.
arXiv Detail & Related papers (2020-11-10T02:28:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.