Two-Stage Framework for Efficient UAV-Based Wildfire Video Analysis with Adaptive Compression and Fire Source Detection
- URL: http://arxiv.org/abs/2508.16739v1
- Date: Fri, 22 Aug 2025 18:27:31 GMT
- Title: Two-Stage Framework for Efficient UAV-Based Wildfire Video Analysis with Adaptive Compression and Fire Source Detection
- Authors: Yanbing Bai, Rui-Yang Ju, Lemeng Zhao, Junjie Hu, Jianchao Bi, Erick Mas, Shunichi Koshimura
- Abstract summary: We propose a framework for real-time wildfire monitoring and fire source detection on UAV platforms. Specifically, in Stage 1, we utilize a policy network to identify and discard redundant video clips. In Stage 2, once a frame is classified as "fire," we employ the improved YOLOv8 model to localize the fire source.
- Score: 4.796400107449626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unmanned Aerial Vehicles (UAVs) have become increasingly important in disaster emergency response by enabling real-time aerial video analysis. Due to the limited computational resources available on UAVs, large models cannot be run independently for real-time analysis. To overcome this challenge, we propose a lightweight and efficient two-stage framework for real-time wildfire monitoring and fire source detection on UAV platforms. Specifically, in Stage 1, we utilize a policy network to identify and discard redundant video clips using frame compression techniques, thereby reducing computational costs. In addition, we introduce a station point mechanism that leverages future frame information within the sequential policy network to improve prediction accuracy. In Stage 2, once a frame is classified as "fire", we employ an improved YOLOv8 model to localize the fire source. We evaluate the Stage 1 method using the FLAME and HMDB51 datasets, and the Stage 2 method using the Fire & Smoke dataset. Experimental results show that our method significantly reduces computational costs while maintaining classification accuracy in Stage 1, and achieves higher detection accuracy with similar inference time in Stage 2 compared to baseline methods.
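The two-stage pipeline the abstract describes can be sketched as a simple gating loop: a lightweight policy scores each frame, and only frames judged to contain fire are forwarded to the heavier detector. This is a minimal illustrative sketch, not the authors' implementation: the `policy` and `detector` callables stand in for the paper's learned policy network and improved YOLOv8 model, and the `keep_threshold` parameter, `Detection` class, and `two_stage_analyze` function are hypothetical names introduced here for clarity.

```python
# Hypothetical sketch of the paper's two-stage UAV wildfire pipeline.
# Stage 1: a cheap per-frame policy discards redundant / non-fire frames.
# Stage 2: a heavier detector (e.g. YOLOv8 in the paper) localizes the fire
# source only on the frames that survive Stage 1.
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Detection:
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) fire-source bounding box
    score: float                    # detector confidence


def two_stage_analyze(
    frames: List[object],
    policy: Callable[[object], float],              # Stage 1: "fire" score per frame
    detector: Callable[[object], List[Detection]],  # Stage 2: fire-source localizer
    keep_threshold: float = 0.5,
) -> List[Tuple[int, List[Detection]]]:
    """Run the detector only on frames the policy classifies as 'fire'.

    Returns (frame_index, detections) pairs; skipped frames cost only one
    cheap policy evaluation, which is where the computational savings come from.
    """
    results: List[Tuple[int, List[Detection]]] = []
    for i, frame in enumerate(frames):
        if policy(frame) < keep_threshold:  # Stage 1: discard redundant frame
            continue
        results.append((i, detector(frame)))  # Stage 2: localize fire source
    return results


# Toy usage with stub models: frames are floats standing in for images,
# and the stub policy simply returns the frame value as its "fire" score.
frames = [0.1, 0.2, 0.9, 0.95, 0.3]
policy = lambda f: f
detector = lambda f: [Detection((0, 0, 10, 10), f)]
out = two_stage_analyze(frames, policy, detector)
print([i for i, _ in out])  # indices of frames forwarded to Stage 2: [2, 3]
```

In this toy run only two of the five frames cross the threshold, so the expensive detector executes twice instead of five times; the paper's station point mechanism (using future-frame information inside the sequential policy) is not modeled here.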
Related papers
- PyroFocus: A Deep Learning Approach to Real-Time Wildfire Detection in Multispectral Remote Sensing Imagery [0.0]
Rapid and accurate wildfire detection is crucial for emergency response and environmental management. In airborne and spaceborne missions, real-time algorithms must distinguish between no fire, active fire, and post-fire conditions. We introduce PyroFocus, a two-stage pipeline that performs fire classification followed by fire radiative power (FRP) regression or segmentation to reduce inference time and computational cost for onboard deployment.
arXiv Detail & Related papers (2025-12-02T21:59:45Z) - Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion [70.4360995984905]
We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must generate sequences conditioned on their own imperfect outputs.
arXiv Detail & Related papers (2025-06-09T17:59:55Z) - Detecting Wildfire Flame and Smoke through Edge Computing using Transfer Learning Enhanced Deep Learning Models [7.222438368943405]
This study underscores Transfer Learning's (TL) significance in boosting the performance of object detectors for identifying wildfire smoke and flames. We explore a two-stage cascaded TL method, utilizing D-Fire or FASDD as initial stage target datasets and AFSE as the subsequent stage. Through fine-tuning, TL significantly enhances detection precision, achieving up to 79.2% mean Average Precision (mAP@0.5), reduces training time, and increases model generalizability.
arXiv Detail & Related papers (2025-01-15T08:04:44Z) - Efficient Diffusion as Low Light Enhancer [63.789138528062225]
Reflectance-Aware Trajectory Refinement (RATR) is a simple yet effective module to refine the teacher trajectory using the reflectance component of images.
Reflectance-aware Diffusion with Distilled Trajectory (ReDDiT) is an efficient and flexible distillation framework tailored for Low-Light Image Enhancement (LLIE).
arXiv Detail & Related papers (2024-10-16T08:07:18Z) - Streamlining Forest Wildfire Surveillance: AI-Enhanced UAVs Utilizing the FLAME Aerial Video Dataset for Lightweight and Efficient Monitoring [4.303063757163241]
This study recognizes the imperative for real-time data processing in disaster response scenarios and introduces a lightweight and efficient approach for aerial video understanding.
Our methodology identifies redundant portions within the video through policy networks and eliminates this excess information using frame compression techniques.
Compared to the baseline, our approach reduces computation costs by more than 13 times while boosting accuracy by 3%.
arXiv Detail & Related papers (2024-08-31T17:26:53Z) - Efficient One-stage Video Object Detection by Exploiting Temporal Consistency [35.16197118579414]
One-stage detectors have achieved competitive accuracy and faster speed compared with traditional two-stage detectors on image data.
In this paper, we first analyse the computational bottlenecks of using one-stage detectors for video object detection.
We present a simple yet efficient framework to address the computational bottlenecks and achieve efficient one-stage VOD.
arXiv Detail & Related papers (2024-02-14T15:32:07Z) - Hardware Acceleration for Real-Time Wildfire Detection Onboard Drone Networks [6.313148708539912]
Wildfire detection in remote and forest areas is crucial for minimizing devastation and preserving ecosystems.
Drones offer agile access to remote, challenging terrains, equipped with advanced imaging technology.
However, limited computation and battery resources pose challenges in implementing efficient image classification models.
This paper aims to develop a real-time image classification and fire segmentation model.
arXiv Detail & Related papers (2024-01-16T04:16:46Z) - Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely used scene representations for visual perception in Autonomous Vehicles (AVs).
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z) - Obscured Wildfire Flame Detection By Temporal Analysis of Smoke Patterns Captured by Unmanned Aerial Systems [0.799536002595393]
This research paper addresses the challenge of detecting obscured wildfires in real-time using drones equipped only with RGB cameras.
We propose a novel methodology that employs semantic segmentation based on the temporal analysis of smoke patterns in video sequences.
arXiv Detail & Related papers (2023-06-30T19:45:43Z) - TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
arXiv Detail & Related papers (2023-06-14T17:07:51Z) - StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the models with the capacity of predicting the future, significantly improving the results for streaming perception.
We consider driving scenes with multiple velocities and propose Velocity-aware streaming AP (VsAP) to jointly evaluate accuracy.
Our simple method achieves the state-of-the-art performance on Argoverse-HD dataset and improves the sAP and VsAP by 4.7% and 8.2% respectively.
arXiv Detail & Related papers (2022-07-21T12:03:02Z) - ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
Untrimmed video understanding such as temporal action detection (TAD) often suffers from the pain of huge demand for computing resources.
We build a unified framework for efficient end-to-end temporal action detection (ETAD).
ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3.
arXiv Detail & Related papers (2022-05-14T21:16:21Z) - Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.