Related papers: U-Net with Hadamard Transform and DCT Latent Spaces for Next-day Wildfire Spread Prediction

URL: http://arxiv.org/abs/2602.11672v1
Date: Thu, 12 Feb 2026 07:45:53 GMT
Title: U-Net with Hadamard Transform and DCT Latent Spaces for Next-day Wildfire Spread Prediction
Authors: Yingyi Luo, Shuaiang Rong, Adam Watts, Ahmet Enis Cetin
Abstract summary: We develop a lightweight tool for next-day wildfire spread prediction using multimodal satellite data as input. The deep learning model, which we call Transform Domain Fusion UNet, incorporates trainable Hadamard Transform and Discrete Cosine Transform layers. We show that the proposed latent space fusion model balances accuracy and efficiency under a lightweight setting.
Score: 0.6999740786886536
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We developed a lightweight and computationally efficient tool for next-day wildfire spread prediction using multimodal satellite data as input. The deep learning model, which we call Transform Domain Fusion UNet (TD-FusionUNet), incorporates trainable Hadamard Transform and Discrete Cosine Transform layers that apply two-dimensional transforms, enabling the network to capture essential "frequency" components in orthogonalized latent spaces. Additionally, we introduce custom preprocessing techniques, including random margin cropping and a Gaussian mixture model, to enrich the representation of the sparse pre-fire masks and enhance the model's generalization capability. The TD-FusionUNet is evaluated on two datasets: the Next-Day Wildfire Spread dataset released by Google Research in 2023 and the WildfireSpreadTS dataset. Our proposed TD-FusionUNet achieves an F1 score of 0.591 with 370k parameters, outperforming the UNet baseline with a ResNet18 encoder reported for the WildfireSpreadTS dataset while using substantially fewer parameters. These results show that the proposed latent space fusion model balances accuracy and efficiency under a lightweight setting, making it suitable for real-time wildfire prediction applications in resource-limited environments.
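
Illustrative sketch (not taken from the paper): the abstract describes layers that apply two-dimensional Hadamard and DCT transforms to feature maps. The NumPy code below shows one way such orthonormal 2D transforms can be computed and inverted. The function names, the square power-of-two feature-map size, and the elementwise scaling step in `transform_domain_scale` are assumptions made for illustration only; the actual TD-FusionUNet layers may differ.

```python
import numpy as np

def hadamard_matrix(n):
    """Orthonormal Sylvester-Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses a different normalization
    return c

def transform_2d(x, t):
    """Separable 2D transform: apply t along rows and columns."""
    return t @ x @ t.T

def inverse_2d(y, t):
    """Inverse of transform_2d for an orthonormal t."""
    return t.T @ y @ t

def transform_domain_scale(x, t, weights):
    """Hypothetical layer: scale coefficients elementwise, then invert.

    In a trainable setting, `weights` would be learned parameters.
    """
    return inverse_2d(weights * transform_2d(x, t), t)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 8))  # stand-in for one latent feature map
    for name, t in (("Hadamard", hadamard_matrix(8)), ("DCT", dct_matrix(8))):
        y = transform_2d(x, t)
        err = np.max(np.abs(inverse_2d(y, t) - x))
        print(f"{name}: max reconstruction error = {err:.2e}")
```

Because both matrices are orthonormal, the inverse is just the transpose, so with all-ones weights the hypothetical layer returns its input unchanged; non-trivial weights emphasize or suppress individual transform-domain components.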

Related papers

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders [74.72147962028265]
Representation Autoencoders (RAEs) have shown distinct advantages in diffusion modeling on ImageNet. We investigate whether this framework can scale to large-scale, freeform text-to-image (T2I) generation.
arXiv Detail & Related papers (2026-01-22T18:58:16Z)
FireSentry: A Multi-Modal Spatio-temporal Benchmark Dataset for Fine-Grained Wildfire Spread Forecasting [41.82363110982653]
We present FireSentry, a provincial-scale multi-modal wildfire dataset characterized by sub-meter spatial and sub-second temporal resolution. FireSentry provides visible and infrared video streams, in-situ environmental measurements, and manually validated fire masks. Building on FireSentry, we establish a comprehensive benchmark encompassing physics-based, data-driven, and generative models.
arXiv Detail & Related papers (2025-12-03T02:02:47Z)
UniDiff: Parameter-Efficient Adaptation of Diffusion Models for Land Cover Classification with Multi-Modal Remotely Sensed Imagery and Sparse Annotations [3.588053519843616]
We propose a parameter-efficient framework that adapts a single ImageNet-pretrained diffusion model to multiple sensing modalities using only target-domain data. UniDiff combines FiLM-based timestep-modality conditioning, parameter-efficient adaptation of approximately 5% of parameters, and pseudo-RGB anchoring to preserve pre-trained representations and prevent catastrophic forgetting.
arXiv Detail & Related papers (2025-11-29T00:57:53Z)
iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion [62.09575122593993]
iGaussian is a two-stage feed-forward framework that achieves real-time camera pose estimation through direct 3D Gaussian inversion. Experimental results on the NeRF Synthetic, Mip-NeRF 360, and T&T+DB datasets demonstrate a significant performance improvement over previous methods.
arXiv Detail & Related papers (2025-11-18T05:22:22Z)
Detecting Wildfire Flame and Smoke through Edge Computing using Transfer Learning Enhanced Deep Learning Models [7.222438368943405]
This study underscores Transfer Learning's (TL) significance in boosting the performance of object detectors for identifying wildfire smoke and flames. We explore a two-stage cascaded TL method, utilizing D-Fire or FASDD as initial stage target datasets and AFSE as the subsequent stage. Through fine-tuning, TL significantly enhances detection precision, achieving up to 79.2% mean Average Precision (mAP@0.5), reduces training time, and increases model generalizability.
arXiv Detail & Related papers (2025-01-15T08:04:44Z)
TinyFusion: Diffusion Transformers Learned Shallow [52.96232442322824]
Diffusion Transformers have demonstrated remarkable capabilities in image generation but often come with excessive parameterization. We present TinyFusion, a depth pruning method designed to remove redundant layers from diffusion transformers via end-to-end learning. Experiments with DiT-XL show that TinyFusion can craft a shallow diffusion transformer at less than 7% of the pre-training cost, achieving a 2$\times$ speedup with an FID score of 2.86.
arXiv Detail & Related papers (2024-12-02T07:05:39Z)
Dynamic Diffusion Transformer [67.13876021157887]
Diffusion Transformer (DiT) has demonstrated superior performance but suffers from substantial computational costs. We propose Dynamic Diffusion Transformer (DyDiT), an architecture that dynamically adjusts its computation along both timestep and spatial dimensions during generation. With 3% additional fine-tuning, our method reduces the FLOPs of DiT-XL by 51%, accelerates generation by 1.73$\times$, and achieves a competitive FID score of 2.07 on ImageNet.
arXiv Detail & Related papers (2024-10-04T14:14:28Z)
FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving [63.96049803915402]
The integration of data from diverse sensor modalities constitutes a prevalent methodology within the ambit of autonomous driving scenarios. Recent advancements in efficient point cloud transformers have underscored the efficacy of integrating information in sparse formats. In this paper, we conduct a comprehensive exploration of design choices for Transformer-based sparse camera-LiDAR fusion.
arXiv Detail & Related papers (2024-08-13T11:46:32Z)
DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets [95.84755169585492]
We present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D perception. Our model achieves state-of-the-art performance with a broad range of 3D perception tasks.
arXiv Detail & Related papers (2023-01-15T09:31:58Z)
Next Day Wildfire Spread: A Machine Learning Data Set to Predict Wildfire Spreading from Remote-Sensing Data [5.814925201882753]
'Next Day Wildfire Spread' is a curated data set of historical wildfires aggregating nearly a decade of remote-sensing data across the United States. We implement a convolutional autoencoder that takes advantage of the spatial information of this data to predict wildfire spread. This data set can be used as a benchmark for developing wildfire propagation models based on remote sensing data for a lead time of one day.
arXiv Detail & Related papers (2021-12-04T23:28:44Z)
Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks. We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)