Related papers: FusionNet: Physics-Aware Representation Learning for Multi-Spectral and Thermal Data via Trainable Signal-Processing Priors

URL: http://arxiv.org/abs/2512.19504v1
Date: Mon, 22 Dec 2025 15:59:37 GMT
Title: FusionNet: Physics-Aware Representation Learning for Multi-Spectral and Thermal Data via Trainable Signal-Processing Priors
Authors: Georgios Voulgaris
Abstract summary: This work introduces a physics-aware representation learning framework to model stable signatures of long-term physical processes. The proposed backbone embeds trainable differential signal-processing priors within convolutional layers, combines mixed pooling strategies, and employs wider receptive fields. Systematic ablations show that each architectural component contributes to performance gains, with DGCNN achieving 88.7% accuracy on the SWIR ratio and FusionNet reaching 90.6%.
Score: 5.7532749393107645
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern deep learning models operating on multi-modal visual signals often rely on inductive biases that are poorly aligned with the physical processes governing signal formation, leading to brittle performance under cross-spectral and real-world conditions. In particular, approaches that prioritise direct thermal cues struggle to capture indirect yet persistent environmental alterations induced by sustained heat emissions. This work introduces a physics-aware representation learning framework that leverages multi-spectral information to model stable signatures of long-term physical processes. Specifically, a geological Short Wave Infrared (SWIR) ratio sensitive to soil property changes is integrated with Thermal Infrared (TIR) data through an intermediate fusion architecture, instantiated as FusionNet. The proposed backbone embeds trainable differential signal-processing priors within convolutional layers, combines mixed pooling strategies, and employs wider receptive fields to enhance robustness across spectral modalities. Systematic ablations show that each architectural component contributes to performance gains, with DGCNN achieving 88.7% accuracy on the SWIR ratio and FusionNet reaching 90.6%, outperforming state-of-the-art baselines across five spectral configurations. Transfer learning experiments further show that ImageNet pretraining degrades TIR performance, highlighting the importance of modality-aware training for cross-spectral learning. Evaluated on real-world data, the results demonstrate that combining physics-aware feature selection with principled deep learning architectures yields robust and generalisable representations, illustrating how first-principles signal modelling can improve multi-spectral learning under challenging conditions.
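
The abstract names two ingredients that are easy to illustrate in isolation: a band ratio used as an input feature, and mixed pooling. The sketch below is a minimal NumPy illustration under stated assumptions, not the paper's implementation. The abstract does not say which SWIR bands form the ratio or how the pooling strategies are mixed, so `band_ratio`, `mixed_pool2d`, the `alpha` blend weight, and the `eps` stabiliser are all hypothetical names and choices. Mixed pooling is shown in one common formulation, a convex combination of max and average pooling.

```python
import numpy as np


def band_ratio(band_a, band_b, eps=1e-6):
    """Element-wise ratio of two spectral bands.

    Assumption: the band pair is left to the caller; eps only guards
    against division by zero and is not taken from the paper.
    """
    return band_a / (band_b + eps)


def mixed_pool2d(x, alpha=0.5, k=2):
    """Blend max and average pooling over non-overlapping k x k windows.

    Assumption: a fixed scalar alpha in [0, 1] weights max pooling,
    and (1 - alpha) weights average pooling.
    """
    h, w = x.shape
    h2, w2 = h // k, w // k
    # Crop to a multiple of k, then expose each window on axes 1 and 3.
    windows = x[: h2 * k, : w2 * k].reshape(h2, k, w2, k)
    max_p = windows.max(axis=(1, 3))
    avg_p = windows.mean(axis=(1, 3))
    return alpha * max_p + (1.0 - alpha) * avg_p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    swir_a = rng.uniform(0.1, 1.0, size=(4, 4))  # synthetic band values
    swir_b = rng.uniform(0.1, 1.0, size=(4, 4))  # synthetic band values
    ratio = band_ratio(swir_a, swir_b)
    print(mixed_pool2d(ratio).shape)  # pooled map is half the size per axis
```

With `alpha=1.0` the function reduces to max pooling and with `alpha=0.0` to average pooling, so intermediate values trade off between keeping peak responses and smoothing over each window.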

Related papers

ThermoSplat: Cross-Modal 3D Gaussian Splatting with Feature Modulation and Geometry Decoupling [11.169420448510095]
ThermoSplat is a novel framework that enables deep spectral-aware reconstruction through active feature modulation and adaptive geometry decoupling. Experiments on the RGBT-Scenes dataset demonstrate that ThermoSplat achieves state-of-the-art rendering quality across both visible and thermal spectrums.
arXiv Detail & Related papers (2026-01-22T12:24:26Z)
Application Research of a Deep Learning Model Integrating CycleGAN and YOLO in PCB Infrared Defect Detection [7.407155043542133]
This paper proposes a cross-modal data augmentation framework integrating CycleGAN and YOLOv8. We leverage CycleGAN to perform unpaired image-to-image translation, mapping abundant visible-light PCB images into the infrared domain. We construct a heterogeneous training strategy that fuses generated pseudo-IR data with limited real IR samples to train a lightweight YOLOv8 detector.
arXiv Detail & Related papers (2026-01-01T07:01:47Z)
U-PINet: End-to-End Hierarchical Physics-Informed Learning With Sparse Graph Coupling for 3D EM Scattering Modeling [28.64166932076228]
Electromagnetic (EM) scattering modeling is critical for radar remote sensing. Traditional numerical solvers offer high accuracy, but suffer from scalability issues and substantial computational costs. We propose a U-shaped Physics-Informed Network (U-PINet) to overcome these limitations.
arXiv Detail & Related papers (2025-08-05T12:20:42Z)
CRIA: A Cross-View Interaction and Instance-Adapted Pre-training Framework for Generalizable EEG Representations [52.251569042852815]
CRIA is an adaptive framework that utilizes variable-length and variable-channel coding to achieve a unified representation of EEG data across different datasets. The model employs a cross-attention mechanism to fuse temporal, spectral, and spatial features effectively. Experimental results on the Temple University EEG corpus and the CHB-MIT dataset show that CRIA outperforms existing methods with the same pre-training conditions.
arXiv Detail & Related papers (2025-06-19T06:31:08Z)
Efficient Generative Model Training via Embedded Representation Warmup [12.485320863366411]
Generative models face a fundamental challenge: they must simultaneously learn high-level semantic concepts and low-level synthesis details. We propose Embedded Representation Warmup, a principled two-phase training framework. Our framework achieves an 11.5$\times$ speedup in 350 epochs to reach FID=1.41 compared to single-phase methods like REPA.
arXiv Detail & Related papers (2025-04-14T12:43:17Z)
Resource-Efficient Beam Prediction in mmWave Communications with Multimodal Realistic Simulation Framework [57.994965436344195]
Beamforming is a key technology in millimeter-wave (mmWave) communications that improves signal transmission by optimizing directionality and intensity. Multimodal sensing-aided beam prediction has gained significant attention, using various sensing data to predict user locations or network conditions. Despite its promising potential, the adoption of multimodal sensing-aided beam prediction is hindered by high computational complexity, high costs, and limited datasets.
arXiv Detail & Related papers (2025-04-07T15:38:25Z)
A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities [2.2311172523629637]
This paper presents the first comprehensive survey of deep learning advances in remote sensing STF over the past decade. We establish a taxonomy of deep learning architectures including CNNs, Transformers, Generative Adversarial Networks (GANs), diffusion models, and sequence models. We identify five critical challenges: time-space conflicts, generalization across datasets, computational efficiency for large-scale processing, multi-source heterogeneous fusion, and insufficient benchmark diversity.
arXiv Detail & Related papers (2025-04-01T15:30:48Z)
Multi-Domain Biometric Recognition using Body Embeddings [51.36007967653781]
We show that body embeddings perform better than face embeddings in medium-wave infrared (MWIR) and long-wave infrared (LWIR) domains. We leverage a vision transformer architecture to establish benchmark results on the IJB-MDF dataset. We also show that finetuning a body model, pretrained exclusively on VIS data, with a simple combination of cross-entropy and triplet losses achieves state-of-the-art mAP scores.
arXiv Detail & Related papers (2025-03-13T22:38:18Z)
Scintillation pulse characterization with spectrum-inspired temporal neural networks: case studies on particle detector signals [1.124958340749622]
We propose a network architecture specially tailored for scintillation pulse characterization based on previous works on time series analysis. We prove our idea in two case studies: (a) simulation data generated with the setting of the LUX dark matter detector, and (b) experimental electrical signals with fast electronics to emulate scintillation variations for the NICA/MPD calorimeter.
arXiv Detail & Related papers (2024-10-09T02:44:53Z)
DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement. The architecture incorporates LSTM units to propagate information through each refinement step. DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
arXiv Detail & Related papers (2022-05-28T16:18:08Z)
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields. In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z)
ChiNet: Deep Recurrent Convolutional Learning for Multimodal Spacecraft Pose Estimation [3.964047152162558]
This paper presents an innovative deep learning pipeline which estimates the relative pose of a spacecraft by incorporating the temporal information from a rendezvous sequence. It leverages the performance of long short-term memory (LSTM) units in modelling sequences of data for the processing of features extracted by a convolutional neural network (CNN) backbone. Three distinct training strategies, which follow a coarse-to-fine funnelled approach, are combined to facilitate feature learning and improve end-to-end pose estimation by regression.
arXiv Detail & Related papers (2021-08-23T16:48:58Z)
PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context. We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)