Related papers: Balanced Diffusion-Guided Fusion for Multimodal Remote Sensing Classification

URL: http://arxiv.org/abs/2509.23310v1
Date: Sat, 27 Sep 2025 13:55:32 GMT
Title: Balanced Diffusion-Guided Fusion for Multimodal Remote Sensing Classification
Authors: Hao Liu, Yongjie Zheng, Yuhan Kang, Mingyang Zhang, Maoguo Gong, Lorenzo Bruzzone
Abstract summary: This paper proposes a balanced diffusion-guided fusion framework to guide a multi-branch network for land-cover classification. Experiments on four multimodal remote sensing datasets demonstrate that the proposed method achieves superior classification performance.
Score: 43.05726181699589
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep learning-based techniques for the analysis of multimodal remote sensing data have become popular due to their ability to effectively integrate complementary spatial, spectral, and structural information from different sensors. Recently, denoising diffusion probabilistic models (DDPMs) have attracted attention in the remote sensing community due to their powerful ability to capture robust and complex spatial-spectral distributions. However, pre-training multimodal DDPMs may result in modality imbalance, and effectively leveraging diffusion features to guide complementary diversity feature extraction remains an open question. To address these issues, this paper proposes a balanced diffusion-guided fusion (BDGF) framework that leverages multimodal diffusion features to guide a multi-branch network for land-cover classification. Specifically, we propose an adaptive modality masking strategy to encourage the DDPMs to obtain a modality-balanced rather than spectral image-dominated data distribution. Subsequently, these diffusion features hierarchically guide feature extraction among CNN, Mamba, and transformer networks by integrating feature fusion, group channel attention, and cross-attention mechanisms. Finally, a mutual learning strategy is developed to enhance inter-branch collaboration by aligning the probability entropy and feature similarity of individual subnetworks. Extensive experiments on four multimodal remote sensing datasets demonstrate that the proposed method achieves superior classification performance. The code is available at https://github.com/HaoLiu-XDU/BDGF.
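The mutual learning step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the loss, so everything below is an assumption made for illustration: the symmetric KL term between branch predictions, the cosine feature gap, the `feature_weight` parameter, and the function name `mutual_learning_loss` are hypothetical and are not taken from the BDGF code. The sketch only shows one plausible way to penalize disagreement between sub-network outputs and features.

```python
import math
from itertools import combinations


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cosine_similarity(u, v, eps=1e-12):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def mutual_learning_loss(branch_logits, branch_features, feature_weight=1.0):
    """Average pairwise disagreement between branches (hypothetical loss).

    Assumption: prediction alignment is a symmetric KL divergence and
    feature alignment is 1 - cosine similarity; the paper may differ.
    """
    if len(branch_logits) < 2:
        raise ValueError("need at least two branches")
    probs = [softmax(logits) for logits in branch_logits]
    pairs = list(combinations(range(len(probs)), 2))
    loss = 0.0
    for i, j in pairs:
        sym_kl = 0.5 * (kl_divergence(probs[i], probs[j])
                        + kl_divergence(probs[j], probs[i]))
        feat_gap = 1.0 - cosine_similarity(branch_features[i], branch_features[j])
        loss += sym_kl + feature_weight * feat_gap
    return loss / len(pairs)
```

Under these assumptions, three branches (for example CNN, Mamba, and transformer) that produce identical logits and features give a loss near zero, and the loss grows as their predictions or features diverge.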

Related papers

Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process [55.91649771370862]
The Dirichlet process (DP) mixture model is a powerful non-parametric method that can amplify the most prominent features. We propose a new DP-driven multimodal learning framework that automatically achieves an optimal balance between prominent intra-modal representation learning and cross-modal alignment.
arXiv Detail & Related papers (2025-10-23T16:53:24Z)
Heterogeneous Multi-agent Collaboration in UAV-assisted Mobile Crowdsensing Networks [6.226837215382989]
Unmanned aerial vehicle (UAV)-assisted mobile crowdsensing (MCS) has emerged as a promising paradigm for data collection. We tackle challenges such as spectrum scarcity, device computation, and user mobility issues that hinder efficient coordination of sensing, communication, and resource allocation.
arXiv Detail & Related papers (2025-09-28T02:13:19Z)
Multisource Collaborative Domain Generalization for Cross-Scene Remote Sensing Image Classification [57.945437355714155]
Cross-scene image classification aims to transfer prior knowledge of ground materials to annotate regions with different distributions. Existing approaches focus on single-source domain generalization to unseen target domains. We propose a novel multi-source collaborative domain generalization framework (MS-CDG) based on homogeneity and heterogeneity characteristics of multi-source remote sensing data.
arXiv Detail & Related papers (2024-12-05T06:15:08Z)
Diffusion Features to Bridge Domain Gap for Semantic Segmentation [2.8616666231199424]
This paper investigates an approach that uses sampling and fusion techniques to harness the features of diffusion models efficiently. Leveraging the strength of text-to-image generation, we introduce a new training framework designed to implicitly learn posterior knowledge from it.
arXiv Detail & Related papers (2024-06-02T15:33:46Z)
AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection [23.91870504363899]
Double-stream networks in multispectral detection employ two separate feature extraction branches for multi-modal data. This has hindered the widespread employment of multispectral pedestrian detection in embedded devices for autonomous systems. We introduce the Adaptive Modal Fusion Distillation (AMFD) framework, which can fully utilize the original modal features of the teacher network.
arXiv Detail & Related papers (2024-05-21T17:17:17Z)
Federated Multi-Agent Mapping for Planetary Exploration [0.4143603294943439]
We propose a federated multi-agent mapping approach that jointly trains a global map model across agents without transmitting raw data. Our method leverages implicit neural mapping to generate parsimonious, adaptable representations, reducing data transmission by up to 93.8% compared to raw maps. We demonstrate the efficacy of our approach on Martian terrains and glacier datasets, achieving downstream path planning F1 scores as high as 0.95 while outperforming on map reconstruction losses.
arXiv Detail & Related papers (2024-04-02T20:32:32Z)
Rethinking Clustered Federated Learning in NOMA Enhanced Wireless Networks [60.09912912343705]
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-independent and identically distributed (non-IID) datasets. A detailed theoretical analysis of the generalization gap that measures the degree of non-IID in the data distribution is presented. Solutions to address the challenges posed by non-IID conditions are proposed with the analysis of the properties.
arXiv Detail & Related papers (2024-03-05T17:49:09Z)
FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients [32.59184269562571]
We propose a multi-modal collaborative diffusion federated learning framework called FedDiff. Our framework establishes a dual-branch diffusion model feature extraction setup, where the two modal data are inputted into separate branches of the encoder. Considering the challenge of private and efficient communication between multiple clients, we embed the diffusion model into the federated learning communication structure.
arXiv Detail & Related papers (2023-11-16T02:29:37Z)
Decentralized Learning over Wireless Networks: The Effect of Broadcast with Random Access [56.91063444859008]
We investigate the impact of broadcast transmission and probabilistic random access policy on the convergence performance of D-SGD. Our results demonstrate that optimizing the access probability to maximize the expected number of successful links is a highly effective strategy for accelerating the system convergence.
arXiv Detail & Related papers (2023-05-12T10:32:26Z)
Diffusion Mechanism in Residual Neural Network: Theory and Applications [12.573746641284849]
In many learning tasks with limited training samples, diffusion connects the labeled and unlabeled data points. We propose a novel diffusion residual network (Diff-ResNet) that internally introduces diffusion into the architectures of neural networks. Under the structured data assumption, it is proved that the proposed diffusion block can increase the distance-diameter ratio, which improves the separability of inter-class points.
arXiv Detail & Related papers (2021-05-07T10:42:59Z)
Deep Multimodal Fusion by Channel Exchanging [87.40768169300898]
This paper proposes a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities. The validity of such exchanging process is also guaranteed by sharing convolutional filters yet keeping separate BN layers across modalities, which, as an add-on benefit, allows our multimodal architecture to be almost as compact as a unimodal network.
arXiv Detail & Related papers (2020-11-10T09:53:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.