FedFusion: Manifold Driven Federated Learning for Multi-satellite and Multi-modality Fusion
- URL: http://arxiv.org/abs/2311.09540v1
- Date: Thu, 16 Nov 2023 03:29:19 GMT
- Title: FedFusion: Manifold Driven Federated Learning for Multi-satellite and Multi-modality Fusion
- Authors: DaiXun Li, Weiying Xie, Yunsong Li, Leyuan Fang
- Abstract summary: This paper proposes a manifold-driven multi-modality fusion framework, FedFusion, which randomly samples local data on each client to jointly estimate the dominant manifold structure of that client's shallow features.
Considering the physical space limitations of the satellite constellation, we developed a multimodal federated learning module designed specifically for manifold data in a deep latent space.
The proposed framework surpasses existing methods on three multimodal datasets, achieving an average classification accuracy of 94.35% while compressing communication costs by a factor of 4.
- Score: 30.909597853659506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-satellite, multi-modality in-orbit fusion is a challenging task as it
explores the fusion representation of complex high-dimensional data under
limited computational resources. Deep neural networks can reveal the underlying
distribution of multi-modal remote sensing data, but in-orbit fusion of
multimodal data is more difficult because different sensors have different
imaging characteristics, especially when the multimodal data is
non-independent and identically distributed (Non-IID). To address
this problem while maintaining classification performance, this paper proposes
a manifold-driven multi-modality fusion framework, FedFusion, which randomly
samples local data on each client to jointly estimate the dominant manifold
structure of that client's shallow features and explicitly compresses the
feature matrices into a low-rank subspace through cascading and additive
approaches. The compressed features then serve as input to the subsequent
classifier.
Considering the physical space limitations of the satellite constellation, we
developed a multimodal federated learning module designed specifically for
manifold data in a deep latent space. This module iteratively updates each
client's sub-network parameters through global weighted averaging, yielding a
framework that produces a compact representation for each client. The
proposed framework surpasses existing methods in terms of
performance on three multimodal datasets, achieving an average classification
accuracy of 94.35% while compressing communication costs by a factor of 4.
Furthermore, extensive numerical evaluations on real-world satellite images
were conducted on an orbiting edge-computing architecture based on Jetson TX2
industrial modules, demonstrating that FedFusion reduces training time by
48.4 minutes (15.18%) while improving accuracy.
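
The low-rank compression step described above can be pictured with a generic truncated-SVD projection. The Python sketch below is a minimal illustration, assuming a (samples x dims) matrix of shallow features and an illustrative target rank; it is a stand-in for the paper's cascading-and-additive compression, not the authors' exact implementation, and all names and sizes are hypothetical.

```python
import numpy as np

def lowrank_compress(features: np.ndarray, rank: int) -> np.ndarray:
    """Project a (samples x dims) feature matrix onto its top-`rank`
    directions via truncated SVD -- a generic stand-in for FedFusion's
    low-rank subspace compression of shallow client features."""
    # Center the features so the SVD captures variance, not the mean offset.
    centered = features - features.mean(axis=0, keepdims=True)
    # Right-singular vectors span the dominant directions of the local data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank].T              # (dims, rank) subspace basis
    return centered @ basis          # compressed features, (samples, rank)

# Example: 256 randomly sampled local features of dimension 512 -> rank 32,
# which then serve as input to the client's classifier.
rng = np.random.default_rng(0)
compressed = lowrank_compress(rng.normal(size=(256, 512)), rank=32)
print(compressed.shape)  # (256, 32)
```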
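
The federated module's "global weighted averaging" is in the FedAvg family. The sketch below shows the generic technique, weighting each client's sub-network parameters by its local sample count; the state-dict layout and weights are assumptions for illustration, not FedFusion's exact update rule.

```python
import numpy as np

def weighted_average(client_params, client_weights):
    """FedAvg-style global weighted averaging: each client's sub-network
    parameters contribute in proportion to its weight (e.g. its local
    sample count). `client_params` is a list of {name: array} dicts."""
    total = float(sum(client_weights))
    return {
        name: sum((w / total) * params[name]
                  for params, w in zip(client_params, client_weights))
        for name in client_params[0]
    }

# Example: three satellite clients holding 100, 200, and 700 samples.
clients = [{"fc.weight": np.full((4, 4), v)} for v in (1.0, 2.0, 3.0)]
avg = weighted_average(clients, client_weights=[100, 200, 700])
print(avg["fc.weight"][0, 0])  # 0.1*1 + 0.2*2 + 0.7*3 = 2.6
```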
Related papers
- LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing [25.016421338677816]
Current methods often process only two types of data, missing out on the rich information that additional modalities can provide.
We propose a novel Lightweight Multimodal data Fusion Network (LMFNet).
LMFNet accommodates various data types simultaneously, including RGB, NirRG, and DSM, through a weight-sharing, multi-branch vision transformer.
arXiv Detail & Related papers (2024-04-21T13:29:42Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged for high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
We instead model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients [32.59184269562571]
We propose a multi-modal collaborative diffusion federated learning framework called FedDiff.
Our framework establishes a dual-branch diffusion-model feature extraction setup, where the two modalities are fed into separate encoder branches.
Considering the challenge of private and efficient communication between multiple clients, we embed the diffusion model into the federated learning communication structure.
arXiv Detail & Related papers (2023-11-16T02:29:37Z)
- Efficient and Effective Deep Multi-view Subspace Clustering [9.6753782215283]
We propose a novel deep framework, termed Efficient and Effective deep Multi-View Subspace Clustering (E$^2$MVSC).
Instead of a parameterized FC layer, we design a Relation-Metric Net that decouples network parameter scale from sample numbers for greater computational efficiency.
E$^2$MVSC yields results comparable to existing methods and achieves state-of-the-art performance on various types of multi-view datasets.
arXiv Detail & Related papers (2023-10-15T03:08:25Z)
- General-Purpose Multimodal Transformer meets Remote Sensing Semantic Segmentation [35.100738362291416]
Multimodal AI seeks to exploit complementary data sources, particularly for complex tasks like semantic segmentation.
Recent trends in general-purpose multimodal networks have shown great potential to achieve state-of-the-art performance.
We propose a UNet-inspired module that employs 3D convolution to encode vital local information and learn cross-modal features simultaneously.
arXiv Detail & Related papers (2023-07-07T04:58:34Z)
- Multi-scale Feature Aggregation for Crowd Counting [84.45773306711747]
We propose a multi-scale feature aggregation network (MSFANet).
MSFANet consists of two feature aggregation modules: short aggregation (ShortAgg) and skip aggregation (SkipAgg).
arXiv Detail & Related papers (2022-08-10T10:23:12Z)
- Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection [77.50110439560152]
Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF).
We propose a novel and efficient context modeling mechanism that can help existing FPs deliver better MFF results.
In particular, we introduce a novel insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency.
arXiv Detail & Related papers (2022-07-14T01:45:03Z)
- BIMS-PU: Bi-Directional and Multi-Scale Point Cloud Upsampling [60.257912103351394]
We develop a new point cloud upsampling pipeline called BIMS-PU.
We decompose the up/downsampling procedure into several up/downsampling sub-steps by breaking the target sampling factor into smaller factors.
We show that our method achieves superior results to state-of-the-art approaches.
arXiv Detail & Related papers (2022-06-25T13:13:37Z)
- Multi-modal land cover mapping of remote sensing images using pyramid attention and gated fusion networks [20.66034058363032]
We propose a new multi-modality network for land cover mapping of multi-modal remote sensing data, based on a novel pyramid attention fusion (PAF) module and a gated fusion unit (GFU).
The PAF module is designed to efficiently obtain rich fine-grained contextual representations from each modality with a built-in cross-level and cross-view attention fusion mechanism.
The GFU module utilizes a novel gating mechanism to merge features early, thereby diminishing hidden redundancies and noise.
arXiv Detail & Related papers (2021-11-06T10:01:01Z)
- Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion [63.72912507445662]
We propose a compact and effective framework to fuse multimodal features at multiple layers in a single network.
We first verify that multimodal features can be learnt within a shared single network by merely maintaining modality-specific batch normalization layers in the encoder.
We then propose a bidirectional multi-layer fusion scheme, in which multimodal features can be exploited progressively.
arXiv Detail & Related papers (2021-08-11T03:42:13Z)
- X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet.
X-ModalNet generalizes well, owing to label propagation on an updatable graph constructed from high-level features at the top of the network.
We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z)