FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification
- URL: http://arxiv.org/abs/2503.13814v1
- Date: Tue, 18 Mar 2025 01:45:51 GMT
- Title: FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification
- Authors: Jinping Wang, Weiwei Song, Hao Chen, Jinchang Ren, Huimin Zhao,
- Abstract summary: This paper proposes a label-efficient remote sensing world model for multimodal data fusion (FusDreamer). The FusDreamer uses the world model as a unified representation container to abstract common and high-level knowledge. Experiments conducted on four typical datasets indicate the effectiveness and advantages of the proposed FusDreamer.
- Score: 7.523866920738647
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: World models significantly enhance hierarchical understanding, improving data integration and learning efficiency. To explore the potential of the world model in the remote sensing (RS) field, this paper proposes a label-efficient remote sensing world model for multimodal data fusion (FusDreamer). The FusDreamer uses the world model as a unified representation container to abstract common and high-level knowledge, promoting interactions across different types of data, i.e., hyperspectral (HSI), light detection and ranging (LiDAR), and text data. Initially, a new latent diffusion fusion and multimodal generation paradigm (LaMG) is utilized for its exceptional information integration and detail retention capabilities. Subsequently, an open-world knowledge-guided consistency projection (OK-CP) module incorporates prompt representations for visually described objects and aligns language-visual features through contrastive learning. In this way, the domain gap can be bridged by fine-tuning the pre-trained world models with limited samples. Finally, an end-to-end multitask combinatorial optimization (MuCO) strategy can capture slight feature bias and constrain the diffusion process in a collaboratively learnable direction. Experiments conducted on four typical datasets indicate the effectiveness and advantages of the proposed FusDreamer. The corresponding code will be released at https://github.com/Cimy-wang/FusDreamer.
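The abstract says OK-CP aligns language-visual features through contrastive learning but does not state the loss. A common choice for this kind of alignment is the symmetric InfoNCE loss popularized by CLIP, so the sketch below assumes that form. The function names and the temperature of 0.07 are hypothetical, and this is not taken from FusDreamer's code.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max before exponentiating, for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(visual: List[Vector], text: List[Vector],
                       temperature: float = 0.07) -> float:
    """CLIP-style loss (assumed form): (visual[i], text[i]) is the positive pair,
    every other pairing in the batch is treated as a negative."""
    v = [l2_normalize(x) for x in visual]
    t = [l2_normalize(x) for x in text]
    # logits[i][j]: scaled similarity between visual feature i and text feature j
    logits = [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t] for vi in v]
    # Transpose to score the text-to-visual direction as well
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))
```

For example, two perfectly matched visual/text pairs give a loss close to zero, while swapping the two text features gives a loss of roughly 1/temperature, which is the pressure that pulls matching language and visual features together.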
Related papers
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
Date: 2024-10-16T19:59:31Z
- SpecSAR-Former: A Lightweight Transformer-based Network for Global LULC Mapping Using Integrated Sentinel-1 and Sentinel-2 [13.17346252861919]
We introduce the Dynamic World+ dataset, expanding the current authoritative multispectral dataset, Dynamic World.
To facilitate the combination of multispectral and SAR data, we propose a lightweight transformer architecture termed SpecSAR-Former.
Our network outperforms existing transformer and CNN-based models, achieving a mean Intersection over Union (mIoU) of 59.58%, an Overall Accuracy (OA) of 79.48%, and an F1 Score of 71.68% with only 26.70M parameters.
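The summary reports mIoU, OA and F1 without defining them. Under the standard definitions, all three can be computed from a per-pixel confusion matrix, as sketched below. The summary does not say how its F1 is averaged over classes; the sketch assumes a macro average, and the function name is hypothetical.

```python
from typing import Dict, List


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """cm[i][j] counts pixels whose true class is i and predicted class is j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    ious, f1s = [], []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        # Classes absent from both truth and prediction score 0 here;
        # some benchmarks skip them instead.
        iou_denom = tp + fp + fn
        ious.append(tp / iou_denom if iou_denom else 0.0)
        f1_denom = 2 * tp + fp + fn
        f1s.append(2 * tp / f1_denom if f1_denom else 0.0)
    return {
        "OA": correct / total,           # overall accuracy
        "mIoU": sum(ious) / n,           # mean intersection over union
        "macro_F1": sum(f1s) / n,        # assumed macro-averaged F1
    }
```

OA is dominated by frequent classes, while mIoU and macro F1 weight every class equally, which is why papers on imbalanced land-cover maps usually report them together.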
Date: 2024-10-04T22:53:25Z
- Federated Multi-Agent Mapping for Planetary Exploration [0.4143603294943439]
We propose a federated multi-agent mapping approach that jointly trains a global map model across agents without transmitting raw data.
Our method leverages implicit neural mapping to generate parsimonious, adaptable representations, reducing data transmission by up to 93.8% compared to raw maps.
We demonstrate the efficacy of our approach on Martian terrains and glacier datasets, achieving downstream path planning F1 scores as high as 0.95 while outperforming on map reconstruction losses.
Date: 2024-04-02T20:32:32Z
- MC-DBN: A Deep Belief Network-Based Model for Modality Completion [3.7020486533725605]
We propose a Modality Completion Deep Belief Network-Based Model (MC-DBN).
This approach utilizes implicit features of complete data to compensate for gaps between itself and additional incomplete data.
It ensures that the enhanced multi-modal data closely aligns with the dynamic nature of the real world to enhance the effectiveness of the model.
Date: 2024-02-15T08:21:50Z
- FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients [32.59184269562571]
We propose a multi-modal collaborative diffusion federated learning framework called FedDiff.
Our framework establishes a dual-branch diffusion model feature extraction setup, where the two modal data are inputted into separate branches of the encoder.
Considering the challenge of private and efficient communication between multiple clients, we embed the diffusion model into the federated learning communication structure.
Date: 2023-11-16T02:29:37Z
- Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants [65.47222691674074]
The Muffin framework employs pre-trained vision-language models to act as providers of visual signals.
The UniMM-Chat dataset explores the complementarities of datasets to generate 1.1M high-quality and diverse multimodal instructions.
Date: 2023-10-01T12:35:18Z
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
Date: 2023-08-20T12:43:52Z
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
Date: 2023-05-29T14:29:12Z
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
Date: 2023-05-29T05:20:38Z
- FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
Date: 2022-07-20T04:55:18Z
- Multi-Source Transfer Learning for Deep Model-Based Reinforcement Learning [0.6445605125467572]
A crucial challenge in reinforcement learning is to reduce the number of interactions with the environment that an agent requires to master a given task.
Transfer learning proposes to address this issue by re-using knowledge from previously learned tasks.
The goal of this paper is to address these issues with modular multi-source transfer learning techniques.
Date: 2022-05-28T12:04:52Z
- Analysis and Optimal Edge Assignment For Hierarchical Federated Learning on Non-IID Data [43.32085029569374]
Federated learning algorithms aim to leverage distributed and diverse data stored on users' devices to learn a global phenomenon.
When the participants' data are strongly skewed (i.e., non-IID), the local models can overfit local data, leading to a low-performing global model.
We propose a hierarchical learning system that performs Federated Gradient Descent on the user-edge layer and Federated Averaging on the edge-cloud layer.
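The summary names Federated Averaging for the edge-cloud layer but gives no formula. The sketch below assumes the standard FedAvg aggregation rule, a mean of model parameter vectors weighted by each participant's sample count. The function name and the flat-list model representation are hypothetical, and the paper's edge-assignment procedure is not modeled.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      num_samples: Sequence[int]) -> List[float]:
    """Sample-count-weighted mean of parameter vectors (assumed FedAvg rule)."""
    if not client_weights or len(client_weights) != len(num_samples):
        raise ValueError("need at least one model and one sample count per model")
    total = sum(num_samples)
    dim = len(client_weights[0])
    avg = [0.0] * dim
    for w, n in zip(client_weights, num_samples):
        for k in range(dim):
            avg[k] += w[k] * n / total  # each participant contributes in proportion to its data
    return avg
```

For a single aggregation step, averaging client models at each edge server and then averaging the edge models at the cloud, weighted by each edge's total sample count, gives the same result as one flat average over all clients. In the hierarchical system described above, the edges also run local gradient steps between aggregations, so the two schemes are not equivalent over many rounds.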
Date: 2020-12-10T12:18:13Z
This list is automatically generated from the titles and abstracts of the papers on this site.