SeaMo: A Season-Aware Multimodal Foundation Model for Remote Sensing
- URL: http://arxiv.org/abs/2412.19237v2
- Date: Sun, 20 Apr 2025 00:08:58 GMT
- Title: SeaMo: A Season-Aware Multimodal Foundation Model for Remote Sensing
- Authors: Xuyang Li, Chenyu Li, Gemine Vivone, Danfeng Hong
- Abstract summary: Remote Sensing (RS) data encapsulates rich multi-dimensional information essential for Earth observation. Existing Visual Foundation Models (VFMs) serve as powerful feature extractors, leveraging extensive RS data for pretraining and subsequent fine-tuning. We introduce SeaMo, a novel VFM that effectively integrates multimodal and multi-seasonal RS information.
- Score: 26.830180880225566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote Sensing (RS) data encapsulates rich multi-dimensional information essential for Earth observation. Its vast volume, diverse sources, and temporal continuity make it particularly well-suited for developing large Visual Foundation Models (VFMs). These models serve as powerful feature extractors, leveraging extensive RS data for pretraining and subsequent fine-tuning in various geoscientific applications. However, existing VFMs in the RS domain often concentrate on specific image characteristics, neglecting the full season-aware potential of RS data. To bridge this gap, we introduce SeaMo, a novel VFM that effectively integrates multimodal and multi-seasonal RS information. SeaMo leverages a masked image modeling framework to fully exploit the spatial, spectral, and seasonal dimensions of RS data. Specifically, we employ unaligned spatial region selection to capture spatial heterogeneity, incorporate multi-source inputs for enhanced multimodal integration, and introduce temporal-multimodal fusion blocks to assimilate seasonal variations effectively. By explicitly modeling the complex, season-dependent attributes of RS data, SeaMo enhances generalization, robustness, and adaptability across geoscientific tasks. Extensive experiments and ablation studies demonstrate its superior performance, underscoring its potential as a foundational model for Earth observation.
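For intuition, below is a minimal, hypothetical PyTorch sketch of the pretraining recipe the abstract outlines: masked image modeling over per-season, per-modality patch sets, with a small fusion block attending across seasons and modalities. All module names, sizes, and the pooling-based fusion are assumptions for illustration, not SeaMo's actual implementation.

```python
# Hypothetical sketch: masked image modeling over multimodal, multi-seasonal patches
# with a simple temporal-multimodal fusion block. Sizes and structure are illustrative.
import torch
import torch.nn as nn

class TemporalMultimodalFusion(nn.Module):
    """Self-attention over tokens pooled from each (season, modality) view."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, views, dim); each view summarizes one season/modality
        fused, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + fused)

class MaskedMultimodalPretrainer(nn.Module):
    def __init__(self, patch_dim: int, dim: int = 256, mask_ratio: float = 0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_dim, dim)            # shared patch embedding
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.fusion = TemporalMultimodalFusion(dim)
        self.decoder = nn.Linear(dim, patch_dim)          # reconstruct raw patches

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, views, num_patches, patch_dim), one view per season/modality
        b, v, n, p = patches.shape
        x = self.embed(patches)                                        # (b, v, n, dim)
        mask = torch.rand(b, v, n, device=x.device) < self.mask_ratio  # positions to hide
        x = torch.where(mask.unsqueeze(-1), torch.zeros_like(x), x)    # zero out masked tokens
        x = self.encoder(x.flatten(0, 1))                              # encode each view
        view_summary = x.mean(dim=1).reshape(b, v, -1)                 # (b, v, dim) pooled per view
        fused = self.fusion(view_summary)                              # cross-season/modality fusion
        x = x.reshape(b, v, n, -1) + fused.unsqueeze(2)                # broadcast fused context back
        recon = self.decoder(x)
        # reconstruction loss only on masked positions, as in masked image modeling
        return ((recon - patches) ** 2)[mask].mean()

# Example: 2 seasons x 2 modalities = 4 views, 64 patches of dimension 48 each
loss = MaskedMultimodalPretrainer(patch_dim=48)(torch.randn(2, 4, 64, 48))
```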
Related papers
- PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model [76.95536611263356]
PolSAR data presents unique challenges due to its rich and complex characteristics. Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used. Most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively. We propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy.
arXiv Detail & Related papers (2024-12-17T09:59:53Z)
- Multi-Scale and Multimodal Species Distribution Modeling [4.022195138381868]
Species distribution models (SDMs) aim to predict the distribution of species relating occurrence data with environmental variables.
Recent applications of deep learning to SDMs have enabled new avenues, specifically the inclusion of spatial data.
We develop a modular structure for SDMs that allows us to test the effect of scale in both single- and multi-scale settings.
Results on the GeoLifeCLEF 2023 benchmark indicate that considering multimodal data and learning multi-scale representations leads to more accurate models.
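As a rough illustration of the multi-scale idea, the hypothetical sketch below encodes the same environmental raster at several spatial extents around an occurrence point and concatenates the embeddings; the extents, backbone, and channel counts are assumptions rather than the paper's configuration.

```python
# Hypothetical multi-scale SDM encoder: central crops at several extents around an
# occurrence location, each encoded separately, then concatenated for prediction.
import torch
import torch.nn as nn

class MultiScaleSDM(nn.Module):
    def __init__(self, in_channels: int, num_species: int, scales=(32, 64, 128)):
        super().__init__()
        self.scales = scales
        self.encoders = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten())
            for _ in scales
        ])
        self.head = nn.Linear(16 * len(scales), num_species)

    def forward(self, raster: torch.Tensor) -> torch.Tensor:
        # raster: (batch, channels, H, W) patch centered on the occurrence point
        _, _, h, w = raster.shape
        feats = []
        for size, enc in zip(self.scales, self.encoders):
            top, left = (h - size) // 2, (w - size) // 2
            crop = raster[:, :, top:top + size, left:left + size]  # crop at this extent
            feats.append(enc(crop))
        return self.head(torch.cat(feats, dim=1))  # per-species presence logits

logits = MultiScaleSDM(in_channels=5, num_species=10)(torch.randn(2, 5, 128, 128))
```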
arXiv Detail & Related papers (2024-11-06T15:57:20Z)
- Foundation Models for Remote Sensing and Earth Observation: A Survey [101.77425018347557]
This survey systematically reviews the emerging field of Remote Sensing Foundation Models (RSFMs).
It begins with an outline of their motivation and background, followed by an introduction to their foundational concepts.
We benchmark these models against publicly available datasets, discuss existing challenges, and propose future research directions.
arXiv Detail & Related papers (2024-10-22T01:08:21Z)
- MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation [8.443065903814821]
This study introduces a novel Multimodal Adapter-based Network (MANet) for multimodal remote sensing semantic segmentation.
At the core of this approach is the development of a Multimodal Adapter (MMAdapter), which fine-tunes SAM's image encoder to effectively leverage the model's general knowledge for multimodal data.
This work not only introduces a novel network for multimodal fusion, but also demonstrates, for the first time, SAM's powerful generalization capabilities with Digital Surface Model (DSM) data.
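The snippet below is a generic bottleneck-adapter sketch in the spirit of the MMAdapter idea: the pretrained encoder block stays frozen and only a small adapter that injects auxiliary-modality (e.g., DSM) tokens is trained. It is not SAM's or MANet's actual code; the dimensions and the concatenation-based injection are assumptions.

```python
# Generic adapter-based fine-tuning sketch: frozen pretrained block + trainable
# bottleneck adapter that mixes in tokens from an auxiliary modality.
import torch
import torch.nn as nn

class MultimodalAdapter(nn.Module):
    def __init__(self, dim: int, aux_dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim + aux_dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        # x: frozen-encoder tokens (batch, tokens, dim); aux: auxiliary-modality tokens
        return x + self.up(self.act(self.down(torch.cat([x, aux], dim=-1))))

class AdaptedBlock(nn.Module):
    def __init__(self, frozen_block: nn.Module, dim: int, aux_dim: int):
        super().__init__()
        self.block = frozen_block
        for p in self.block.parameters():       # keep the pretrained weights fixed
            p.requires_grad = False
        self.adapter = MultimodalAdapter(dim, aux_dim)

    def forward(self, x, aux):
        return self.adapter(self.block(x), aux)

# Example with a stand-in frozen block; in practice this would wrap encoder layers.
block = AdaptedBlock(nn.Linear(256, 256), dim=256, aux_dim=32)
out = block(torch.randn(2, 196, 256), torch.randn(2, 196, 32))
```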
arXiv Detail & Related papers (2024-10-15T00:52:16Z)
- RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks [11.681342476516267]
We propose a Remote Sensing Distributed Foundation Model (RS-DFM) based on generalized information mapping and interaction.
This model can realize online collaborative perception across multiple platforms and various downstream tasks.
We present a dual-branch information compression module to decouple high-frequency and low-frequency feature information.
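One plausible reading of such a dual-branch compressor is sketched below: a blurred (low-frequency) copy of the feature map is separated from its residual (high-frequency) detail, and each branch is compressed with its own projection. The pooling-based frequency split and channel budgets are assumptions, not the paper's design.

```python
# Hypothetical dual-branch feature compression: low-frequency content via pooling,
# high-frequency detail as the residual, each projected to a compact code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchCompressor(nn.Module):
    def __init__(self, channels: int, low_channels: int = 8, high_channels: int = 24):
        super().__init__()
        self.low_proj = nn.Conv2d(channels, low_channels, kernel_size=1)
        self.high_proj = nn.Conv2d(channels, high_channels, kernel_size=1)

    def forward(self, feat: torch.Tensor):
        # feat: (batch, channels, H, W) feature map to be compressed/transmitted
        low = F.avg_pool2d(feat, kernel_size=4)            # coarse, low-frequency content
        low_up = F.interpolate(low, size=feat.shape[-2:], mode="bilinear", align_corners=False)
        high = feat - low_up                               # residual high-frequency detail
        return self.low_proj(low), self.high_proj(high)    # two compact codes

low_code, high_code = DualBranchCompressor(channels=64)(torch.randn(1, 64, 32, 32))
```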
arXiv Detail & Related papers (2024-06-11T07:46:47Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Spatial Attention-based Distribution Integration Network for Human Pose Estimation [0.8052382324386398]
We present the Spatial Attention-based Distribution Integration Network (SADI-NET) to improve the accuracy of localization.
Our network consists of three efficient modules: the receptive fortified module (RFM), spatial fusion module (SFM), and distribution learning module (DLM).
Our model obtained a remarkable 92.10% accuracy on the MPII test dataset, demonstrating significant improvements over existing models and establishing state-of-the-art performance.
arXiv Detail & Related papers (2023-11-09T12:43:01Z)
- Domain Adaptive Graph Neural Networks for Constraining Cosmological Parameters Across Multiple Data Sets [40.19690479537335]
We show that DA-GNN achieves higher accuracy and robustness on cross-dataset tasks.
This shows that DA-GNNs are a promising method for extracting domain-independent cosmological information.
arXiv Detail & Related papers (2023-11-02T20:40:21Z)
- Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
However, acquiring source data is challenging due to privacy concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z)
- Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model [36.993630058695345]
We propose a shared and specific feature learning (S2FL) model to decompose multimodal RS data into modality-shared and modality-specific components.
To better assess multimodal baselines and the newly proposed S2FL model, three multimodal RS benchmark datasets are released and used for land cover classification: Houston2013 (hyperspectral and multispectral data), Berlin (hyperspectral and synthetic aperture radar (SAR) data), and Augsburg (hyperspectral, SAR, and digital surface model (DSM) data).
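A hedged, much-simplified sketch of the shared/specific decomposition idea follows: each modality gets its own specific projection, while the per-modality shared projections are pooled into one common representation. The linear projections and classifier head are illustrative assumptions rather than the S2FL formulation.

```python
# Simplified shared/specific decomposition: common structure lands in a shared
# subspace, modality-unique structure in per-modality specific projections.
import torch
import torch.nn as nn

class SharedSpecificModel(nn.Module):
    def __init__(self, modality_dims, shared_dim=32, specific_dim=16, num_classes=7):
        super().__init__()
        self.shared = nn.ModuleList(nn.Linear(d, shared_dim) for d in modality_dims)
        self.specific = nn.ModuleList(nn.Linear(d, specific_dim) for d in modality_dims)
        self.classifier = nn.Linear(shared_dim + specific_dim * len(modality_dims), num_classes)

    def forward(self, modalities):
        shared = [proj(x) for proj, x in zip(self.shared, modalities)]
        specific = [proj(x) for proj, x in zip(self.specific, modalities)]
        # average the per-modality shared projections into one common representation
        fused = torch.cat([torch.stack(shared).mean(dim=0)] + specific, dim=-1)
        return self.classifier(fused)

# Example: hyperspectral (144 bands) + SAR (4 channels) pixels, 7 land-cover classes
model = SharedSpecificModel(modality_dims=(144, 4))
logits = model([torch.randn(8, 144), torch.randn(8, 4)])
```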
arXiv Detail & Related papers (2021-05-21T08:14:21Z)
- MTS-CycleGAN: An Adversarial-based Deep Mapping Learning Network for Multivariate Time Series Domain Adaptation Applied to the Ironmaking Industry [0.0]
This research focuses on translating asset-specific historical data (source domain) into data corresponding to one reference asset (target domain).
We propose MTS-CycleGAN, an algorithm for Multivariate Time Series data based on CycleGAN.
Our contribution is the integration in the CycleGAN architecture of a Long Short-Term Memory (LSTM)-based AutoEncoder (AE) for the generator and a stacked LSTM-based discriminator.
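The building blocks named above can be sketched as follows: an LSTM autoencoder as the generator and a stacked-LSTM discriminator for multivariate series. The second generator/discriminator pair and the cycle-consistency training loop are omitted, and all shapes and hyperparameters here are assumptions, not the paper's code.

```python
# Sketch of the named building blocks: LSTM-autoencoder generator and
# stacked-LSTM discriminator for multivariate time series.
import torch
import torch.nn as nn

class LSTMAEGenerator(nn.Module):
    """Maps a multivariate series from one domain to the other via an LSTM bottleneck."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z, _ = self.encoder(x)          # (batch, time, hidden) latent sequence
        y, _ = self.decoder(z)
        return self.out(y)              # translated series, same shape as x

class StackedLSTMDiscriminator(nn.Module):
    """Scores whether a series looks like it comes from the target domain."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y, _ = self.lstm(x)
        return torch.sigmoid(self.head(y[:, -1]))   # score from the last time step

G = LSTMAEGenerator(n_features=6)
D = StackedLSTMDiscriminator(n_features=6)
score = D(G(torch.randn(4, 50, 6)))   # (batch=4, time=50, 6 sensor channels)
```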
arXiv Detail & Related papers (2020-07-15T07:33:25Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
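The generative model described here is easy to state concretely: each subject's data is a subject-specific linear mixing of the same independent sources plus noise. The toy simulation below (with arbitrary dimensions and noise scale) generates data of that form; the MultiView ICA inference step itself is not reproduced.

```python
# Toy simulation of the stated generative model: x_i = A_i s + n_i, with the
# independent sources s shared across all subjects.
import numpy as np

rng = np.random.default_rng(0)
n_sources, n_sensors, n_samples, n_subjects = 4, 10, 1000, 3

# Shared, independent (here Laplacian) sources, common to all subjects.
sources = rng.laplace(size=(n_sources, n_samples))

subject_data = []
for _ in range(n_subjects):
    A = rng.normal(size=(n_sensors, n_sources))            # subject-specific mixing matrix
    noise = 0.1 * rng.normal(size=(n_sensors, n_samples))  # subject-specific noise
    subject_data.append(A @ sources + noise)               # x_i = A_i s + n_i

# A MultiView ICA estimator would recover `sources` (up to scale/permutation)
# jointly from `subject_data`; that inference step is not reproduced here.
```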
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
- MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data [75.73881040581767]
We propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations.
Our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
arXiv Detail & Related papers (2020-02-09T14:11:50Z)