FlexiMo: A Flexible Remote Sensing Foundation Model
- URL: http://arxiv.org/abs/2503.23844v1
- Date: Mon, 31 Mar 2025 08:46:05 GMT
- Title: FlexiMo: A Flexible Remote Sensing Foundation Model
- Authors: Xuyang Li, Chenyu Li, Pedram Ghamisi, Danfeng Hong
- Abstract summary: FlexiMo is a flexible remote sensing foundation model that endows the pre-trained model with the flexibility to adapt to arbitrary spatial resolutions. Central to FlexiMo is a spatial resolution-aware module that employs a parameter-free alignment embedding mechanism. Experiments on diverse multimodal, multi-resolution, and multi-scale datasets demonstrate that FlexiMo significantly enhances model generalization and robustness.
- Score: 33.027094254412056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid expansion of multi-source satellite imagery drives innovation in Earth observation, opening unprecedented opportunities for Remote Sensing Foundation Models to harness diverse data. However, many existing models remain constrained by fixed spatial resolutions and patch sizes, limiting their ability to fully exploit the heterogeneous spatial characteristics inherent in satellite imagery. To address these challenges, we propose FlexiMo, a flexible remote sensing foundation model that endows the pre-trained model with the flexibility to adapt to arbitrary spatial resolutions. Central to FlexiMo is a spatial resolution-aware module that employs a parameter-free alignment embedding mechanism to dynamically recalibrate patch embeddings based on the input image's resolution and dimensions. This design not only preserves critical token characteristics and ensures multi-scale feature fidelity but also enables efficient feature extraction without requiring modifications to the underlying network architecture. In addition, FlexiMo incorporates a lightweight channel adaptation module that leverages prior spectral information from sensors. This mechanism allows the model to process images with varying numbers of channels while maintaining the data's intrinsic physical properties. Extensive experiments on diverse multimodal, multi-resolution, and multi-scale datasets demonstrate that FlexiMo significantly enhances model generalization and robustness. In particular, our method achieves outstanding performance across a range of downstream tasks, including scene classification, land cover classification, urban building segmentation, and cloud detection. By enabling parameter-efficient and physically consistent adaptation, FlexiMo paves the way for more adaptable and effective foundation models in real-world remote sensing applications.
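The abstract describes recalibrating patch embeddings to the input resolution without adding parameters. As a rough illustration of that general idea only, the sketch below resamples a square patch-embedding kernel to a different patch size with plain bilinear interpolation. This is an assumption-laden stand-in, not the paper's alignment mechanism: the function name `resize_kernel`, the list-of-lists kernel layout, and the choice of bilinear interpolation are all hypothetical and do not come from the source.

```python
def resize_kernel(kernel, new_size):
    """Bilinearly resample a square 2-D kernel to new_size x new_size.

    Hypothetical helper for illustration; no learned parameters are added,
    the existing weights are only interpolated onto a new grid.
    """
    old = len(kernel)
    if old < 2 or new_size < 2:
        raise ValueError("kernel size and new_size must both be at least 2")
    # Map output grid corners onto input grid corners.
    scale = (old - 1) / (new_size - 1)
    out = []
    for i in range(new_size):
        y = i * scale
        y0 = min(int(y), old - 2)  # top neighbour row, clamped
        wy = y - y0                # vertical interpolation weight
        row = []
        for j in range(new_size):
            x = j * scale
            x0 = min(int(x), old - 2)  # left neighbour column, clamped
            wx = x - x0                # horizontal interpolation weight
            top = (1 - wx) * kernel[y0][x0] + wx * kernel[y0][x0 + 1]
            bottom = (1 - wx) * kernel[y0 + 1][x0] + wx * kernel[y0 + 1][x0 + 1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


# Usage: adapt a 2x2 kernel to a 3x3 patch size.
small = [[0.0, 1.0], [2.0, 3.0]]
large = resize_kernel(small, 3)
```

In a real vision transformer the same resampling would be applied to every input-channel and output-channel slice of the patch-embedding weights. How FlexiMo actually aligns the resampled weights is specified in the paper, not here.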
Related papers
- Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows [46.673228292287895]
We propose a novel framework that employs transformer-based autoregressive normalizing flows to model continuous representations. This approach unlocks substantial flexibility, enabling the construction of models that can capture global bi-directional context. We propose new mixture-based coupling transformations designed to capture complex dependencies within the latent space shaped by discrete data.
arXiv Detail & Related papers (2025-07-01T04:51:25Z) - AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection [58.67129770371016]
We propose a novel IRSTD framework that reimagines the IRSTD paradigm by incorporating textual metadata for scene-aware optimization. AuxDet consistently outperforms state-of-the-art methods, validating the critical role of auxiliary information in improving robustness and accuracy.
arXiv Detail & Related papers (2025-05-21T07:02:05Z) - FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models. We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components. FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z) - A class of modular and flexible covariate-based covariance functions for nonstationary spatial modeling [0.0]
We present a class of covariance functions that relies on fixed, observable spatial information.
This model allows for separate structures for different sources of nonstationarity, such as marginal standard deviation, geometric anisotropy, and smoothness.
We analyze the capabilities of the presented model through simulation studies and an application to Swiss precipitation data.
arXiv Detail & Related papers (2024-10-22T05:53:25Z) - X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing [14.549639729808717]
Current human sensing primarily depends on cameras and LiDAR, each of which has its own strengths and limitations. Existing multi-modal fusion solutions are typically designed for fixed modality combinations. We propose a modality-invariant foundation model for all modalities, X-Fi, to address this issue.
arXiv Detail & Related papers (2024-10-14T05:23:12Z) - Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling [10.914612535745789]
This paper introduces Motion-oriented Compositional Neural Radiance Fields (MoCo-NeRF).
MoCo-NeRF is a framework designed to perform free-viewpoint rendering of monocular human videos.
arXiv Detail & Related papers (2024-07-16T17:59:01Z) - Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling [70.34875558830241]
We present a way of learning a spatio-temporal (4D) embedding, based on semantic gears, to allow for stratified modeling of dynamic regions of the scene.
At the same time, almost for free, our tracking approach enables free-viewpoint tracking of objects of interest, a functionality not yet achieved by existing NeRF-based methods.
arXiv Detail & Related papers (2024-06-06T03:37:39Z) - SuDA: Support-based Domain Adaptation for Sim2Real Motion Capture with Flexible Sensors [12.811669078348489]
Existing flexible sensor-based MoCap methods rely on deep learning and necessitate large and diverse labeled datasets for training.
Thanks to the high linearity of flexible sensors, we propose a novel Sim2Real MoCap solution based on domain adaptation.
Our solution relies on a novel Support-based Domain Adaptation method, namely SuDA, which aligns the supports of the predictive functions.
arXiv Detail & Related papers (2024-05-25T09:43:33Z) - RSDehamba: Lightweight Vision Mamba for Remote Sensing Satellite Image Dehazing [19.89130165954241]
Remote sensing image dehazing (RSID) aims to remove nonuniform and physically irregular haze factors for high-quality image restoration.
We propose RSDehamba, the first lightweight Mamba-based network in the field of RSID.
arXiv Detail & Related papers (2024-05-16T12:12:07Z) - Learning Modulated Transformation in GANs [69.95217723100413]
We equip the generator in generative adversarial networks (GANs) with a plug-and-play module, termed the modulated transformation module (MTM).
MTM predicts spatial offsets under the control of latent codes, based on which the convolution operation can be applied at variable locations.
It is noteworthy that towards human generation on the challenging TaiChi dataset, we improve the FID of StyleGAN3 from 21.36 to 13.60, demonstrating the efficacy of learning modulated geometry transformation.
arXiv Detail & Related papers (2023-08-29T17:51:22Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - SlimSeg: Slimmable Semantic Segmentation with Boundary Supervision [54.16430358203348]
We propose a simple but effective slimmable semantic segmentation (SlimSeg) method, which can be executed at different capacities during inference.
We show that our proposed SlimSeg with various mainstream networks can produce flexible models that provide dynamic adjustment of computational cost and better performance.
arXiv Detail & Related papers (2022-07-13T14:41:05Z) - MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z) - Learning High-Dimensional Distributions with Latent Neural Fokker-Planck Kernels [67.81799703916563]
We introduce new techniques to formulate the problem as solving the Fokker-Planck equation in a lower-dimensional latent space.
Our proposed model consists of latent-distribution morphing, a generator and a parameterized Fokker-Planck kernel function.
arXiv Detail & Related papers (2021-05-10T17:42:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.