Any-Optical-Model: A Universal Foundation Model for Optical Remote Sensing
- URL: http://arxiv.org/abs/2512.17224v1
- Date: Fri, 19 Dec 2025 04:21:01 GMT
- Title: Any-Optical-Model: A Universal Foundation Model for Optical Remote Sensing
- Authors: Xuyang Li, Chenyu Li, Danfeng Hong
- Abstract summary: We propose Any Optical Model (AOM) to accommodate arbitrary band compositions, sensor types, and resolution scales. AOM consistently achieves state-of-the-art (SOTA) performance under challenging conditions such as band-missing, cross-sensor, and cross-resolution settings.
- Score: 24.03278912134978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical satellites, with their diverse band layouts and ground sampling distances, supply indispensable evidence for tasks ranging from ecosystem surveillance to emergency response. However, significant discrepancies in band composition and spatial resolution across different optical sensors present major challenges for existing Remote Sensing Foundation Models (RSFMs). These models are typically pretrained on fixed band configurations and resolutions, making them vulnerable to real-world scenarios involving missing bands, cross-sensor fusion, and unseen spatial scales, thereby limiting their generalization and practical deployment. To address these limitations, we propose Any Optical Model (AOM), a universal RSFM explicitly designed to accommodate arbitrary band compositions, sensor types, and resolution scales. To preserve distinctive spectral characteristics even when bands are missing or newly introduced, AOM introduces a spectrum-independent tokenizer that assigns each channel a dedicated band embedding, enabling explicit encoding of spectral identity. To effectively capture texture and contextual patterns from sub-meter to hundred-meter imagery, we design a multi-scale adaptive patch embedding mechanism that dynamically modulates the receptive field. Furthermore, to maintain global semantic consistency across varying resolutions, AOM incorporates a multi-scale semantic alignment mechanism alongside a channel-wise self-supervised masking and reconstruction pretraining strategy that jointly models spectral-spatial relationships. Extensive experiments on over 10 public datasets, including those from Sentinel-2, Landsat, and HLS, demonstrate that AOM consistently achieves state-of-the-art (SOTA) performance under challenging conditions such as band-missing, cross-sensor, and cross-resolution settings.
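The spectrum-independent tokenizer described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the band names, embedding dimension, and the use of the flattened 4x4 patch in place of a learned projection are all illustrative assumptions. The key idea shown is that each channel's tokens carry a dedicated band embedding, so any subset or ordering of bands still yields valid tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 16  # a 4x4 patch flattens to exactly 16 values
# Hypothetical band vocabulary: each known spectral band gets a dedicated
# embedding vector (random here, standing in for learned weights).
BAND_VOCAB = ["B02", "B03", "B04", "B08", "B11"]
band_embeddings = {b: rng.normal(size=EMBED_DIM) for b in BAND_VOCAB}

def tokenize(image, bands, patch=4):
    """Turn a (C, H, W) image into tokens, adding each channel's band
    embedding so spectral identity survives arbitrary band subsets."""
    c, h, w = image.shape
    assert c == len(bands)
    tokens = []
    for ch, name in enumerate(bands):
        for i in range(0, h, patch):
            for j in range(0, w, patch):
                flat = image[ch, i:i + patch, j:j + patch].reshape(-1)
                # In the real model a learned projection would map the patch
                # to EMBED_DIM; here the 4x4 patch already has 16 values.
                tokens.append(flat + band_embeddings[name])
    return np.stack(tokens)

# A 3-band input tokenizes fine even though two vocabulary bands are absent.
img = rng.normal(size=(3, 8, 8))
toks = tokenize(img, ["B02", "B04", "B08"])
print(toks.shape)  # (12, 16): 3 bands x 4 patches each, 16-dim tokens
```

Because the spectral identity is injected per channel rather than baked into a fixed input layer, adding a new band only requires a new embedding entry, not retraining the tokenizer itself.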
Related papers
- Universal Pansharpening Foundation Model [67.10467574892282]
Pansharpening generates the high-resolution multi-spectral (MS) image by integrating spatial details from a texture-rich panchromatic (PAN) image and spectral attributes from a low-resolution MS image. We present FoundPS, a universal pansharpening foundation model for satellite-agnostic and scene-robust fusion.
arXiv Detail & Related papers (2026-03-04T08:30:15Z)
- SONIC: Spectral Oriented Neural Invariant Convolutions [0.0]
Convolutional Neural Networks (CNNs) rely on fixed-size kernels scanning local patches. ViTs provide global connectivity but lack spatial inductive bias, depend on explicit positional encodings, and remain tied to the initial patch size. We introduce SONIC, a continuous spectral parameterisation that models convolutional operators using a small set of shared, orientation-selective components.
arXiv Detail & Related papers (2026-01-27T18:51:11Z)
- SKANet: A Cognitive Dual-Stream Framework with Adaptive Modality Fusion for Robust Compound GNSS Interference Classification [47.20483076887704]
Global Navigation Satellite Systems (GNSS) face growing threats from sophisticated jamming interference. We propose a cognitive deep learning framework built upon a dual-stream architecture that integrates Time-Frequency Images (TFIs) and Power Spectral Density (PSD). We show that SKANet achieves an overall accuracy of 96.99%, exhibiting superior robustness for compound jamming classification.
arXiv Detail & Related papers (2026-01-19T07:42:45Z)
- RAMEN: Resolution-Adjustable Multimodal Encoder for Earth Observation [12.826798868837557]
RAMEN is a resolution-adjustable multimodal encoder that learns a shared visual representation across EO data. We train a single, unified transformer encoder reconstructing masked multimodal EO data drawn from diverse sources. RAMEN outperforms larger state-of-the-art models on the community-standard PANGAEA benchmark.
arXiv Detail & Related papers (2025-12-04T17:40:17Z)
- Annotation-Free Open-Vocabulary Segmentation for Remote-Sensing Images [51.74614065919118]
This paper introduces SegEarth-OV, the first framework for annotation-free open-vocabulary segmentation of RS images. We propose SimFeatUp, a universal upsampler that robustly restores high-resolution spatial details from coarse features. We also present a simple yet effective Global Bias Alleviation operation to subtract the inherent global context from patch features.
arXiv Detail & Related papers (2025-08-25T14:22:57Z)
- Atomizer: Generalizing to new modalities by breaking satellite images down to a set of scalars [9.925465775310181]
Existing models rely on fixed input formats and modality-specific encoders, which require retraining when new configurations are introduced. We introduce Atomizer, a flexible architecture that represents remote sensing images as sets of tokens, each corresponding to a spectral band value of a pixel. Atomizer outperforms standard models and demonstrates robust performance across varying resolutions and spatial sizes.
arXiv Detail & Related papers (2025-06-16T14:30:37Z)
- AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection [49.81255045696323]
We present the Auxiliary Metadata Driven Infrared Small Target Detector (AuxDet). AuxDet integrates metadata semantics with visual features, guiding adaptive representation learning for each sample. Experiments on the challenging WideIRSTD-Full benchmark demonstrate that AuxDet consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-05-21T07:02:05Z)
- CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis [69.02751635551724]
Spectral imaging offers promising applications across diverse domains, including medicine and urban scene understanding. Variability in channel dimensionality and captured wavelengths among spectral cameras impedes the development of AI-driven methodologies. We introduce CARL, a model for Camera-Agnostic Representation Learning across RGB, multispectral, and hyperspectral imaging modalities.
arXiv Detail & Related papers (2025-04-27T13:06:40Z)
- FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models. We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components. FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z)
- RRNet: Relational Reasoning Network with Parallel Multi-scale Attention for Salient Object Detection in Optical Remote Sensing Images [82.1679766706423]
Salient object detection (SOD) for optical remote sensing images (RSIs) aims at locating and extracting visually distinctive objects/regions from the optical RSIs.
We propose a relational reasoning network with parallel multi-scale attention for SOD in optical RSIs.
Our proposed RRNet outperforms the existing state-of-the-art SOD competitors both qualitatively and quantitatively.
arXiv Detail & Related papers (2021-10-27T07:18:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.