SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images
- URL: http://arxiv.org/abs/2506.19585v1
- Date: Tue, 24 Jun 2025 12:51:39 GMT
- Title: SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images
- Authors: Gencer Sumbul, Chang Xu, Emanuele Dalsasso, Devis Tuia
- Abstract summary: Recent deep learning models are often specific to single sensors or to fixed combinations. We introduce SMARTIES, a generic and versatile foundation model lifting sensor-specific/dependent efforts. On both single- and multi-modal tasks across diverse sensors, SMARTIES outperforms previous models that rely on sensor-specific pretraining.
- Score: 17.304106221330414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: From optical sensors to microwave radars, leveraging the complementary strengths of remote sensing (RS) sensors is crucial for achieving dense spatio-temporal monitoring of our planet. In contrast, recent deep learning models, whether task-specific or foundational, are often specific to single sensors or to fixed combinations: adapting such models to different sensory inputs requires both architectural changes and re-training, limiting scalability and generalization across multiple RS sensors. On the contrary, a single model able to modulate its feature representations to accept diverse sensors as input would pave the way to agile and flexible multi-sensor RS data processing. To address this, we introduce SMARTIES, a generic and versatile foundation model lifting sensor-specific/dependent efforts and enabling scalability and generalization to diverse RS sensors: SMARTIES projects data from heterogeneous sensors into a shared spectrum-aware space, enabling the use of arbitrary combinations of bands both for training and inference. To obtain sensor-agnostic representations, we train a single, unified transformer model reconstructing masked multi-sensor data with cross-sensor token mixup. On both single- and multi-modal tasks across diverse sensors, SMARTIES outperforms previous models that rely on sensor-specific pretraining. Our code and pretrained models are available at https://gsumbul.github.io/SMARTIES.
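The abstract names two concrete mechanisms: a spectrum-aware projection into a shared space and cross-sensor token mixup during masked reconstruction. Below is a minimal PyTorch sketch of how such components could look; the module names, the per-spectral-range routing, and the token layout are assumptions for illustration, not the authors' implementation.
```python
import torch
import torch.nn as nn

class SpectrumAwareProjection(nn.Module):
    """Hypothetical sketch: one learned projection per spectral range,
    so any sensor's band can be routed by its central wavelength."""
    def __init__(self, num_spectral_ranges: int, patch_pixels: int, dim: int):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Linear(patch_pixels, dim) for _ in range(num_spectral_ranges)
        )

    def forward(self, band_patches, range_ids):
        # band_patches: (batch, num_bands, patch_pixels), one image patch
        # range_ids: sequence of ints, spectral-range index of each band
        tokens = [self.proj[r](band_patches[:, i]) for i, r in enumerate(range_ids)]
        # Fuse the per-band projections into one shared-space patch token.
        return torch.stack(tokens, dim=1).mean(dim=1)

def cross_sensor_token_mixup(tokens_a, tokens_b, ratio=0.5):
    """Swap a random subset of patch tokens between two co-registered
    sensor views, so one sequence mixes both sensors during pretraining."""
    swap = torch.rand(tokens_a.shape[1]) < ratio
    mixed = tokens_a.clone()
    mixed[:, swap] = tokens_b[:, swap]
    return mixed
```
Routing every band through a projection tied to its spectral range, rather than to a specific sensor, is what would let arbitrary band combinations share one backbone at both training and inference time.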
Related papers
- FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification [10.878168590232852]
We decompose, align, and fuse multiple image sensor data for target classification. We propose a shared unified discrete token (UDT) space between sensors to reduce the domain and granularity gaps. We achieve superior classification performance compared to single-modality classifiers.
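One plausible reading of a shared discrete token space is vector quantization against a codebook common to all sensors. The sketch below, with an assumed straight-through quantizer, illustrates that reading; the paper's actual UDT construction may differ.
```python
import torch
import torch.nn as nn

class SharedTokenSpace(nn.Module):
    """Quantize features from any sensor against one shared codebook,
    so all modalities speak the same discrete vocabulary (a sketch)."""
    def __init__(self, codebook_size=512, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, features):
        # features: (batch, num_tokens, dim) from any sensor's encoder
        # Squared distance of every token to every codebook entry.
        dists = (features.unsqueeze(2) - self.codebook.weight).pow(2).sum(-1)
        ids = dists.argmin(dim=-1)            # nearest shared code per token
        quantized = self.codebook(ids)
        # Straight-through estimator keeps the sensor encoders trainable.
        return features + (quantized - features).detach(), ids
```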
arXiv Detail & Related papers (2025-03-12T22:12:35Z)
- Sensor-Invariant Tactile Representation [11.153753622913843]
High-resolution tactile sensors have become critical for embodied perception and robotic manipulation. A key challenge in the field is the lack of transferability between sensors due to design and manufacturing variations. We introduce a novel method for extracting Sensor-Invariant Tactile Representations (SITR), enabling zero-shot transfer across optical tactile sensors.
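The summary does not spell out the training signal, but a common way to obtain sensor-invariant embeddings is to align representations of the same contact captured by different sensors. A hedged sketch, assuming paired captures and a generic encoder:
```python
import torch.nn.functional as F

def sensor_invariance_loss(encoder, x_sensor_a, x_sensor_b):
    """Pull together embeddings of the same contact seen by two different
    tactile sensors (paired captures and the cosine objective are
    assumptions; the paper's training signal may differ)."""
    z_a = F.normalize(encoder(x_sensor_a), dim=-1)
    z_b = F.normalize(encoder(x_sensor_b), dim=-1)
    return (1 - (z_a * z_b).sum(dim=-1)).mean()
```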
arXiv Detail & Related papers (2025-02-27T00:12:50Z)
- MSSIDD: A Benchmark for Multi-Sensor Denoising [55.41612200877861]
We introduce a new benchmark, the Multi-Sensor SIDD dataset, which is the first raw-domain dataset designed to evaluate the sensor transferability of denoising models.
We propose a sensor consistency training framework that enables denoising models to learn sensor-invariant features.
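As a sketch of what sensor-consistency training could look like, the same clean scene is passed through two sensor models, and the denoiser is penalized both for fidelity and for disagreement between its two outputs; the loss weighting and setup are illustrative assumptions.
```python
import torch.nn.functional as F

def sensor_consistency_loss(denoiser, raw_a, raw_b, clean):
    """raw_a/raw_b: the same scene through two sensor models; clean: the
    shared ground truth. The 0.1 consistency weight is illustrative."""
    out_a, out_b = denoiser(raw_a), denoiser(raw_b)
    fidelity = F.l1_loss(out_a, clean) + F.l1_loss(out_b, clean)
    consistency = F.l1_loss(out_a, out_b)  # sensor-invariance pressure
    return fidelity + 0.1 * consistency
```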
arXiv Detail & Related papers (2024-11-18T13:32:59Z)
- CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes [56.52618054240197]
We propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes. Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token. Our model significantly improves robustness and accuracy, especially in adverse-condition scenarios.
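A minimal sketch of the Condition Token idea, assuming pooled RGB features and a soft mixture over learned per-condition tokens (the actual CAFuser head is more elaborate):
```python
import torch
import torch.nn as nn

class ConditionToken(nn.Module):
    """Classify the environmental condition from the RGB stream and emit
    a learned token that can steer sensor fusion downstream (a sketch)."""
    def __init__(self, rgb_dim, num_conditions, token_dim):
        super().__init__()
        self.classifier = nn.Linear(rgb_dim, num_conditions)
        self.tokens = nn.Embedding(num_conditions, token_dim)

    def forward(self, rgb_features):
        # rgb_features: (batch, rgb_dim) pooled camera features
        probs = self.classifier(rgb_features).softmax(dim=-1)
        # A soft mixture of condition tokens keeps the module differentiable.
        return probs @ self.tokens.weight  # (batch, token_dim)
```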
arXiv Detail & Related papers (2024-10-14T17:56:20Z)
- SenPa-MAE: Sensor Parameter Aware Masked Autoencoder for Multi-Satellite Self-Supervised Pretraining [1.4528189330418977]
SenPa-MAE encodes the sensor parameters of an observed multispectral signal into the image embeddings.
SenPa-MAE can be pre-trained on imagery of different satellites with non-matching spectral or geometrical sensor characteristics.
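A compact sketch of encoding sensor parameters into the embeddings; the choice of parameters (e.g., central wavelength and bandwidth) and the single-parameter-vector simplification are assumptions:
```python
import torch
import torch.nn as nn

class SensorParameterEmbedding(nn.Module):
    """Map sensor parameters of the observed signal to an embedding that
    is added to the patch tokens, so one MAE can be pretrained across
    satellites with differing characteristics (a simplified sketch)."""
    def __init__(self, num_params=2, dim=768):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_params, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, patch_tokens, sensor_params):
        # patch_tokens: (batch, num_tokens, dim)
        # sensor_params: (batch, num_params) parameters of the sensor
        return patch_tokens + self.mlp(sensor_params).unsqueeze(1)
```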
arXiv Detail & Related papers (2024-08-20T16:53:30Z)
- Bridging Remote Sensors with Multisensor Geospatial Foundation Models [15.289711240431107]
msGFM is a multisensor geospatial foundation model that unifies data from four key sensor modalities.
For data originating from identical geolocations, our model employs an innovative cross-sensor pretraining approach.
msGFM has demonstrated enhanced proficiency in a range of both single-sensor and multisensor downstream tasks.
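The summary suggests pretraining on co-located pairs. One way such a cross-sensor objective could be wired, with hypothetical `encoder`/`decoder` interfaces standing in for the real model, is to reconstruct the masked patches of one sensor from the visible patches of another:
```python
import torch

def cross_sensor_reconstruction_step(encoder, decoder, x_src, x_tgt, mask):
    """Hypothetical training step: `encoder`/`decoder` interfaces are
    assumed, not msGFM's API. `x_src`/`x_tgt` are token sequences from two
    sensors over the same geolocation; `mask` flags positions to hide."""
    latent = encoder(x_src, keep=~mask)   # encode visible source tokens only
    pred = decoder(latent, mask)          # predict values at masked slots
    target = x_tgt[:, mask]               # supervision from the other sensor
    return ((pred - target) ** 2).mean()  # mean-squared reconstruction loss
```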
arXiv Detail & Related papers (2024-04-01T17:30:56Z)
- Learning Online Multi-Sensor Depth Fusion [100.84519175539378]
SenFuNet is a depth fusion approach that learns sensor-specific noise and outlier statistics.
We conduct experiments with various sensor combinations on the real-world CoRBS and Scene3D datasets.
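A simplified sketch of learned per-pixel fusion: a small network predicts how much to trust each sensor's depth at every pixel. SenFuNet itself is more involved; the names and layer sizes here are illustrative.
```python
import torch
import torch.nn as nn

class LearnedDepthFusion(nn.Module):
    """Fuse depth maps from two sensors with learned per-pixel weights,
    letting the network down-weight a sensor where its noise or outlier
    statistics are poor (a simplified sketch of learned fusion)."""
    def __init__(self):
        super().__init__()
        # Input: the two depth maps stacked; output: one confidence map.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, depth_a, depth_b):
        # depth_a, depth_b: (batch, 1, H, W)
        w = self.weight_net(torch.cat([depth_a, depth_b], dim=1))
        return w * depth_a + (1 - w) * depth_b
```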
arXiv Detail & Related papers (2022-04-07T10:45:32Z)
- SensiX++: Bringing MLOps and Multi-tenant Model Serving to Sensory Edge Devices [69.1412199244903]
We present a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors.
SensiX++ operates on two fundamental principles: highly modular componentisation to externalise data operations with clear abstractions, and document-centric manifestation for system-wide orchestration.
We report on the overall throughput and the quantified benefits of the various automation components of SensiX++, and demonstrate its efficacy in significantly reducing operational complexity and lowering the effort to deploy, upgrade, reconfigure, and serve embedded models on edge devices.
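As a loose illustration of the two stated principles, the toy sketch below registers components behind clear abstractions and drives them from a manifest document; it is not SensiX++'s API, and every name in it is hypothetical.
```python
# Toy illustration of document-centric orchestration: components register
# themselves, and a manifest document wires up a serving pipeline.
REGISTRY = {}

def component(name):
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register

@component("decode")
def decode(frame):
    return frame  # placeholder for sensor-specific decoding

@component("classify")
def classify(frame):
    return {"label": "person"}  # placeholder for model inference

MANIFEST = {"pipeline": ["decode", "classify"]}  # the "document"

def serve(frame):
    out = frame
    for stage in MANIFEST["pipeline"]:
        out = REGISTRY[stage](out)
    return out
```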
arXiv Detail & Related papers (2021-09-08T22:06:16Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
SAKDN uses multiple wearable sensors as teacher modalities and RGB videos as the student modality.
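SAKDN's semantics-aware transfer is richer than plain logit matching, but standard response-based distillation conveys the teacher-student setup the summary describes; the temperature and mixing weight below are conventional defaults, not the paper's values.
```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft targets from the wearable-sensor teacher guide the video
    student, blended with the ordinary supervised loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T  # rescale gradients softened by the temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```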
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
- Deep Soft Procrustes for Markerless Volumetric Sensor Alignment [81.13055566952221]
In this work, we improve markerless data-driven correspondence estimation to achieve more robust multi-sensor spatial alignment.
We incorporate geometric constraints in an end-to-end manner into a typical segmentation-based model and bridge the intermediate dense classification task with the targeted pose estimation one.
Our model is experimentally shown to achieve results comparable to marker-based methods and to outperform markerless ones, while also being robust to pose variations of the calibration structure.
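The geometric core named in the title, a Procrustes alignment driven by soft correspondences, can be sketched as a weighted Kabsch solve; here the network is assumed to supply the point sets `P`, `Q` and the correspondence weights `w`, and the weighting scheme is an illustration rather than the paper's exact formulation.
```python
import numpy as np

def weighted_procrustes(P, Q, w):
    """Rigid transform (R, t) aligning 3D points P onto Q under soft
    correspondence weights w, via the weighted Kabsch/Procrustes solve."""
    w = w / w.sum()
    mu_p = (w[:, None] * P).sum(axis=0)            # weighted centroids
    mu_q = (w[:, None] * Q).sum(axis=0)
    H = (P - mu_p).T @ (w[:, None] * (Q - mu_q))   # weighted covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_q - R @ mu_p
    return R, t
```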
arXiv Detail & Related papers (2020-03-23T10:51:32Z)