SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images
- URL: http://arxiv.org/abs/2506.19585v1
- Date: Tue, 24 Jun 2025 12:51:39 GMT
- Title: SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images
- Authors: Gencer Sumbul, Chang Xu, Emanuele Dalsasso, Devis Tuia
- Abstract summary: Recent deep learning models are often specific to single sensors or to fixed combinations. We introduce SMARTIES, a generic and versatile foundation model lifting sensor-specific/dependent efforts. On both single- and multi-modal tasks across diverse sensors, SMARTIES outperforms previous models that rely on sensor-specific pretraining.
- Score: 17.304106221330414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: From optical sensors to microwave radars, leveraging the complementary strengths of remote sensing (RS) sensors is crucial for achieving dense spatio-temporal monitoring of our planet. In contrast, recent deep learning models, whether task-specific or foundational, are often specific to single sensors or to fixed combinations: adapting such models to different sensory inputs requires both architectural changes and re-training, limiting scalability and generalization across multiple RS sensors. On the contrary, a single model able to modulate its feature representations to accept diverse sensors as input would pave the way to agile and flexible multi-sensor RS data processing. To address this, we introduce SMARTIES, a generic and versatile foundation model lifting sensor-specific/dependent efforts and enabling scalability and generalization to diverse RS sensors: SMARTIES projects data from heterogeneous sensors into a shared spectrum-aware space, enabling the use of arbitrary combinations of bands both for training and inference. To obtain sensor-agnostic representations, we train a single, unified transformer model reconstructing masked multi-sensor data with cross-sensor token mixup. On both single- and multi-modal tasks across diverse sensors, SMARTIES outperforms previous models that rely on sensor-specific pretraining. Our code and pretrained models are available at https://gsumbul.github.io/SMARTIES.
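The abstract names two concrete mechanisms: a spectrum-aware projection into a shared space and cross-sensor token mixup during masked reconstruction. Below is a minimal PyTorch sketch of how such components could look; the module names, the per-spectral-range routing, and the token layout are assumptions for illustration, not the authors' implementation.
```python
import torch
import torch.nn as nn

class SpectrumAwareProjection(nn.Module):
    """Hypothetical sketch: one learned projection per spectral range,
    so any sensor's band can be routed by its central wavelength."""
    def __init__(self, num_spectral_ranges: int, patch_pixels: int, dim: int):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Linear(patch_pixels, dim) for _ in range(num_spectral_ranges)
        )

    def forward(self, band_patches, range_ids):
        # band_patches: (batch, num_bands, patch_pixels), one image patch
        # range_ids: sequence of ints, spectral-range index of each band
        tokens = [self.proj[r](band_patches[:, i]) for i, r in enumerate(range_ids)]
        # Fuse the per-band projections into one shared-space patch token.
        return torch.stack(tokens, dim=1).mean(dim=1)

def cross_sensor_token_mixup(tokens_a, tokens_b, ratio=0.5):
    """Swap a random subset of patch tokens between two co-registered
    sensor views, so one sequence mixes both sensors during pretraining."""
    swap = torch.rand(tokens_a.shape[1]) < ratio
    mixed = tokens_a.clone()
    mixed[:, swap] = tokens_b[:, swap]
    return mixed
```
Routing every band through a projection tied to its spectral range, rather than to a specific sensor, is what would let arbitrary band combinations share one backbone at both training and inference time.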
Related papers
- FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification [10.878168590232852]
We decompose, align, and fuse multiple image sensor data for target classification. We propose a shared unified discrete token (UDT) space between sensors to reduce the domain and granularity gaps. We achieve superior classification performance compared to single-modality classifiers.
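One plausible reading of a shared discrete token space is vector quantization against a codebook common to all sensors. The sketch below, with an assumed straight-through quantizer, illustrates that reading; the paper's actual UDT construction may differ.
```python
import torch
import torch.nn as nn

class SharedTokenSpace(nn.Module):
    """Quantize features from any sensor against one shared codebook,
    so all modalities speak the same discrete vocabulary (a sketch)."""
    def __init__(self, codebook_size=512, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, features):
        # features: (batch, num_tokens, dim) from any sensor's encoder
        # Squared distance of every token to every codebook entry.
        dists = (features.unsqueeze(2) - self.codebook.weight).pow(2).sum(-1)
        ids = dists.argmin(dim=-1)            # nearest shared code per token
        quantized = self.codebook(ids)
        # Straight-through estimator keeps the sensor encoders trainable.
        return features + (quantized - features).detach(), ids
```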
arXiv Detail & Related papers (2025-03-12T22:12:35Z)
- Sensor-Invariant Tactile Representation [11.153753622913843]
High-resolution tactile sensors have become critical for embodied perception and robotic manipulation. A key challenge in the field is the lack of transferability between sensors due to design and manufacturing variations. We introduce a novel method for extracting Sensor-Invariant Tactile Representations (SITR), enabling zero-shot transfer across optical tactile sensors.
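The summary does not spell out the training signal, but a common way to obtain sensor-invariant embeddings is to align representations of the same contact captured by different sensors. A hedged sketch, assuming paired captures and a generic encoder:
```python
import torch.nn.functional as F

def sensor_invariance_loss(encoder, x_sensor_a, x_sensor_b):
    """Pull together embeddings of the same contact seen by two different
    tactile sensors (paired captures and the cosine objective are
    assumptions; the paper's training signal may differ)."""
    z_a = F.normalize(encoder(x_sensor_a), dim=-1)
    z_b = F.normalize(encoder(x_sensor_b), dim=-1)
    return (1 - (z_a * z_b).sum(dim=-1)).mean()
```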
arXiv Detail & Related papers (2025-02-27T00:12:50Z)
- MSSIDD: A Benchmark for Multi-Sensor Denoising [55.41612200877861]
We introduce a new benchmark, the Multi-Sensor SIDD dataset, which is the first raw-domain dataset designed to evaluate the sensor transferability of denoising models.
We propose a sensor consistency training framework that enables denoising models to learn sensor-invariant features.
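As a sketch of what sensor-consistency training could look like, the same clean scene is passed through two sensor models, and the denoiser is penalized both for fidelity and for disagreement between its two outputs; the loss weighting and setup are illustrative assumptions.
```python
import torch.nn.functional as F

def sensor_consistency_loss(denoiser, raw_a, raw_b, clean):
    """raw_a/raw_b: the same scene through two sensor models; clean: the
    shared ground truth. The 0.1 consistency weight is illustrative."""
    out_a, out_b = denoiser(raw_a), denoiser(raw_b)
    fidelity = F.l1_loss(out_a, clean) + F.l1_loss(out_b, clean)
    consistency = F.l1_loss(out_a, out_b)  # sensor-invariance pressure
    return fidelity + 0.1 * consistency
```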
arXiv Detail & Related papers (2024-11-18T13:32:59Z)
- CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes [56.52618054240197]
We propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes. Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token. Our model significantly improves robustness and accuracy, especially in adverse-condition scenarios.
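A minimal sketch of the Condition Token idea, assuming pooled RGB features and a soft mixture over learned per-condition tokens (the actual CAFuser head is more elaborate):
```python
import torch
import torch.nn as nn

class ConditionToken(nn.Module):
    """Classify the environmental condition from the RGB stream and emit
    a learned token that can steer sensor fusion downstream (a sketch)."""
    def __init__(self, rgb_dim, num_conditions, token_dim):
        super().__init__()
        self.classifier = nn.Linear(rgb_dim, num_conditions)
        self.tokens = nn.Embedding(num_conditions, token_dim)

    def forward(self, rgb_features):
        # rgb_features: (batch, rgb_dim) pooled camera features
        probs = self.classifier(rgb_features).softmax(dim=-1)
        # A soft mixture of condition tokens keeps the module differentiable.
        return probs @ self.tokens.weight  # (batch, token_dim)
```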
arXiv Detail & Related papers (2024-10-14T17:56:20Z)
- SenPa-MAE: Sensor Parameter Aware Masked Autoencoder for Multi-Satellite Self-Supervised Pretraining [1.4528189330418977]
SenPa-MAE encodes the sensor parameters of an observed multispectral signal into the image embeddings.
SenPa-MAE can be pre-trained on imagery of different satellites with non-matching spectral or geometrical sensor characteristics.
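A compact sketch of encoding sensor parameters into the embeddings; the choice of parameters (e.g., central wavelength and bandwidth) and the single-parameter-vector simplification are assumptions:
```python
import torch
import torch.nn as nn

class SensorParameterEmbedding(nn.Module):
    """Map sensor parameters of the observed signal to an embedding that
    is added to the patch tokens, so one MAE can be pretrained across
    satellites with differing characteristics (a simplified sketch)."""
    def __init__(self, num_params=2, dim=768):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_params, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, patch_tokens, sensor_params):
        # patch_tokens: (batch, num_tokens, dim)
        # sensor_params: (batch, num_params) parameters of the sensor
        return patch_tokens + self.mlp(sensor_params).unsqueeze(1)
```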
arXiv Detail & Related papers (2024-08-20T16:53:30Z)
- Bridging Remote Sensors with Multisensor Geospatial Foundation Models [15.289711240431107]
msGFM is a multisensor geospatial foundation model that unifies data from four key sensor modalities.
For data originating from identical geolocations, our model employs an innovative cross-sensor pretraining approach.
msGFM has demonstrated enhanced proficiency in a range of both single-sensor and multisensor downstream tasks.
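The summary suggests pretraining on co-located pairs. One way such a cross-sensor objective could be wired, with hypothetical `encoder`/`decoder` interfaces standing in for the real model, is to reconstruct the masked patches of one sensor from the visible patches of another:
```python
import torch

def cross_sensor_reconstruction_step(encoder, decoder, x_src, x_tgt, mask):
    """Hypothetical training step: `encoder`/`decoder` interfaces are
    assumed, not msGFM's API. `x_src`/`x_tgt` are token sequences from two
    sensors over the same geolocation; `mask` flags positions to hide."""
    latent = encoder(x_src, keep=~mask)   # encode visible source tokens only
    pred = decoder(latent, mask)          # predict values at masked slots
    target = x_tgt[:, mask]               # supervision from the other sensor
    return ((pred - target) ** 2).mean()  # mean-squared reconstruction loss
```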
arXiv Detail & Related papers (2024-04-01T17:30:56Z)
- Learning Online Multi-Sensor Depth Fusion [100.84519175539378]
SenFuNet is a depth fusion approach that learns sensor-specific noise and outlier statistics.
We conduct experiments with various sensor combinations on the real-world CoRBS and Scene3D datasets.
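A simplified sketch of learned per-pixel fusion: a small network predicts how much to trust each sensor's depth at every pixel. SenFuNet itself is more involved; the names and layer sizes here are illustrative.
```python
import torch
import torch.nn as nn

class LearnedDepthFusion(nn.Module):
    """Fuse depth maps from two sensors with learned per-pixel weights,
    letting the network down-weight a sensor where its noise or outlier
    statistics are poor (a simplified sketch of learned fusion)."""
    def __init__(self):
        super().__init__()
        # Input: the two depth maps stacked; output: one confidence map.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, depth_a, depth_b):
        # depth_a, depth_b: (batch, 1, H, W)
        w = self.weight_net(torch.cat([depth_a, depth_b], dim=1))
        return w * depth_a + (1 - w) * depth_b
```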
arXiv Detail & Related papers (2022-04-07T10:45:32Z)
- SensiX++: Bringing MLOps and Multi-tenant Model Serving to Sensory Edge Devices [69.1412199244903]
We present a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors.
SensiX++ operates on two fundamental principles: highly modular componentisation to externalise data operations with clear abstractions, and document-centric manifestation for system-wide orchestration.
We report on the overall throughput and the quantified benefits of the various automation components of SensiX++, and demonstrate its efficacy in significantly reducing operational complexity and lowering the effort to deploy, upgrade, reconfigure, and serve embedded models on edge devices.
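As a loose illustration of the two stated principles, the toy sketch below registers components behind clear abstractions and drives them from a manifest document; it is not SensiX++'s API, and every name in it is hypothetical.
```python
# Toy illustration of document-centric orchestration: components register
# themselves, and a manifest document wires up a serving pipeline.
REGISTRY = {}

def component(name):
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register

@component("decode")
def decode(frame):
    return frame  # placeholder for sensor-specific decoding

@component("classify")
def classify(frame):
    return {"label": "person"}  # placeholder for model inference

MANIFEST = {"pipeline": ["decode", "classify"]}  # the "document"

def serve(frame):
    out = frame
    for stage in MANIFEST["pipeline"]:
        out = REGISTRY[stage](out)
    return out
```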
arXiv Detail & Related papers (2021-09-08T22:06:16Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
SAKDN uses multiple wearable sensors as teacher modalities and RGB videos as the student modality.
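SAKDN's semantics-aware transfer is richer than plain logit matching, but standard response-based distillation conveys the teacher-student setup the summary describes; the temperature and mixing weight below are conventional defaults, not the paper's values.
```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft targets from the wearable-sensor teacher guide the video
    student, blended with the ordinary supervised loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T  # rescale gradients softened by the temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```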
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
- Deep Soft Procrustes for Markerless Volumetric Sensor Alignment [81.13055566952221]
In this work, we improve markerless data-driven correspondence estimation to achieve more robust multi-sensor spatial alignment.
We incorporate geometric constraints in an end-to-end manner into a typical segmentation-based model and bridge the intermediate dense classification task with the targeted pose estimation one.
Our model is experimentally shown to achieve results comparable to marker-based methods and to outperform markerless ones, while also being robust to pose variations of the calibration structure.
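The geometric core named in the title, a Procrustes alignment driven by soft correspondences, can be sketched as a weighted Kabsch solve; here the network is assumed to supply the point sets `P`, `Q` and the correspondence weights `w`, and the weighting scheme is an illustration rather than the paper's exact formulation.
```python
import numpy as np

def weighted_procrustes(P, Q, w):
    """Rigid transform (R, t) aligning 3D points P onto Q under soft
    correspondence weights w, via the weighted Kabsch/Procrustes solve."""
    w = w / w.sum()
    mu_p = (w[:, None] * P).sum(axis=0)            # weighted centroids
    mu_q = (w[:, None] * Q).sum(axis=0)
    H = (P - mu_p).T @ (w[:, None] * (Q - mu_q))   # weighted covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_q - R @ mu_p
    return R, t
```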
arXiv Detail & Related papers (2020-03-23T10:51:32Z)