CLOSP: A Unified Semantic Space for SAR, MSI, and Text in Remote Sensing
- URL: http://arxiv.org/abs/2507.10403v2
- Date: Wed, 24 Sep 2025 14:10:23 GMT
- Title: CLOSP: A Unified Semantic Space for SAR, MSI, and Text in Remote Sensing
- Authors: Daniele Rege Cambrin, Lorenzo Vaiani, Giuseppe Gallipoli, Luca Cagliero, Paolo Garza
- Abstract summary: We introduce CrisisLandMark, a new large-scale corpus of over 647,000 Sentinel-1 SAR and Sentinel-2 multispectral images. We then present CLOSP, a novel framework that uses text as a bridge to align unpaired optical and SAR images into a unified embedding space. Our experiments show that CLOSP achieves a new state of the art, improving retrieval nDCG@1000 by 54% over existing models.
- Score: 20.32829825711161
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieving relevant imagery from vast satellite archives is crucial for applications like disaster response and long-term climate monitoring. However, most text-to-image retrieval systems are limited to RGB data, failing to exploit the unique physical information captured by other sensors, such as the all-weather structural sensitivity of Synthetic Aperture Radar (SAR) or the spectral signatures in optical multispectral data. To bridge this gap, we introduce CrisisLandMark, a new large-scale corpus of over 647,000 Sentinel-1 SAR and Sentinel-2 multispectral images paired with structured textual annotations for land cover, land use, and crisis events, harmonized from authoritative land cover systems (CORINE and Dynamic World) and crisis-specific sources. We then present CLOSP (Contrastive Language Optical SAR Pretraining), a novel framework that uses text as a bridge to align unpaired optical and SAR images into a unified embedding space. Our experiments show that CLOSP achieves a new state of the art, improving retrieval nDCG@1000 by 54% over existing models. Additionally, we find that the unified training strategy overcomes the inherent difficulty of interpreting SAR imagery by transferring rich semantic knowledge from the optical domain through indirect interaction. Furthermore, GeoCLOSP, which integrates geographic coordinates into our framework, creates a powerful trade-off between generality and specificity: while CLOSP excels at general semantic tasks, GeoCLOSP becomes a specialized expert for retrieving location-dependent crisis events and rare geographic features. This work highlights that the integration of diverse sensor data and geographic context is essential for unlocking the full potential of remote sensing archives.
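The text-as-bridge idea described in the abstract can be illustrated with a small sketch: each caption is contrasted only against images from its own sensor (text-optical pairs and text-SAR pairs), so the two modalities are never compared directly, yet both are pulled toward the shared text space. The snippet below is a minimal, illustrative reconstruction using a standard symmetric InfoNCE objective on random embeddings; the function names, batch shapes, and temperature are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of embeddings.

    a, b: (N, D) arrays where row i of `a` is paired with row i of `b`.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature          # (N, N) similarity matrix
    labels = np.arange(len(a))              # matching pairs sit on the diagonal

    def xent(l):
        # numerically stable cross-entropy with diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average of both retrieval directions (a -> b and b -> a)
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
D = 64
txt_opt = rng.normal(size=(8, D))   # captions paired with optical images
opt     = rng.normal(size=(8, D))   # optical (MSI) image embeddings
txt_sar = rng.normal(size=(8, D))   # captions paired with SAR images
sar     = rng.normal(size=(8, D))   # SAR image embeddings

# Text is the bridge: optical and SAR embeddings are never contrasted
# against each other, only against their own captions.
loss = info_nce(txt_opt, opt) + info_nce(txt_sar, sar)
```

In a real training loop the four arrays would come from learnable text and image encoders and `loss` would be backpropagated; here random vectors only demonstrate the structure of the objective.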
Related papers
- FUSAR-GPT: A Spatiotemporal Feature-Embedded and Two-Stage Decoupled Visual Language Model for SAR Imagery [8.62554606349568]
FUSAR-GPT is a VLM specifically for Synthetic Aperture Radar (SAR) applications. It embeds multi-source remote-sensing temporal features into the model's visual backbone via 'spatiotemporal anchors'. It achieves state-of-the-art performance across several typical remote sensing visual-language benchmark tests.
arXiv Detail & Related papers (2026-02-22T13:40:17Z) - Persistent feature reconstruction of resident space objects (RSOs) within inverse synthetic aperture radar (ISAR) images [0.0]
This work focuses on recognition of external structures by use of sequential feature detection and tracking. ISAR imagery is generated via a metaheuristic simulator capable of modelling encounters for a variety of deployment scenarios. It is shown that applying feature tracking across image sequences with the proposed approach increases confidence in feature detection and classification.
arXiv Detail & Related papers (2025-12-17T17:24:50Z) - DescribeEarth: Describe Anything for Remote Sensing Images [56.04533626223295]
We propose Geo-DLC, a novel task of object-level fine-grained image captioning for remote sensing. To support this task, we construct DE-Dataset, a large-scale dataset with detailed descriptions of object attributes, relationships, and contexts. We also present DescribeEarth, a Multi-modal Large Language Model architecture explicitly designed for Geo-DLC.
arXiv Detail & Related papers (2025-09-30T01:53:34Z) - Annotation-Free Open-Vocabulary Segmentation for Remote-Sensing Images [51.74614065919118]
This paper introduces SegEarth-OV, the first framework for annotation-free open-vocabulary segmentation of RS images. We propose SimFeatUp, a universal upsampler that robustly restores high-resolution spatial details from coarse features. We also present a simple yet effective Global Bias Alleviation operation to subtract the inherent global context from patch features.
arXiv Detail & Related papers (2025-08-25T14:22:57Z) - A Deep Learning framework for building damage assessment using VHR SAR and geospatial data: demonstration on the 2023 Turkiye Earthquake [1.6070833439280312]
Building damage identification shortly after a disaster is crucial for guiding emergency response and recovery efforts. We introduce a novel multimodal deep learning (DL) framework for detecting building damage using single-date very high resolution (VHR) Synthetic Aperture Radar (SAR) imagery. Our method integrates SAR image patches, OpenStreetMap (OSM) building footprints, digital surface model (DSM) data, and structural and exposure attributes from the Global Earthquake Model (GEM). Results highlight that incorporating geospatial features significantly enhances detection performance and generalizability to previously unseen areas.
arXiv Detail & Related papers (2025-06-27T15:49:58Z) - TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation [65.74990259650984]
We introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery. Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism. TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench.
arXiv Detail & Related papers (2025-06-06T17:59:50Z) - RaSCL: Radar to Satellite Crossview Localization [20.34909681483566]
We present a method of registering imaging radar on the ground with overhead RGB imagery, with joint optimization of relative poses from odometry and global poses from our overhead registration. Our work presents insights on extracting essential features from RGB overhead images for effective global localization against overhead imagery using only ground radar and a single georeferenced initial guess.
arXiv Detail & Related papers (2025-04-22T13:41:04Z) - GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis [17.83602731408318]
We introduce GAIA, a novel dataset for multi-scale, multi-sensor, and multi-modal Remote Sensing (RS) image analysis. GAIA comprises 205,150 meticulously curated RS image-text pairs, representing a diverse range of RS modalities associated with different spatial resolutions. GAIA significantly improves performance on RS image classification, cross-modal retrieval, and image captioning tasks.
arXiv Detail & Related papers (2025-02-13T18:52:14Z) - Embodied-RAG: General Non-parametric Embodied Memory for Retrieval and Generation [69.01029651113386]
Embodied-RAG is a framework that enhances the model of an embodied agent with a non-parametric memory system. At its core, Embodied-RAG's memory is structured as a semantic forest, storing language descriptions at varying levels of detail. We demonstrate that Embodied-RAG effectively bridges RAG to the robotics domain, successfully handling over 250 explanation and navigation queries.
arXiv Detail & Related papers (2024-09-26T21:44:11Z) - Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework.
By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information.
Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z) - Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection.
CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z) - FLOGA: A machine learning ready dataset, a benchmark and a novel deep learning model for burnt area mapping with Sentinel-2 [41.28284355136163]
Wildfires pose significant threats to human and animal lives, ecosystems, and socio-economic stability.
In this work, we create and introduce a machine-learning-ready dataset we name FLOGA (Forest wiLdfire Observations for the Greek Area).
This dataset is unique as it comprises satellite imagery acquired before and after a wildfire event.
We use FLOGA to provide a thorough comparison of multiple Machine Learning and Deep Learning algorithms for the automatic extraction of burnt areas.
arXiv Detail & Related papers (2023-11-06T18:42:05Z) - GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z) - A General Purpose Neural Architecture for Geospatial Systems [142.43454584836812]
We present a roadmap towards the construction of a general-purpose neural architecture (GPNA) with a geospatial inductive bias.
We envision how such a model may facilitate cooperation between members of the community.
arXiv Detail & Related papers (2022-11-04T09:58:57Z) - PS-ARM: An End-to-End Attention-aware Relation Mixer Network for Person Search [56.02761592710612]
We propose a novel attention-aware relation mixer (ARM) module for person search.
Our ARM module is native and does not rely on fine-grained supervision or topological assumptions.
Our PS-ARM achieves state-of-the-art performance on both datasets.
arXiv Detail & Related papers (2022-10-07T10:04:12Z) - SPARK: SPAcecraft Recognition leveraging Knowledge of Space Environment [10.068428438297563]
This paper proposes the SPARK dataset as a new unique space object multi-modal image dataset.
The SPARK dataset has been generated under a realistic space simulation environment.
It provides about 150k images per modality, RGB and depth, and 11 classes for spacecrafts and debris.
arXiv Detail & Related papers (2021-04-13T07:16:55Z) - Dense Attention Fluid Network for Salient Object Detection in Optical Remote Sensing Images [193.77450545067967]
We propose an end-to-end Dense Attention Fluid Network (DAFNet) for salient object detection in optical remote sensing images (RSIs).
A Global Context-aware Attention (GCA) module is proposed to adaptively capture long-range semantic context relationships.
We construct a new and challenging optical RSI dataset for SOD that contains 2,000 images with pixel-wise saliency annotations.
arXiv Detail & Related papers (2020-11-26T06:14:10Z) - DFPENet-geology: A Deep Learning Framework for High Precision Recognition and Segmentation of Co-seismic Landslides [7.927831418004974]
This paper develops a robust model, Dense Feature Pyramid with Dense-decoder Network (DFPENet), to understand and fuse the multi-scale features of objects in remote sensing images.
A comprehensive and widely-used scheme is proposed for co-seismic landslide recognition, which integrates image features extracted from the DFPENet model, geologic features, temporal resolution, landslide spatial analysis, and transfer learning.
arXiv Detail & Related papers (2019-08-28T19:07:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.