GDROS: A Geometry-Guided Dense Registration Framework for Optical-SAR Images under Large Geometric Transformations
- URL: http://arxiv.org/abs/2511.00598v1
- Date: Sat, 01 Nov 2025 15:40:34 GMT
- Title: GDROS: A Geometry-Guided Dense Registration Framework for Optical-SAR Images under Large Geometric Transformations
- Authors: Zixuan Sun, Shuaifeng Zhi, Ruize Li, Jingyuan Xia, Yongxiang Liu, Weidong Jiang,
- Abstract summary: We propose GDROS, a geometry-guided dense registration framework leveraging global cross-modal image interactions.
First, we extract cross-modal deep features from optical and SAR images through a CNN-Transformer hybrid feature extraction module.
We then implement a least squares regression (LSR) module to geometrically constrain the predicted dense optical flow field.
- Score: 24.22541638346487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Registration of optical and synthetic aperture radar (SAR) remote sensing images serves as a critical foundation for image fusion and visual navigation tasks. This task is particularly challenging because of the modal discrepancy between the two sensors, primarily manifested as severe nonlinear radiometric differences (NRD), geometric distortions, and noise variations. Under large geometric transformations, existing classical template-based and sparse keypoint-based strategies struggle to achieve reliable registration results for optical-SAR image pairs. To address these limitations, we propose GDROS, a geometry-guided dense registration framework leveraging global cross-modal image interactions. First, we extract cross-modal deep features from optical and SAR images through a CNN-Transformer hybrid feature extraction module, upon which a multi-scale 4D correlation volume is constructed and iteratively refined to establish pixel-wise dense correspondences. Subsequently, we implement a least squares regression (LSR) module to geometrically constrain the predicted dense optical flow field. Such geometry guidance mitigates prediction divergence by directly imposing an estimated affine transformation on the final flow predictions. Extensive experiments have been conducted on three representative datasets with different spatial resolutions (WHU-Opt-SAR, OS, and UBCv2), demonstrating robust performance of our proposed method across imaging resolutions. Qualitative and quantitative results show that GDROS significantly outperforms current state-of-the-art methods in all metrics. Our source code will be released at: https://github.com/Zi-Xuan-Sun/GDROS.
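The LSR step described above can be sketched in a few lines of numpy. This is a hedged illustration of the general idea (not the authors' released code): fit a single 2x3 affine matrix to a dense flow field by least squares, then re-impose that affine model on the flow so the final prediction stays geometrically consistent. The function name `lsr_affine_flow` and the flow layout `(H, W, 2)` are assumptions for this sketch.

```python
import numpy as np

def lsr_affine_flow(flow):
    """Hypothetical LSR sketch. flow: (H, W, 2) array of per-pixel
    displacements (u, v). Returns (affine, constrained_flow), where
    affine is the 2x3 matrix A minimizing
        sum_p || A @ [x_p, y_p, 1]^T - [x_p + u_p, y_p + v_p]^T ||^2
    and constrained_flow is the flow regenerated from that affine."""
    H, W, _ = flow.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Source pixel coordinates in homogeneous form, shape (H*W, 3).
    src = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)], axis=1)
    # Target coordinates implied by the predicted flow, shape (H*W, 2).
    dst = np.stack([xs.ravel() + flow[..., 0].ravel(),
                    ys.ravel() + flow[..., 1].ravel()], axis=1)
    # Closed-form least-squares fit: src @ A^T ~= dst.
    A_T, *_ = np.linalg.lstsq(src, dst, rcond=None)
    affine = A_T.T                        # (2, 3)
    # Re-impose the estimated affine on the flow field.
    warped = src @ A_T                    # (H*W, 2)
    constrained = (warped - src[:, :2]).reshape(H, W, 2)
    return affine, constrained
```

If the predicted flow is exactly affine, the fit recovers that transform and the constrained flow equals the input; for a noisy or partially divergent flow, the regression acts as the global geometric regularizer the abstract describes.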
Related papers
- GeoDiff-SAR: A Geometric Prior Guided Diffusion Model for SAR Image Generation [8.561613404715237]
Existing generative methods primarily operate within the image domain, neglecting explicit geometric information.
We propose GeoDiff-SAR, a geometric prior guided diffusion model for high-fidelity SAR image generation.
Results demonstrate that data generated by GeoDiff-SAR exhibits high fidelity and effectively enhances the accuracy of downstream classification tasks.
arXiv Detail & Related papers (2026-01-07T01:27:20Z) - Grid-Reg: Detector-Free Gridized Feature Learning and Matching for Large-Scale SAR-Optical Image Registration [22.80821597640134]
It is highly challenging to register large-scale, heterogeneous SAR and optical images, particularly across platforms.
To overcome these challenges, we propose Grid-Reg, a grid-based multimodal registration framework.
Our proposed approach achieves superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2025-07-06T03:43:18Z) - DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion [53.70278210626701]
We propose a data-driven multi-view reasoning approach that directly infers 3D scene geometry and camera poses from multi-view images.
Our framework, DiffusionSfM, parameterizes scene geometry and cameras as pixel-wise ray origins and endpoints in a global frame.
We empirically validate DiffusionSfM on both synthetic and real datasets, demonstrating that it outperforms classical and learning-based approaches.
arXiv Detail & Related papers (2025-05-08T17:59:47Z) - Multi-Resolution SAR and Optical Remote Sensing Image Registration Methods: A Review, Datasets, and Future Perspectives [13.749888089968373]
Synthetic Aperture Radar (SAR) and optical image registration is essential for remote sensing data fusion.
As image resolution increases, fine SAR textures become more significant, leading to alignment issues and 3D spatial discrepancies.
The MultiResSAR dataset was created, containing over 10k pairs of multi-source, multi-resolution, and multi-scene SAR and optical images.
arXiv Detail & Related papers (2025-02-03T02:51:30Z) - Deep Learning Based Speckle Filtering for Polarimetric SAR Images. Application to Sentinel-1 [51.404644401997736]
We propose a complete framework to remove speckle in polarimetric SAR images using a convolutional neural network.
Experiments show that the proposed approach offers exceptional results in both speckle reduction and resolution preservation.
arXiv Detail & Related papers (2024-08-28T10:07:17Z) - Semantic Guided Large Scale Factor Remote Sensing Image Super-resolution with Generative Diffusion Prior [13.148815217684277]
Large scale factor super-resolution (SR) algorithms are vital for maximizing the utilization of low-resolution (LR) satellite data captured from orbit.
Existing methods confront challenges in recovering SR images with clear textures and correct ground objects.
We introduce a novel framework, the Semantic Guided Diffusion Model (SGDM), designed for large scale factor remote sensing image super-resolution.
arXiv Detail & Related papers (2024-05-11T16:06:16Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - Leveraging Spatial and Photometric Context for Calibrated Non-Lambertian Photometric Stereo [61.6260594326246]
We introduce an efficient fully-convolutional architecture that can leverage both spatial and photometric context simultaneously.
Using separable 4D convolutions and 2D heat-maps reduces the model size and makes inference more efficient.
arXiv Detail & Related papers (2021-03-22T18:06:58Z) - A Multiscale Graph Convolutional Network for Change Detection in Homogeneous and Heterogeneous Remote Sensing Images [12.823633963080281]
Change detection (CD) in remote sensing images has been an ever-expanding area of research.
In this paper, a novel CD method based on the graph convolutional network (GCN) and multiscale object-based technique is proposed for both homogeneous and heterogeneous images.
arXiv Detail & Related papers (2021-02-16T09:26:31Z) - A Parallel Down-Up Fusion Network for Salient Object Detection in Optical Remote Sensing Images [82.87122287748791]
We propose a novel Parallel Down-Up Fusion network (PDF-Net) for salient object detection in optical remote sensing images (RSIs).
It takes full advantage of the in-path low- and high-level features and cross-path multi-resolution features to distinguish diversely scaled salient objects and suppress the cluttered backgrounds.
Experiments on the ORSSD dataset demonstrate that the proposed network is superior to the state-of-the-art approaches both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-10-02T05:27:57Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.