Seeing the Unseen: Mask-Driven Positional Encoding and Strip-Convolution Context Modeling for Cross-View Object Geo-Localization
- URL: http://arxiv.org/abs/2510.20247v1
- Date: Thu, 23 Oct 2025 06:07:07 GMT
- Title: Seeing the Unseen: Mask-Driven Positional Encoding and Strip-Convolution Context Modeling for Cross-View Object Geo-Localization
- Authors: Shuhan Hu, Yiru Li, Yuanyuan Li, Yingying Zhu
- Abstract summary: Cross-view object geo-localization enables high-precision object localization through cross-view matching. Existing methods rely on keypoint-based positional encoding, which captures only 2D coordinates while neglecting object shape information. We propose a mask-based positional encoding scheme that leverages segmentation masks to capture both spatial coordinates and object silhouettes. We present EDGeo, an end-to-end framework for robust cross-view object geo-localization.
- Score: 8.559240391514063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-view object geo-localization enables high-precision object localization through cross-view matching, with critical applications in autonomous driving, urban management, and disaster response. However, existing methods rely on keypoint-based positional encoding, which captures only 2D coordinates while neglecting object shape information, resulting in sensitivity to annotation shifts and limited cross-view matching capability. To address these limitations, we propose a mask-based positional encoding (MPE) scheme that leverages segmentation masks to capture both spatial coordinates and object silhouettes, thereby upgrading the model from "location-aware" to "object-aware." Furthermore, to tackle the challenge of large-span objects (e.g., elongated buildings) in satellite imagery, we design a context enhancement module (CEM). This module employs horizontal and vertical strip convolutional kernels to extract long-range contextual features, enhancing feature discrimination among strip-like objects. Integrating MPE and CEM, we present EDGeo, an end-to-end framework for robust cross-view object geo-localization. Extensive experiments on two public datasets (CVOGL and VIGOR-Building) demonstrate that our method achieves state-of-the-art performance, with a 3.39% improvement in localization accuracy under challenging ground-to-satellite scenarios. This work provides a robust positional encoding paradigm and a contextual modeling framework for advancing cross-view geo-localization research.
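As a rough illustration of the two ideas in the abstract, the Python sketch below contrasts a keypoint map with a mask map and applies horizontal and vertical strip filters to an elongated object. It is not taken from the paper: the function names (`keypoint_map`, `mask_map`, `strip_filter`), the rectangular stand-in for a segmentation mask, and the uniform averaging weights used in place of learned convolution kernels are all assumptions made for illustration.

```python
# Illustrative sketch only. Names, shapes, and weights are hypothetical
# and do not reproduce the EDGeo implementation.

def keypoint_map(height, width, point):
    """Binary map with a single 1 at the annotated (row, col) location."""
    grid = [[0.0] * width for _ in range(height)]
    row, col = point
    grid[row][col] = 1.0
    return grid


def mask_map(height, width, box):
    """Binary map with 1s inside a rectangle (inclusive corners).

    The rectangle is a simple stand-in for a real segmentation mask:
    it encodes the object's extent as well as its position.
    """
    r0, c0, r1, c1 = box
    return [
        [1.0 if r0 <= r <= r1 and c0 <= c <= c1 else 0.0 for c in range(width)]
        for r in range(height)
    ]


def strip_filter(grid, k, horizontal=True):
    """Apply a 1 x k (horizontal) or k x 1 (vertical) averaging strip.

    k is assumed to be odd. Positions outside the grid count as zero.
    Uniform weights replace the learned kernels a real model would use.
    """
    height, width = len(grid), len(grid[0])
    half = k // 2
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total = 0.0
            for d in range(-half, half + 1):
                rr, cc = (r, c + d) if horizontal else (r + d, c)
                if 0 <= rr < height and 0 <= cc < width:
                    total += grid[rr][cc]
            out[r][c] = total / k
    return out
```

For a 1 x 5 horizontal bar built with `mask_map(7, 7, (3, 1, 3, 5))`, the mask map marks five pixels where `keypoint_map(7, 7, (3, 3))` marks one. With `k=5`, the horizontal strip centred on the bar covers all five pixels and responds with 1.0, while the vertical strip at the same position touches a single pixel and responds with 0.2. This is the kind of directional contrast that strip-shaped kernels are meant to expose for elongated objects.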
Related papers
- IoUCert: Robustness Verification for Anchor-based Object Detectors [58.35703549470485]
We introduce IoUCert, a novel formal verification framework designed specifically to overcome these bottlenecks in anchor-based object detection architectures. We show that our method enables the robustness verification of realistic, anchor-based models including SSD, YOLOv2, and YOLOv3 variants against various input perturbations.
arXiv Detail & Related papers (2026-03-03T14:36:46Z)
- Improving Cross-view Object Geo-localization: A Dual Attention Approach with Cross-view Interaction and Multi-Scale Spatial Features [0.0]
Cross-view object geo-localization has recently gained attention due to potential applications. We introduce a Cross-view and Cross-attention Module (CVCAM), which performs multiple iterations of interaction between the two views. We also create a new dataset called G2D for the "Ground-to-Drone" localization task.
arXiv Detail & Related papers (2025-10-31T03:28:59Z)
- Anchor-free Cross-view Object Geo-localization with Gaussian Position Encoding and Cross-view Association [3.5982006325887554]
We propose an anchor-free formulation for cross-view object geo-localization, termed AFGeo. AFGeo directly predicts the four directional offsets to the ground-truth box for each pixel localizing the object without any predefined anchors. Our model is both lightweight and efficient, achieving state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2025-09-30T00:30:45Z)
- Object-level Cross-view Geo-localization with Location Enhancement and Multi-Head Cross Attention [17.777115738099916]
Cross-view geo-localization determines the location of a query image, captured by a drone or ground-based camera, by matching it to a geo-referenced satellite image. We propose an Object-level Cross-view Geo-localization Network (OCGNet) to address these challenges. OCGNet achieves state-of-the-art performance on a public dataset, CVOGL.
arXiv Detail & Related papers (2025-05-23T13:55:56Z)
- EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation [50.433911327489554]
We introduce EarthMapper, a novel framework for controllable satellite-map translation. We also contribute CNSatMap, a large-scale dataset comprising 302,132 precisely aligned satellite-map pairs across 38 Chinese cities. Experiments on CNSatMap and the New York dataset demonstrate EarthMapper's superior performance.
arXiv Detail & Related papers (2025-04-28T02:41:12Z)
- SegDesicNet: Lightweight Semantic Segmentation in Remote Sensing with Geo-Coordinate Embeddings for Domain Adaptation [0.5461938536945723]
We propose a novel unsupervised domain adaptation technique for remote sensing semantic segmentation. Our proposed SegDesicNet module regresses the GRID positional encoding of the geo coordinates projected over the unit sphere to obtain the domain loss. Our algorithm seeks to reduce the modeling disparity between artificial neural networks and human comprehension of the physical world.
arXiv Detail & Related papers (2025-03-11T11:01:18Z)
- Imagining the Unseen: Generative Location Modeling for Object Placement [49.71690795831461]
We develop a generative location model that learns to predict plausible bounding boxes for an object. Our approach first tokenizes the image and target object class, then decodes bounding box coordinates through an autoregressive transformer. Empirical evaluations reveal that our generative location model achieves superior placement accuracy on the OPA dataset.
arXiv Detail & Related papers (2024-10-17T14:00:41Z)
- Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation [50.433911327489554]
The goal of referring remote sensing image segmentation (RRSIS) is to generate a pixel-level mask of the target object identified by the referring expression. To address the aforementioned challenges, a novel RRSIS framework is proposed, termed the cross-modal bidirectional interaction model (CroBIM). To further foster the research of RRSIS, we also construct RISBench, a new large-scale benchmark dataset comprising 52,472 image-language-label triplets.
arXiv Detail & Related papers (2024-10-11T08:28:04Z)
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts [107.11267074981905]
We propose a semantically controllable layout-AWare diffusion model, termed LAW-Diffusion.
We show that LAW-Diffusion yields state-of-the-art generative performance, especially with coherent object relations.
arXiv Detail & Related papers (2023-08-13T08:06:18Z)
- Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence [11.823147814005411]
Cross-view geo-localization aims to estimate the location of a query ground image by matching it to a reference database of geo-tagged aerial images.
Recent works achieve outstanding progress on cross-view geo-localization benchmarks.
However, existing methods still suffer from poor performance on the cross-area benchmarks.
arXiv Detail & Related papers (2022-12-08T04:54:01Z)