Related papers: Seeing the Unseen: Mask-Driven Positional Encoding and Strip-Convolution Context Modeling for Cross-View Object Geo-Localization

URL: http://arxiv.org/abs/2510.20247v1
Date: Thu, 23 Oct 2025 06:07:07 GMT
Title: Seeing the Unseen: Mask-Driven Positional Encoding and Strip-Convolution Context Modeling for Cross-View Object Geo-Localization
Authors: Shuhan Hu, Yiru Li, Yuanyuan Li, Yingying Zhu
Abstract summary: Cross-view object geo-localization enables high-precision object localization through cross-view matching. Existing methods rely on keypoint-based positional encoding, which captures only 2D coordinates while neglecting object shape information. We propose a mask-based positional encoding scheme that leverages segmentation masks to capture both spatial coordinates and object silhouettes. We present EDGeo, an end-to-end framework for robust cross-view object geo-localization.
Score: 8.559240391514063
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cross-view object geo-localization enables high-precision object localization through cross-view matching, with critical applications in autonomous driving, urban management, and disaster response. However, existing methods rely on keypoint-based positional encoding, which captures only 2D coordinates while neglecting object shape information, resulting in sensitivity to annotation shifts and limited cross-view matching capability. To address these limitations, we propose a mask-based positional encoding (MPE) scheme that leverages segmentation masks to capture both spatial coordinates and object silhouettes, thereby upgrading the model from "location-aware" to "object-aware." Furthermore, to tackle the challenge of large-span objects (e.g., elongated buildings) in satellite imagery, we design a context enhancement module (CEM). This module employs horizontal and vertical strip convolutional kernels to extract long-range contextual features, enhancing feature discrimination among strip-like objects. Integrating MPE and CEM, we present EDGeo, an end-to-end framework for robust cross-view object geo-localization. Extensive experiments on two public datasets (CVOGL and VIGOR-Building) demonstrate that our method achieves state-of-the-art performance, with a 3.39% improvement in localization accuracy under challenging ground-to-satellite scenarios. This work provides a robust positional encoding paradigm and a contextual modeling framework for advancing cross-view geo-localization research.
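
The strip-convolution idea in the abstract can be illustrated with a small plain-Python sketch. This is not the paper's context enhancement module: the function names, the uniform averaging weights, the zero padding, and the kernel length of 5 are all assumptions chosen for illustration. It only shows why a 1 x k kernel responds strongly along an elongated horizontal object while a k x 1 kernel at the same pixel does not.

```python
# Illustrative sketch only: strip convolutions with uniform weights in plain Python.
# Function names, weights, padding, and kernel length are assumptions,
# not the paper's context enhancement module.

def strip_conv(image, length, horizontal=True):
    """Apply a 1 x length (horizontal) or length x 1 (vertical) averaging kernel.

    Uses zero padding so the output has the same shape as the input.
    `length` is assumed to be odd so the kernel is centered on each pixel.
    """
    rows, cols = len(image), len(image[0])
    half = length // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for k in range(-half, half + 1):
                rr, cc = (r, c + k) if horizontal else (r + k, c)
                if 0 <= rr < rows and 0 <= cc < cols:
                    total += image[rr][cc]
            out[r][c] = total / length
    return out


def strip_context(image, length=5):
    """Sum the horizontal and vertical strip responses into one context map."""
    h = strip_conv(image, length, horizontal=True)
    v = strip_conv(image, length, horizontal=False)
    return [[h[r][c] + v[r][c] for c in range(len(image[0]))]
            for r in range(len(image))]


# A 5 x 7 grid with one elongated horizontal "building" on the middle row.
grid = [[0] * 7 for _ in range(5)]
grid[2] = [0, 1, 1, 1, 1, 1, 0]

horizontal = strip_conv(grid, 5, horizontal=True)
vertical = strip_conv(grid, 5, horizontal=False)
context = strip_context(grid, 5)
```

At the center of the object, `horizontal[2][3]` covers all five object pixels, whereas `vertical[2][3]` sees only one, so the horizontal strip response is five times larger. A learned module would replace the uniform weights with trainable ones.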

Related papers

IoUCert: Robustness Verification for Anchor-based Object Detectors [58.35703549470485]
We introduce IoUCert, a novel formal verification framework designed specifically to overcome these bottlenecks in anchor-based object detection architectures. We show that our method enables the robustness verification of realistic, anchor-based models including SSD, YOLOv2, and YOLOv3 variants against various input perturbations.
arXiv Detail & Related papers (2026-03-03T14:36:46Z)
Improving Cross-view Object Geo-localization: A Dual Attention Approach with Cross-view Interaction and Multi-Scale Spatial Features [0.0]
Cross-view object geo-localization has recently gained attention due to potential applications. We introduce a Cross-view and Cross-attention Module (CVCAM), which performs multiple iterations of interaction between the two views. We also create a new dataset called G2D for the "Ground-to-Drone" localization task.
arXiv Detail & Related papers (2025-10-31T03:28:59Z)
Anchor-free Cross-view Object Geo-localization with Gaussian Position Encoding and Cross-view Association [3.5982006325887554]
We propose an anchor-free formulation for cross-view object geo-localization, termed AFGeo. AFGeo directly predicts the four directional offsets to the ground-truth box for each pixel localizing the object without any predefined anchors. Our model is both lightweight and efficient, achieving state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2025-09-30T00:30:45Z)
Object-level Cross-view Geo-localization with Location Enhancement and Multi-Head Cross Attention [17.777115738099916]
Cross-view geo-localization determines the location of a query image, captured by a drone or ground-based camera, by matching it to a geo-referenced satellite image. We propose an Object-level Cross-view Geo-localization Network (OCGNet) to address these challenges. OCGNet achieves state-of-the-art performance on a public dataset, CVOGL.
arXiv Detail & Related papers (2025-05-23T13:55:56Z)
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation [50.433911327489554]
We introduce EarthMapper, a novel framework for controllable satellite-map translation. We also contribute CNSatMap, a large-scale dataset comprising 302,132 precisely aligned satellite-map pairs across 38 Chinese cities. Experiments on CNSatMap and the New York dataset demonstrate EarthMapper's superior performance.
arXiv Detail & Related papers (2025-04-28T02:41:12Z)
SegDesicNet: Lightweight Semantic Segmentation in Remote Sensing with Geo-Coordinate Embeddings for Domain Adaptation [0.5461938536945723]
We propose a novel unsupervised domain adaptation technique for remote sensing semantic segmentation. Our proposed SegDesicNet module regresses the GRID positional encoding of the geo coordinates projected over the unit sphere to obtain the domain loss. Our algorithm seeks to reduce the modeling disparity between artificial neural networks and human comprehension of the physical world.
arXiv Detail & Related papers (2025-03-11T11:01:18Z)
Imagining the Unseen: Generative Location Modeling for Object Placement [49.71690795831461]
We develop a generative location model that learns to predict plausible bounding boxes for an object. Our approach first tokenizes the image and target object class, then decodes bounding box coordinates through an autoregressive transformer. Empirical evaluations reveal that our generative location model achieves superior placement accuracy on the OPA dataset.
arXiv Detail & Related papers (2024-10-17T14:00:41Z)
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation [50.433911327489554]
The goal of referring remote sensing image segmentation (RRSIS) is to generate a pixel-level mask of the target object identified by the referring expression. To address the aforementioned challenges, a novel RRSIS framework is proposed, termed the cross-modal bidirectional interaction model (CroBIM). To further foster the research of RRSIS, we also construct RISBench, a new large-scale benchmark dataset comprising 52,472 image-language-label triplets.
arXiv Detail & Related papers (2024-10-11T08:28:04Z)
Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries. We propose a novel Relational Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images. Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts [107.11267074981905]
We propose a semantically controllable layout-AWare diffusion model, termed LAW-Diffusion. We show that LAW-Diffusion yields state-of-the-art generative performance, especially with coherent object relations.
arXiv Detail & Related papers (2023-08-13T08:06:18Z)
Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence [11.823147814005411]
Cross-view geo-localization aims to estimate the location of a query ground image by matching it to a reference database of geo-tagged aerial images. Recent works achieve outstanding progress on cross-view geo-localization benchmarks. However, existing methods still suffer from poor performance on the cross-area benchmarks.
arXiv Detail & Related papers (2022-12-08T04:54:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.