Anchor-free Cross-view Object Geo-localization with Gaussian Position Encoding and Cross-view Association
- URL: http://arxiv.org/abs/2509.25623v1
- Date: Tue, 30 Sep 2025 00:30:45 GMT
- Title: Anchor-free Cross-view Object Geo-localization with Gaussian Position Encoding and Cross-view Association
- Authors: Xingtao Ling, Chenlin Fu, Yingying Zhu
- Abstract summary: We propose an anchor-free formulation for cross-view object geo-localization, termed AFGeo. AFGeo directly predicts the four directional offsets to the ground-truth box for each pixel, localizing the object without any predefined anchors. Our model is both lightweight and efficient, achieving state-of-the-art performance on benchmark datasets.
- Score: 3.5982006325887554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing cross-view object geo-localization approaches adopt an anchor-based paradigm. Although effective, such methods are inherently constrained by predefined anchors. To eliminate this dependency, we first propose an anchor-free formulation for cross-view object geo-localization, termed AFGeo. AFGeo directly predicts the four directional offsets (left, right, top, bottom) to the ground-truth box for each pixel, thereby localizing the object without any predefined anchors. To obtain a more robust spatial prior, AFGeo incorporates Gaussian Position Encoding (GPE) to model the click point in the query image, mitigating the uncertainty of object position that challenges object localization in cross-view scenarios. In addition, AFGeo incorporates a Cross-view Object Association Module (CVOAM) that relates the same object and its surrounding context across viewpoints, enabling reliable localization under large cross-view appearance gaps. By adopting an anchor-free localization paradigm that integrates GPE and CVOAM with minimal parameter overhead, our model is both lightweight and computationally efficient, achieving state-of-the-art performance on benchmark datasets.
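The two ideas named in the abstract, a Gaussian encoding of the click point and per-pixel (left, right, top, bottom) offsets, can be illustrated with a minimal sketch. It is not the authors' implementation: the isotropic 2D Gaussian, the `sigma` value, the `(x1, y1, x2, y2)` box layout, and all function names are assumptions made for illustration only.

```python
import math


def gaussian_position_map(height, width, click_xy, sigma=8.0):
    """Build a height x width map that peaks at 1.0 on the click point.

    Assumption: an isotropic 2D Gaussian; the paper's exact GPE form
    is not given in the abstract.
    """
    cx, cy = click_xy
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def offsets_to_box_sides(pixel_xy, box):
    """Return (left, right, top, bottom) distances from a pixel to the box sides.

    `box` is (x1, y1, x2, y2). Pixels outside the box have no valid
    target in this sketch, so None is returned for them.
    """
    x, y = pixel_xy
    x1, y1, x2, y2 = box
    left, right, top, bottom = x - x1, x2 - x, y - y1, y2 - y
    if min(left, right, top, bottom) < 0:
        return None
    return (left, right, top, bottom)


def decode_box(pixel_xy, offsets):
    """Invert offsets_to_box_sides: recover (x1, y1, x2, y2) from one pixel."""
    x, y = pixel_xy
    left, right, top, bottom = offsets
    return (x - left, y - top, x + right, y + bottom)


if __name__ == "__main__":
    prior = gaussian_position_map(5, 5, click_xy=(2, 2), sigma=1.0)
    print(prior[2][2])  # value at the click point

    box = (1, 2, 9, 8)
    offsets = offsets_to_box_sides((4, 5), box)
    print(offsets, decode_box((4, 5), offsets))
```

Because each pixel inside the box carries its own four offsets, any such pixel can reconstruct the full box without reference to a predefined anchor shape, which is the property the anchor-free formulation relies on.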
Related papers
- IoUCert: Robustness Verification for Anchor-based Object Detectors [58.35703549470485]
We introduce IoUCert, a novel formal verification framework designed specifically to overcome these bottlenecks in anchor-based object detection architectures. We show that our method enables the robustness verification of realistic, anchor-based models including SSD, YOLOv2, and YOLOv3 variants against various input perturbations.
arXiv Detail & Related papers (2026-03-03T14:36:46Z)
- SMGeo: Cross-View Object Geo-Localization with Grid-Level Mixture-of-Experts [4.521626189942935]
Cross-view object Geo-localization aims to precisely pinpoint the same object across large-scale satellite imagery based on drone images. We present SMGeo, a promptable end-to-end transformer-based model for object Geo-localization.
arXiv Detail & Related papers (2025-11-18T03:21:20Z)
- Improving Cross-view Object Geo-localization: A Dual Attention Approach with Cross-view Interaction and Multi-Scale Spatial Features [0.0]
Cross-view object geo-localization has recently gained attention due to potential applications. We introduce a Cross-view and Cross-attention Module (CVCAM), which performs multiple iterations of interaction between the two views. We also create a new dataset called G2D for the "Ground-to-Drone" localization task.
arXiv Detail & Related papers (2025-10-31T03:28:59Z)
- Seeing the Unseen: Mask-Driven Positional Encoding and Strip-Convolution Context Modeling for Cross-View Object Geo-Localization [8.559240391514063]
Cross-view object geo-localization enables high-precision object localization through cross-view matching. Existing methods rely on keypoint-based positional encoding, which captures only 2D coordinates while neglecting object shape information. We propose a mask-based positional encoding scheme that leverages segmentation masks to capture both spatial coordinates and object silhouettes. We present EDGeo, an end-to-end framework for robust cross-view object geo-localization.
arXiv Detail & Related papers (2025-10-23T06:07:07Z)
- Recurrent Cross-View Object Geo-Localization [23.685973292321574]
Cross-view object geo-localization (CVOGL) aims to determine the location of a specific object in high-resolution satellite imagery given a query image with a point prompt. We propose ReCOT, a Recurrent Cross-view Object geo-localization Transformer, which reformulates CVOGL as a recurrent localization task. ReCOT introduces a set of learnable tokens that encode task-specific intent from the query image and prompt embeddings, and iteratively attend to the reference features to refine the predicted location.
arXiv Detail & Related papers (2025-09-16T07:18:23Z)
- Object-level Cross-view Geo-localization with Location Enhancement and Multi-Head Cross Attention [17.777115738099916]
Cross-view geo-localization determines the location of a query image, captured by a drone or ground-based camera, by matching it to a geo-referenced satellite image. We propose an Object-level Cross-view Geo-localization Network (OCGNet) to address these challenges. OCGNet achieves state-of-the-art performance on a public dataset, CVOGL.
arXiv Detail & Related papers (2025-05-23T13:55:56Z)
- CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement [11.108860387261508]
Visual geolocalization is a cost-effective and scalable task that involves matching one or more query images taken at some unknown location to a set of geo-tagged reference images.
We develop CurriculumLoc, a novel keypoint detection and description method with global semantic awareness and local geometric verification.
We achieve new high recall@1 scores of 62.6% and 94.5% on ALTO, with two different distance metrics, respectively.
arXiv Detail & Related papers (2023-11-20T08:40:01Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Simple, Effective and General: A New Backbone for Cross-view Image Geo-localization [9.687328460113832]
We propose a new backbone network, named Simple Attention-based Image Geo-localization network (SAIG).
The proposed SAIG effectively represents long-range interactions among patches as well as cross-view correspondence with multi-head self-attention layers.
Our SAIG achieves state-of-the-art results on cross-view geo-localization, while being far simpler than previous works.
arXiv Detail & Related papers (2023-02-03T06:50:51Z)
- LocPoseNet: Robust Location Prior for Unseen Object Pose Estimation [69.70498875887611]
LocPoseNet is able to robustly learn a location prior for unseen objects.
Our method outperforms existing works by a large margin on LINEMOD and GenMOP.
arXiv Detail & Related papers (2022-11-29T15:21:34Z)
- Learning Open-World Object Proposals without Learning to Classify [110.30191531975804]
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlaps with any ground-truth object.
This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization.
arXiv Detail & Related papers (2021-08-15T14:36:02Z)
- SIRI: Spatial Relation Induced Network For Spatial Description Resolution [64.38872296406211]
We propose a novel relationship induced (SIRI) network for language-guided localization.
We show that our method is around 24% better than the state-of-the-art method in terms of accuracy, measured by an 80-pixel radius.
Our method also generalizes well on our proposed extended dataset collected using the same settings as Touchdown.
arXiv Detail & Related papers (2020-10-27T14:04:05Z)
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine object detail along boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the high and low frequency of the image.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
- Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries.
To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions.
We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z)
- Scope Head for Accurate Localization in Object Detection [135.9979405835606]
We propose a novel detector coined as ScopeNet, which models anchors of each location as a mutually dependent relationship.
With our concise and effective design, the proposed ScopeNet achieves state-of-the-art results on COCO.
arXiv Detail & Related papers (2020-05-11T04:00:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.