SuperGF: Unifying Local and Global Features for Visual Localization
- URL: http://arxiv.org/abs/2212.13105v1
- Date: Fri, 23 Dec 2022 13:48:07 GMT
- Title: SuperGF: Unifying Local and Global Features for Visual Localization
- Authors: Wenzheng Song, Ran Yan, Boshu Lei, Takayuki Okatani
- Abstract summary: SuperGF is a transformer-based aggregation model that operates directly on image-matching-specific local features.
We provide implementations of SuperGF using various types of local features, including dense and sparse learning-based or hand-crafted descriptors.
- Score: 13.869227429939423
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advanced visual localization techniques encompass image retrieval challenges
and 6 Degree-of-Freedom (DoF) camera pose estimation, such as hierarchical
localization. Thus, they must extract global and local features from input
images. Previous methods have achieved this through resource-intensive or
accuracy-reducing means, such as combinatorial pipelines or multi-task
distillation. In this study, we present a novel method called SuperGF, which
effectively unifies local and global features for visual localization, achieving
a better trade-off between localization accuracy and computational
efficiency. Specifically, SuperGF is a transformer-based aggregation model that
operates directly on image-matching-specific local features and generates
global features for retrieval. We conduct experimental evaluations of our
method in terms of both accuracy and efficiency, demonstrating its advantages
over other methods. We also provide implementations of SuperGF using various
types of local features, including dense and sparse learning-based or
hand-crafted descriptors.
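The abstract describes SuperGF as a transformer-based aggregation model that turns image-matching local descriptors into a single global retrieval feature. As a rough illustrative sketch only (not the authors' implementation), a single attention-pooling step over a set of local descriptors might look like the following; the function, the learnable query token, and the projection matrices are all hypothetical names introduced here for illustration:

```python
import numpy as np

def aggregate_global_feature(local_desc, query, W_k, W_v):
    """Attention-pool N local descriptors into one global descriptor.

    local_desc : (N, D) image-matching local descriptors (e.g. from a
                 sparse or dense keypoint extractor).
    query      : (D,) learnable query token (hypothetical parameter).
    W_k, W_v   : (D, D) key/value projections (hypothetical parameters).
    Returns an L2-normalized (D,) global feature usable for retrieval.
    """
    N, D = local_desc.shape
    keys = local_desc @ W_k                 # (N, D) key projections
    values = local_desc @ W_v               # (N, D) value projections
    scores = keys @ query / np.sqrt(D)      # (N,) scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax attention weights
    pooled = weights @ values               # (D,) attention-weighted summary
    return pooled / np.linalg.norm(pooled)  # unit norm for cosine retrieval
```

The resulting unit-norm vector can be compared across images with a dot product, which is the standard similarity measure in retrieval-based hierarchical localization.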
Related papers
- Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection [29.118321046339656]
We propose a framework to learn rich and discriminative representations by combining multi-scale global features from the whole image with refined local features from informative patches for AI synthesized image detection.
GLFF fuses information from two branches: the global branch to extract multi-scale semantic features and the local branch to select informative patches for detailed local artifacts extraction.
arXiv Detail & Related papers (2022-11-16T02:03:20Z)
- RING++: Roto-translation Invariant Gram for Global Localization on a Sparse Scan Map [20.276334172402763]
We propose RING++ which has roto-translation invariant representation for place recognition, and global convergence for both rotation and translation estimation.
With the theoretical guarantee, RING++ is able to address the large viewpoint difference using a lightweight map with sparse scans.
This is the first learning-free framework to address all subtasks of global localization in the sparse scan map.
arXiv Detail & Related papers (2022-10-12T07:49:24Z)
- Centralized Feature Pyramid for Object Detection [53.501796194901964]
Visual feature pyramid has shown its superiority in both effectiveness and efficiency in a wide range of applications.
In this paper, we propose a Centralized Feature Pyramid (CFP) for object detection, which is based on a globally explicit centralized feature regulation.
arXiv Detail & Related papers (2022-10-05T08:32:54Z)
- SphereVLAD++: Attention-based and Signal-enhanced Viewpoint Invariant Descriptor [6.326554177747699]
We develop SphereVLAD++, an attention-enhanced viewpoint invariant place recognition method.
We show that SphereVLAD++ outperforms all relative state-of-the-art 3D place recognition methods under small or even totally reversed viewpoint differences.
arXiv Detail & Related papers (2022-07-06T20:32:43Z)
- Conformer: Local Features Coupling Global Representations for Visual Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning.
Experiments show that Conformer, under the comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet.
arXiv Detail & Related papers (2021-05-09T10:00:03Z)
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a PriorGuided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
- Gait Recognition via Effective Global-Local Feature Representation and Local Temporal Aggregation [28.721376937882958]
Gait recognition is one of the most important biometric technologies and has been applied in many fields.
Recent gait recognition frameworks represent each gait frame by descriptors extracted from either global appearances or local regions of humans.
We propose a novel feature extraction and fusion framework to achieve discriminative feature representations for gait recognition.
arXiv Detail & Related papers (2020-11-03T04:07:13Z)
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves the Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
- Multi-View Optimization of Local Feature Geometry [70.18863787469805]
We address the problem of refining the geometry of local image features from multiple views without known scene or camera geometry.
Our proposed method naturally complements the traditional feature extraction and matching paradigm.
We show that our method consistently improves the triangulation and camera localization performance for both hand-crafted and learned local features.
arXiv Detail & Related papers (2020-03-18T17:22:11Z)
- Features for Ground Texture Based Localization -- A Survey [12.160708336715489]
Ground texture based vehicle localization using feature-based methods is a promising approach to achieve infrastructure-free high-accuracy localization.
We provide the first extensive evaluation of available feature extraction methods for this task, using separately taken image pairs as well as synthetic transformations.
We identify AKAZE, SURF, and CenSurE as the best-performing keypoint detectors, and find that pairing CenSurE with the ORB, BRIEF, and LATCH feature descriptors achieves the greatest success rates for incremental localization.
arXiv Detail & Related papers (2020-02-27T07:25:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.