Co-visual pattern augmented generative transformer learning for
automobile geo-localization
- URL: http://arxiv.org/abs/2203.09135v2
- Date: Thu, 20 Apr 2023 12:49:27 GMT
- Title: Co-visual pattern augmented generative transformer learning for
automobile geo-localization
- Authors: Jianwei Zhao and Qiang Zhai and Pengbo Zhao and Rui Huang and Hong
Cheng
- Abstract summary: Cross-view geo-localization (CVGL) aims to estimate the geographical location of a ground-level camera by matching its image against a large set of geo-tagged aerial images.
We present a novel approach using cross-view knowledge generative techniques in combination with transformers, namely mutual generative transformer learning (MGTL) for CVGL.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Geolocation is a fundamental component of route planning and navigation for
unmanned vehicles, but GNSS-based geolocation fails under denial-of-service
conditions. Cross-view geo-localization (CVGL), which aims to estimate the
geographical location of a ground-level camera by matching its image against a large set of
geo-tagged aerial (e.g., satellite) images, has received considerable
attention but remains extremely challenging due to the drastic appearance
differences between aerial and ground views. In existing methods, global
representations of different views are extracted primarily using Siamese-like
architectures, but their interactive benefits are seldom taken into account. In
this paper, we present a novel approach using cross-view knowledge generative
techniques in combination with transformers, namely mutual generative
transformer learning (MGTL), for CVGL. Specifically, taking the initial
representations produced by the backbone network as input, MGTL develops two separate
generative sub-modules, one generating aerial-aware knowledge from
ground-view semantics and the other doing the reverse, and exploits their mutual
benefits through the attention mechanism. Moreover, to better capture the
co-visual relationships between aerial and ground views, we introduce a
cascaded attention masking algorithm to further boost accuracy. Extensive
experiments on challenging public benchmarks, i.e., CVACT and CVUSA,
demonstrate the effectiveness of the proposed method, which sets new records
compared with existing state-of-the-art models.
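As a rough illustration of the two ingredients named in the abstract (attention between ground and aerial representations, followed by a masking step that restricts attention to the most relevant tokens), the following NumPy sketch may help. It is not the authors' MGTL implementation: the function names `cross_attention` and `topk_mask`, the `keep_ratio` parameter, the token shapes, and the top-k masking rule are all assumptions made for this example, and random arrays stand in for backbone features.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, mask=None):
    """Scaled dot-product attention of queries (n, d) over context (m, d).

    mask is an optional boolean array of shape (m,); context tokens whose
    entry is False are excluded from the attention.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)  # (n, m)
    if mask is not None:
        scores = np.where(mask[None, :], scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ context, weights


def topk_mask(weights, keep_ratio=0.5):
    # Hypothetical masking rule: keep the context tokens that received the
    # highest average attention across all queries.
    importance = weights.mean(axis=0)  # (m,)
    k = max(1, int(round(keep_ratio * importance.size)))
    mask = np.zeros(importance.size, dtype=bool)
    mask[np.argsort(importance)[-k:]] = True
    return mask


# Random stand-ins for backbone token features (assumed shapes).
rng = np.random.default_rng(0)
ground = rng.normal(size=(6, 8))   # 6 ground-view tokens, dimension 8
aerial = rng.normal(size=(10, 8))  # 10 aerial-view tokens, dimension 8

# Stage 1: ground tokens attend over all aerial tokens.
_, w1 = cross_attention(ground, aerial)

# Stage 2: attend again, restricted to the most-attended aerial tokens.
mask = topk_mask(w1, keep_ratio=0.5)
ground_aug, w2 = cross_attention(ground, aerial, mask=mask)
```

In this toy setup the second pass attends only to the five aerial tokens that received the most attention in the first pass. The paper's cascaded masking may derive its mask differently and use a different number of stages; swapping the roles of `ground` and `aerial` gives the reverse direction.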
Related papers
- Unsupervised Multi-view UAV Image Geo-localization via Iterative Rendering [31.716967688739036]
Unmanned Aerial Vehicle (UAV) Cross-View Geo-Localization (CVGL) presents significant challenges.
Existing methods rely on the supervision of labeled datasets to extract viewpoint-invariant features for cross-view retrieval.
We propose an unsupervised solution that lifts the scene representation to 3D space from UAV observations for satellite image generation.
arXiv Detail & Related papers (2024-11-22T09:22:39Z)
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization [28.941724648519102]
This paper investigates the effective utilization of unlabeled data for large-area cross-view geo-localization (CVGL).
Common approaches to CVGL rely on ground-satellite image pairs and employ label-driven supervised training.
We propose an unsupervised framework including a cross-view projection to guide the model for retrieving initial pseudo-labels.
arXiv Detail & Related papers (2024-03-21T07:48:35Z)
- Adaptive Hierarchical SpatioTemporal Network for Traffic Forecasting [70.66710698485745]
We propose an Adaptive Hierarchical SpatioTemporal Network (AHSTN) to promote traffic forecasting.
AHSTN exploits the spatial hierarchy and models multi-scale spatial correlations.
Experiments on two real-world datasets show that AHSTN achieves better performance over several strong baselines.
arXiv Detail & Related papers (2023-06-15T14:50:27Z)
- DCN-T: Dual Context Network with Transformer for Hyperspectral Image
Classification [109.09061514799413]
Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions.
We propose a tri-spectral image generation pipeline that transforms HSI into high-quality tri-spectral images.
Our proposed method outperforms state-of-the-art methods for HSI classification.
arXiv Detail & Related papers (2023-04-19T18:32:52Z)
- Cross-View Visual Geo-Localization for Outdoor Augmented Reality [11.214903134756888]
We address the problem of geo-pose estimation by cross-view matching of query ground images to a geo-referenced aerial satellite image database.
We propose a new transformer neural network-based model and a modified triplet ranking loss for joint location and orientation estimation.
Experiments on several benchmark cross-view geo-localization datasets show that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-03-28T01:58:03Z)
- Cross-view Geo-localization via Learning Disentangled Geometric Layout
Correspondence [11.823147814005411]
Cross-view geo-localization aims to estimate the location of a query ground image by matching it against a reference database of geo-tagged aerial images.
Recent works achieve outstanding progress on cross-view geo-localization benchmarks.
However, existing methods still suffer from poor performance on the cross-area benchmarks.
arXiv Detail & Related papers (2022-12-08T04:54:01Z)
- Activation Regression for Continuous Domain Generalization with
Applications to Crop Classification [48.795866501365694]
Geographic variance in satellite imagery impacts the ability of machine learning models to generalise to new regions.
We model geographic generalisation in medium resolution Landsat-8 satellite imagery as a continuous domain adaptation problem.
We develop a dataset spatially distributed across the entire continental United States.
arXiv Detail & Related papers (2022-04-14T15:41:39Z)
- Think Global, Act Local: Dual-scale Graph Transformer for
Vision-and-Language Navigation [87.03299519917019]
We propose a dual-scale graph transformer (DUET) for joint long-term action planning and fine-grained cross-modal understanding.
We build a topological map on-the-fly to enable efficient exploration in global action space.
The proposed approach, DUET, significantly outperforms state-of-the-art methods on goal-oriented vision-and-language navigation benchmarks.
arXiv Detail & Related papers (2022-02-23T19:06:53Z)
- Cross-view Geo-localization with Evolving Transformer [7.5800316275498645]
Cross-view geo-localization is challenging due to drastic appearance and geometry differences across views.
We devise a novel geo-localization Transformer (EgoTR) that utilizes the properties of self-attention in Transformer to model global dependencies.
Our EgoTR performs favorably against state-of-the-art methods on standard, fine-grained and cross-dataset cross-view geo-localization tasks.
arXiv Detail & Related papers (2021-07-02T05:33:14Z)
- Multi-Level Graph Convolutional Network with Automatic Graph Learning
for Hyperspectral Image Classification [63.56018768401328]
We propose a Multi-level Graph Convolutional Network (GCN) with Automatic Graph Learning method (MGCN-AGL) for HSI classification.
By employing an attention mechanism to characterize the importance among spatially neighboring regions, the most relevant information can be adaptively incorporated to make decisions.
Our MGCN-AGL encodes the long range dependencies among image regions based on the expressive representations that have been produced at local level.
arXiv Detail & Related papers (2020-09-19T09:26:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.