TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization
- URL: http://arxiv.org/abs/2204.00097v1
- Date: Thu, 31 Mar 2022 21:19:41 GMT
- Title: TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization
- Authors: Sijie Zhu, Mubarak Shah, Chen Chen
- Abstract summary: CNN-based methods for cross-view image geo-localization fail to model global correlation.
We propose a pure transformer-based approach (TransGeo) to address these limitations.
TransGeo achieves state-of-the-art results on both urban and rural datasets.
- Score: 81.70547404891099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dominant CNN-based methods for cross-view image geo-localization rely on
polar transform and fail to model global correlation. We propose a pure
transformer-based approach (TransGeo) to address these limitations from a
different perspective. TransGeo takes full advantage of the strengths of
transformer related to global information modeling and explicit position
information encoding. We further leverage the flexibility of transformer input
and propose an attention-guided non-uniform cropping method, so that
uninformative image patches are removed to reduce computation cost, with a
negligible drop in performance. The saved computation can be reallocated to increase
resolution only for informative patches, resulting in performance improvement
with no additional computation cost. This "attend and zoom-in" strategy is
highly similar to human behavior when observing images. Remarkably, TransGeo
achieves state-of-the-art results on both urban and rural datasets, with
significantly less computation cost than CNN-based methods. It does not rely on
polar transform and infers faster than CNN-based methods. Code is available at
https://github.com/Jeff-Zilence/TransGeo2022.
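The cropping step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that); it assumes per-patch attention scores are already available as a 1-D array, and the function names `select_informative_patches` and `crop_tokens` as well as the `keep_ratio` parameter are hypothetical. It covers only the "attend" half, dropping low-attention patches while keeping each surviving token's position embedding; the "zoom-in" half (re-encoding the kept regions at higher resolution) is left out.

```python
import numpy as np


def select_informative_patches(attn_scores, keep_ratio):
    """Return sorted indices of the patches with the highest attention.

    attn_scores: one score per patch (assumed to come from a trained model).
    keep_ratio:  fraction of patches to keep (hypothetical parameter).
    """
    scores = np.asarray(attn_scores, dtype=float).ravel()
    n_keep = max(1, int(round(keep_ratio * scores.size)))
    # Stable descending sort, then keep the first n_keep patch indices.
    top = np.argsort(-scores, kind="stable")[:n_keep]
    # Restore the original patch order; not required, but easier to inspect.
    return np.sort(top)


def crop_tokens(patch_tokens, pos_embed, keep_idx):
    """Drop uninformative patches from the token sequence.

    Each kept token is summed with its own position embedding, so the
    shortened sequence still records where every patch came from.
    """
    return patch_tokens[keep_idx] + pos_embed[keep_idx]


# Toy example: 16 patches with 8-dimensional embeddings and random scores.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))
pos = rng.normal(size=(16, 8))
attn = rng.random(16)

keep = select_informative_patches(attn, keep_ratio=0.5)
cropped = crop_tokens(tokens, pos, keep)
print(cropped.shape)  # (8, 8)
```

Because a transformer accepts a variable-length token sequence, the shortened input needs no architectural change, which is the input flexibility the abstract refers to.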
Related papers
- GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers [53.80009458891537]
Cross-view video geo-localization aims to derive GPS trajectories from street-view videos by aligning them with aerial-view images.
Current CVGL methods use camera and odometry data, typically absent in real-world scenarios.
We propose GAReT, a fully transformer-based method for CVGL that does not require camera and odometry data.
arXiv Detail & Related papers (2024-08-05T21:29:33Z)
- ConvFormer: Combining CNN and Transformer for Medical Image Segmentation [17.88894109620463]
We propose a hierarchical CNN and Transformer hybrid architecture, called ConvFormer, for medical image segmentation.
Our ConvFormer, trained from scratch, outperforms various CNN- or Transformer-based architectures, achieving state-of-the-art performance.
arXiv Detail & Related papers (2022-11-15T23:11:22Z)
- Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
arXiv Detail & Related papers (2022-10-04T07:35:01Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- Where in the World is this Image? Transformer-based Geo-localization in the Wild [48.69031054573838]
Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem.
We propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image.
We evaluate TransLocator on four benchmark datasets - Im2GPS, Im2GPS3k, YFCC4k, YFCC26k and obtain 5.5%, 14.1%, 4.9%, 9.9% continent-level accuracy improvement.
arXiv Detail & Related papers (2022-04-29T03:27:23Z)
- Transformer-Guided Convolutional Neural Network for Cross-View Geolocalization [20.435023745201878]
We propose a novel Transformer-guided convolutional neural network (TransGCNN) architecture.
Our TransGCNN consists of a CNN backbone extracting feature map from an input image and a Transformer head modeling global context.
Experiments on popular benchmark datasets demonstrate that our model achieves top-1 accuracy of 94.12% and 84.92% on CVUSA and CVACT_val, respectively.
arXiv Detail & Related papers (2022-04-21T08:46:41Z)
- Cross-view Geo-localization with Evolving Transformer [7.5800316275498645]
Cross-view geo-localization is challenging due to drastic appearance and geometry differences across views.
We devise a novel geo-localization Transformer (EgoTR) that utilizes the properties of self-attention in Transformer to model global dependencies.
Our EgoTR performs favorably against state-of-the-art methods on standard, fine-grained and cross-dataset cross-view geo-localization tasks.
arXiv Detail & Related papers (2021-07-02T05:33:14Z)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)