Translating Images into Maps
- URL: http://arxiv.org/abs/2110.00966v1
- Date: Sun, 3 Oct 2021 09:52:46 GMT
- Title: Translating Images into Maps
- Authors: Avishkar Saha, Oscar Mendez Maldonado, Chris Russell, Richard Bowden
- Abstract summary: We show how a novel form of transformer network can be used to map from images and video directly to an overhead map or bird's-eye-view (BEV) of the world.
We assume a 1-1 correspondence between a vertical scanline in the image and rays passing through the camera location in an overhead map.
Posing the problem as translation allows the network to use the context of the image when interpreting the role of each pixel.
- Score: 43.81207458783278
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We approach instantaneous mapping, converting images to a top-down view of
the world, as a translation problem. We show how a novel form of transformer
network can be used to map from images and video directly to an overhead map or
bird's-eye-view (BEV) of the world, in a single end-to-end network. We assume a
1-1 correspondence between a vertical scanline in the image and rays passing
through the camera location in an overhead map. This lets us formulate map
generation from an image as a set of sequence-to-sequence translations. Posing
the problem as translation allows the network to use the context of the image
when interpreting the role of each pixel. This constrained formulation, based
upon a strong physical grounding of the problem, leads to a restricted
transformer network that is convolutional in the horizontal direction only. The
structure allows us to make efficient use of data when training, and obtains
state-of-the-art results for instantaneous mapping of three large-scale
datasets, including a 15% and 30% relative gain against existing best
performing methods on the nuScenes and Argoverse datasets, respectively. We
make our code available at
https://github.com/avishkarsaha/translating-images-into-maps.
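To make the core idea concrete, the following is a minimal PyTorch sketch of the column-to-ray translation described above: each vertical image column is treated as a source sequence and decoded into a polar BEV ray by a small transformer decoder, so the model stays convolutional in the horizontal direction only. This is not the authors' released implementation (see the repository linked above); the class name, feature dimension, depth-bin count, and learned positional encodings are illustrative assumptions.
```python
import torch
import torch.nn as nn

class ColumnToRayTranslator(nn.Module):
    """Translate each vertical image column into a polar BEV ray (illustrative)."""

    def __init__(self, feat_dim=256, img_height=32, ray_depth=64, n_heads=8, n_layers=2):
        super().__init__()
        # One learned query per depth bin along the ray (hypothetical choice).
        self.ray_queries = nn.Parameter(torch.randn(ray_depth, feat_dim))
        # Learned positional encoding over the vertical (row) dimension.
        self.col_pos = nn.Parameter(torch.randn(img_height, feat_dim))
        layer = nn.TransformerDecoderLayer(feat_dim, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)

    def forward(self, img_feats):
        # img_feats: (B, C, H, W) features from any 2D image backbone.
        B, C, H, W = img_feats.shape
        # Fold the width axis into the batch so every column is translated
        # independently of its horizontal neighbours.
        cols = img_feats.permute(0, 3, 2, 1).reshape(B * W, H, C) + self.col_pos
        queries = self.ray_queries.unsqueeze(0).expand(B * W, -1, -1)
        rays = self.decoder(tgt=queries, memory=cols)          # (B*W, ray_depth, C)
        # Reassemble the per-column rays into a polar BEV feature map.
        return rays.reshape(B, W, -1, C).permute(0, 3, 2, 1)   # (B, C, ray_depth, W)
```
A polar-to-Cartesian resampling step (not shown) would then place each decoded ray onto a metric BEV grid before any downstream mapping head.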
Related papers
- Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations [8.522160106746478]
We present a global visual localization system capable of localizing a single camera image across various 3D map representations.
Our system generates a database by synthesizing novel views of the scene, creating RGB and depth image pairs.
NeRF-synthesized images show superior performance, localizing query images at an average success rate of 72%.
arXiv Detail & Related papers (2024-08-21T19:37:17Z)
- Extremal Domain Translation with Neural Optimal Transport [76.38747967445994]
We propose extremal transport (ET), a formalization of the theoretically best possible unpaired translation between a pair of domains.
Inspired by the recent advances in neural optimal transport (OT), we propose a scalable algorithm to approximate ET maps as a limit of partial OT maps.
We test our algorithm on toy examples and on the unpaired image-to-image translation task.
arXiv Detail & Related papers (2023-01-30T13:28:23Z)
- BEV-Locator: An End-to-end Visual Semantic Localization Network Using Multi-View Images [13.258689143949912]
We propose an end-to-end visual semantic localization neural network using multi-view camera images.
BEV-Locator is capable of estimating vehicle poses under versatile scenarios.
Experiments report satisfactory accuracy, with mean absolute errors of 0.052 m, 0.135 m, and 0.251° in lateral translation, longitudinal translation, and heading angle, respectively.
arXiv Detail & Related papers (2022-11-27T20:24:56Z)
- TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization [81.70547404891099]
CNN-based methods for cross-view image geo-localization fail to model global correlation.
We propose a pure transformer-based approach (TransGeo) to address these limitations.
TransGeo achieves state-of-the-art results on both urban and rural datasets.
arXiv Detail & Related papers (2022-03-31T21:19:41Z)
- COTR: Correspondence Transformer for Matching Across Images [31.995943755283786]
We propose a novel framework for finding correspondences in images based on a deep neural network.
With this formulation, one has the option to query only points of interest and retrieve sparse correspondences, or to query all points in an image and obtain dense mappings.
arXiv Detail & Related papers (2021-03-25T22:47:02Z)
- Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation [59.73535607392732]
Image-to-image translation aims to learn a mapping that transforms an image from one visual domain to another.
We propose the use of an image retrieval system to assist the image-to-image translation task.
arXiv Detail & Related papers (2020-08-11T20:11:53Z)
- Contrastive Learning for Unpaired Image-to-Image Translation [64.47477071705866]
In image-to-image translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain.
We propose a framework based on contrastive learning to maximize mutual information between corresponding input and output patches.
We demonstrate that our framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time.
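As a rough illustration of this patch-wise objective, here is a minimal PyTorch sketch of an InfoNCE-style loss over corresponding patch features; the function name, feature shapes, and temperature are assumptions for illustration rather than details taken from that paper.
```python
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_out, feat_in, temperature=0.07):
    """InfoNCE over N corresponding patches: (N, D) output vs. (N, D) input features."""
    feat_out = F.normalize(feat_out, dim=1)
    feat_in = F.normalize(feat_in, dim=1)
    # Similarity of every output patch to every input patch; the diagonal holds
    # the positive pairs (same spatial location), all other entries are negatives.
    logits = feat_out @ feat_in.t() / temperature
    targets = torch.arange(feat_out.size(0), device=feat_out.device)
    return F.cross_entropy(logits, targets)
```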
arXiv Detail & Related papers (2020-07-30T17:59:58Z)
- Structural-analogy from a Single Image Pair [118.61885732829117]
In this paper, we explore the capabilities of neural networks to understand image structure given only a single pair of images, A and B.
We generate an image that keeps the appearance and style of B, but has a structural arrangement that corresponds to A.
Our method can be used to generate high quality imagery in other conditional generation tasks utilizing images A and B only.
arXiv Detail & Related papers (2020-04-05T14:51:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.