CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data
- URL: http://arxiv.org/abs/2112.09081v1
- Date: Thu, 16 Dec 2021 18:05:48 GMT
- Title: CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data
- Authors: Qi Yan, Jianhao Zheng, Simon Reding, Shanci Li, Iordan Doytchinov
- Abstract summary: We present a visual localization system that learns to estimate camera poses in the real world with the help of synthetic data.
To mitigate the data scarcity issue, we introduce TOPO-DataGen, a versatile synthetic data generation tool.
We also introduce CrossLoc, a cross-modal visual representation learning approach to pose estimation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a visual localization system that learns to estimate camera poses
in the real world with the help of synthetic data. Despite significant progress
in recent years, most learning-based approaches to visual localization target
a single domain and require a dense database of geo-tagged images to function
well. To mitigate the data scarcity issue and improve the scalability
of the neural localization models, we introduce TOPO-DataGen, a versatile
synthetic data generation tool that traverses smoothly between the real and
virtual world, hinged on the geographic camera viewpoint. New large-scale
sim-to-real benchmark datasets are proposed to showcase and evaluate the
utility of this synthetic data. Our experiments reveal that synthetic data
generically enhances the neural network performance on real data. Furthermore,
we introduce CrossLoc, a cross-modal visual representation learning approach to
pose estimation that makes full use of the scene coordinate ground truth via
self-supervision. Without any extra data, CrossLoc significantly outperforms
the state-of-the-art methods and achieves substantially higher real-data sample
efficiency. Our code is available at https://github.com/TOPO-EPFL/CrossLoc.
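For intuition, here is a minimal, hypothetical sketch of scene-coordinate regression with an auxiliary cross-modal head, in the spirit of the abstract above. All module names, layer sizes, and loss weights are illustrative assumptions, not the paper's actual architecture or objectives; those are defined in the repository linked above.

```python
# Hypothetical sketch: a shared encoder regresses per-pixel 3D scene
# coordinates, with an auxiliary cross-modal (depth) head shaping the
# shared representation. Shapes and weights are illustrative only.
import torch
import torch.nn as nn

class CoordRegressor(nn.Module):
    """Predict a per-pixel 3D scene-coordinate map (plus auxiliary depth) from RGB."""
    def __init__(self):
        super().__init__()
        # Shared encoder: the representation the cross-modal task is meant to shape.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.coord_head = nn.Conv2d(128, 3, 1)  # (x, y, z) per pixel
        self.depth_head = nn.Conv2d(128, 1, 1)  # assumed auxiliary modality

    def forward(self, rgb):
        feat = self.encoder(rgb)
        return self.coord_head(feat), self.depth_head(feat)

model = CoordRegressor()
rgb = torch.randn(2, 3, 128, 128)       # stand-in for a synthetic RGB batch
gt_coords = torch.randn(2, 3, 32, 32)   # stand-in for rendered scene-coordinate GT
gt_depth = torch.randn(2, 1, 32, 32)    # stand-in for rendered depth GT

pred_coords, pred_depth = model(rgb)
# Supervised coordinate regression plus an auxiliary cross-modal term;
# the 0.1 weight is an arbitrary illustrative choice.
loss = nn.functional.l1_loss(pred_coords, gt_coords) \
     + 0.1 * nn.functional.l1_loss(pred_depth, gt_depth)
loss.backward()
```

In a full localization pipeline, the regressed scene coordinates would typically be passed to a PnP solver with RANSAC to recover the camera pose; the random tensors above merely stand in for real images and rendered ground truth.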
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding (2024-06-17)
  We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns. A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems (2024-03-23)
  Eye image segmentation is a critical step in eye tracking that strongly influences the final gaze estimate. We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data (a minimal sketch of one such overlap measurement follows after this list). Our methods yield robust, improved performance when tackling the discrepancy between simulated and real-world data samples.
- View-Dependent Octree-based Mesh Extraction in Unbounded Scenes for Procedural Synthetic Data (2023-12-13)
  Procedural signed distance functions (SDFs) are a powerful tool for modeling large-scale detailed scenes. We propose OcMesher, a mesh extraction algorithm that efficiently handles high-detail unbounded scenes with perfect view-consistency.
- Synfeal: A Data-Driven Simulator for End-to-End Camera Localization (2023-05-29)
  We propose a framework that synthesizes large localization datasets based on realistic 3D reconstructions of the real world. Our framework, Synfeal, is an open-source, data-driven simulator that synthesizes RGB images by moving a virtual camera through a realistic 3D textured mesh. The results validate that training camera localization algorithms on datasets generated by Synfeal yields better results than training on datasets generated by state-of-the-art methods.
- A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation (2023-03-16)
  Deep learning in computer vision has achieved great success at the price of large-scale labeled training data. The uncontrollable data collection process produces non-IID training and test data, in which undesired duplication may exist. To circumvent these issues, an alternative is to generate synthetic data via 3D rendering with domain randomization.
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments (2022-08-16)
  This work proposes a synthetic data generation pipeline that addresses the difficulties and domain gaps present in simulated datasets. We show that, using annotations and visual cues from existing datasets, we can facilitate automated multimodal data generation.
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition (2022-07-17)
  Action labels are available only on a source dataset and unavailable on a target dataset during training. We use a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World (2022-03-11)
  We propose VRVO, a novel framework for retrieving the absolute scale from virtual data. We first train a scale-aware disparity network using both monocular real images and stereo virtual data. The resulting scale-consistent disparities are then integrated with a direct VO system.
- Semi-synthesis: A fast way to produce effective datasets for stereo matching (2021-01-26)
  Close-to-real-scene texture rendering is a key factor in boosting stereo matching performance. We propose semi-synthesis, an effective and fast way to synthesize large amounts of data with close-to-real-scene texture. With further fine-tuning on real data, we also achieve SOTA performance on Middlebury and competitive results on the KITTI and ETH3D datasets.
- DASGIL: Domain Adaptation for Semantic and Geometric-aware Image-based Localization (2020-10-01)
  Long-term visual localization under changing environments is a challenging problem in autonomous driving and mobile robotics. We propose a novel multi-task architecture that fuses geometric and semantic information into a multi-scale latent embedding representation for visual place recognition.