CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data
- URL: http://arxiv.org/abs/2112.09081v1
- Date: Thu, 16 Dec 2021 18:05:48 GMT
- Title: CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data
- Authors: Qi Yan, Jianhao Zheng, Simon Reding, Shanci Li, Iordan Doytchinov
- Abstract summary: We present a visual localization system that learns to estimate camera poses in the real world with the help of synthetic data.
To mitigate the data scarcity issue, we introduce TOPO-DataGen, a versatile synthetic data generation tool.
We also introduce CrossLoc, a cross-modal visual representation learning approach to pose estimation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a visual localization system that learns to estimate camera poses
in the real world with the help of synthetic data. Despite significant progress
in recent years, most learning-based approaches to visual localization target
a single domain and require a dense database of geo-tagged images to function
well. To mitigate the data scarcity issue and improve the scalability
of the neural localization models, we introduce TOPO-DataGen, a versatile
synthetic data generation tool that traverses smoothly between the real and
virtual world, hinged on the geographic camera viewpoint. New large-scale
sim-to-real benchmark datasets are proposed to showcase and evaluate the
utility of this synthetic data. Our experiments reveal that synthetic data
generically enhances the neural network performance on real data. Furthermore,
we introduce CrossLoc, a cross-modal visual representation learning approach to
pose estimation that makes full use of the scene coordinate ground truth via
self-supervision. Without any extra data, CrossLoc significantly outperforms
the state-of-the-art methods and achieves substantially higher real-data sample
efficiency. Our code is available at https://github.com/TOPO-EPFL/CrossLoc.
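For intuition, here is a minimal, hypothetical sketch of scene-coordinate regression with an auxiliary cross-modal head, in the spirit of the abstract above. All module names, layer sizes, and loss weights are illustrative assumptions, not the paper's actual architecture or objectives; those are defined in the repository linked above.

```python
# Hypothetical sketch: a shared encoder regresses per-pixel 3D scene
# coordinates, with an auxiliary cross-modal (depth) head shaping the
# shared representation. Shapes and weights are illustrative only.
import torch
import torch.nn as nn

class CoordRegressor(nn.Module):
    """Predict a per-pixel 3D scene-coordinate map (plus auxiliary depth) from RGB."""
    def __init__(self):
        super().__init__()
        # Shared encoder: the representation the cross-modal task is meant to shape.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.coord_head = nn.Conv2d(128, 3, 1)  # (x, y, z) per pixel
        self.depth_head = nn.Conv2d(128, 1, 1)  # assumed auxiliary modality

    def forward(self, rgb):
        feat = self.encoder(rgb)
        return self.coord_head(feat), self.depth_head(feat)

model = CoordRegressor()
rgb = torch.randn(2, 3, 128, 128)       # stand-in for a synthetic RGB batch
gt_coords = torch.randn(2, 3, 32, 32)   # stand-in for rendered scene-coordinate GT
gt_depth = torch.randn(2, 1, 32, 32)    # stand-in for rendered depth GT

pred_coords, pred_depth = model(rgb)
# Supervised coordinate regression plus an auxiliary cross-modal term;
# the 0.1 weight is an arbitrary illustrative choice.
loss = nn.functional.l1_loss(pred_coords, gt_coords) \
     + 0.1 * nn.functional.l1_loss(pred_depth, gt_depth)
loss.backward()
```

In a full localization pipeline, the regressed scene coordinates would typically be passed to a PnP solver with RANSAC to recover the camera pose; the random tensors above merely stand in for real images and rendered ground truth.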
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding (2024-06-17)
  We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns. A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems (2024-03-23)
  Eye image segmentation is a critical step in eye tracking that strongly influences the final gaze estimate. We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data (a minimal sketch of one such overlap measurement follows after this list). Our methods yield robust, improved performance when tackling the discrepancy between simulated and real-world data samples.
- View-Dependent Octree-based Mesh Extraction in Unbounded Scenes for Procedural Synthetic Data (2023-12-13)
  Procedural signed distance functions (SDFs) are a powerful tool for modeling large-scale detailed scenes. We propose OcMesher, a mesh extraction algorithm that efficiently handles high-detail unbounded scenes with perfect view-consistency.
- Synfeal: A Data-Driven Simulator for End-to-End Camera Localization (2023-05-29)
  We propose a framework that synthesizes large localization datasets based on realistic 3D reconstructions of the real world. Our framework, Synfeal, is an open-source, data-driven simulator that synthesizes RGB images by moving a virtual camera through a realistic 3D textured mesh. The results validate that training camera localization algorithms on datasets generated by Synfeal yields better results than training on datasets generated by state-of-the-art methods.
- A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation (2023-03-16)
  Deep learning in computer vision has achieved great success at the price of large-scale labeled training data. The uncontrollable data collection process produces non-IID training and test data, in which undesired duplication may exist. To circumvent these issues, an alternative is to generate synthetic data via 3D rendering with domain randomization.
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments (2022-08-16)
  This work proposes a synthetic data generation pipeline that addresses the difficulties and domain gaps present in simulated datasets. We show that, using annotations and visual cues from existing datasets, we can facilitate automated multimodal data generation.
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition (2022-07-17)
  Action labels are available only on a source dataset and unavailable on a target dataset during training. We use a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World (2022-03-11)
  We propose VRVO, a novel framework for retrieving the absolute scale from virtual data. We first train a scale-aware disparity network using both monocular real images and stereo virtual data. The resulting scale-consistent disparities are then integrated with a direct VO system.
- Semi-synthesis: A fast way to produce effective datasets for stereo matching (2021-01-26)
  Close-to-real-scene texture rendering is a key factor in boosting stereo matching performance. We propose semi-synthesis, an effective and fast way to synthesize large amounts of data with close-to-real-scene texture. With further fine-tuning on real data, we also achieve SOTA performance on Middlebury and competitive results on the KITTI and ETH3D datasets.
- DASGIL: Domain Adaptation for Semantic and Geometric-aware Image-based Localization (2020-10-01)
  Long-term visual localization under changing environments is a challenging problem in autonomous driving and mobile robotics. We propose a novel multi-task architecture that fuses geometric and semantic information into a multi-scale latent embedding representation for visual place recognition.