Synfeal: A Data-Driven Simulator for End-to-End Camera Localization
- URL: http://arxiv.org/abs/2305.18260v1
- Date: Mon, 29 May 2023 17:29:02 GMT
- Title: Synfeal: A Data-Driven Simulator for End-to-End Camera Localization
- Authors: Daniel Coelho, Miguel Oliveira, and Paulo Dias
- Abstract summary: We propose a framework that synthesizes large localization datasets based on realistic 3D reconstructions of the real world.
Our framework, Synfeal, is an open-source, data-driven simulator that synthesizes RGB images by moving a virtual camera through a realistic 3D textured mesh.
The results validate that the training of camera localization algorithms on datasets generated by Synfeal leads to better results when compared to datasets generated by state-of-the-art methods.
- Score: 0.9749560288448114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collecting real-world data is often considered the bottleneck of Artificial
Intelligence, stalling the research progress in several fields, one of which is
camera localization. End-to-end camera localization methods are still
outperformed by traditional methods, and we argue that the inconsistencies
associated with the data collection techniques are restraining the potential of
end-to-end methods. Inspired by the recent data-centric paradigm, we propose a
framework that synthesizes large localization datasets based on realistic 3D
reconstructions of the real world. Our framework, termed Synfeal: Synthetic
from Real, is an open-source, data-driven simulator that synthesizes RGB images
by moving a virtual camera through a realistic 3D textured mesh, while
collecting the corresponding ground-truth camera poses. The results validate
that the training of camera localization algorithms on datasets generated by
Synfeal leads to better results when compared to datasets generated by
state-of-the-art methods. Using Synfeal, we conducted the first analysis of the
relationship between the size of the dataset and the performance of camera
localization algorithms. Results show that the performance significantly
increases with the dataset size. Our results also suggest that when a large
localization dataset with high quality is available, training from scratch
leads to better performance. Synfeal is publicly available at
https://github.com/DanielCoelho112/synfeal.
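The data-generation loop the abstract describes (moving a virtual camera through a reconstructed scene while recording each ground-truth pose) can be sketched as follows. This is an illustrative sketch, not Synfeal's actual implementation: `render_view` is a hypothetical stand-in for a textured-mesh renderer, and the circular trajectory is just one simple choice of camera path.

```python
import numpy as np

def look_at(eye, target, up=np.array([0.0, 0.0, 1.0])):
    """Build a world-to-camera rotation (camera looks down its -z axis)."""
    fwd = target - eye
    fwd = fwd / np.linalg.norm(fwd)
    right = np.cross(fwd, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, fwd)
    # Rows are the camera axes expressed in world coordinates.
    return np.stack([right, true_up, -fwd])

def render_view(pose, resolution=(64, 64)):
    """Hypothetical stand-in for rendering the textured mesh from `pose`;
    here it just returns a dummy RGB frame of the right shape."""
    return np.zeros((*resolution, 3), dtype=np.uint8)

def synthesize_dataset(n_frames=100, radius=2.0, height=1.5):
    """Move a virtual camera along a circular path around the scene origin,
    recording an RGB frame and the exact 4x4 camera pose at each step."""
    dataset = []
    for t in np.linspace(0.0, 2.0 * np.pi, n_frames, endpoint=False):
        eye = np.array([radius * np.cos(t), radius * np.sin(t), height])
        R = look_at(eye, target=np.zeros(3))
        pose = np.eye(4)
        pose[:3, :3] = R
        pose[:3, 3] = -R @ eye  # world-to-camera translation
        dataset.append((render_view(pose), pose))
    return dataset

data = synthesize_dataset(n_frames=8)
```

Because the poses are sampled rather than measured, the ground truth is exact by construction, which is the consistency advantage the paper argues real-world collection rigs lack.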
Related papers
- Drive-1-to-3: Enriching Diffusion Priors for Novel View Synthesis of Real Vehicles [81.29018359825872]
This paper consolidates a set of good practices to finetune large pretrained models for a real-world task.
Specifically, we develop several strategies to account for discrepancies between the synthetic data and real driving data.
Our insights lead to effective finetuning that results in a 68.8% reduction in FID for novel view synthesis over prior art.
arXiv Detail & Related papers (2024-12-19T03:39:13Z)
- Unleashing the Power of Data Synthesis in Visual Localization [17.159091187694884]
Methods that regress camera poses from query images have gained attention for fast inference.
We aim to unleash the power of data synthesis to promote the generalizability of pose regression.
We build a two-branch joint training pipeline, with an adversarial discriminator to bridge the syn-to-real gap.
arXiv Detail & Related papers (2024-11-28T16:58:10Z)
- GS-Blur: A 3D Scene-Based Dataset for Realistic Image Deblurring [50.72230109855628]
We propose GS-Blur, a dataset of synthesized realistic blurry images created using a novel approach.
We first reconstruct 3D scenes from multi-view images using 3D Gaussian Splatting (3DGS), then render blurry images by moving the camera view along randomly generated motion trajectories.
By adopting various camera trajectories when constructing GS-Blur, our dataset contains realistic and diverse types of blur, offering a large-scale dataset that generalizes well to real-world blur.
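Trajectory-based blur synthesis of this kind amounts to averaging the frames seen along a camera path, emulating a long exposure. A minimal sketch, with a hypothetical `render_frame` standing in for the 3DGS renderer (here a simple pattern shifted by the camera's x offset, so motion visibly smears it):

```python
import numpy as np

def render_frame(cam_pos, resolution=(32, 32)):
    """Hypothetical stand-in for a 3DGS renderer: a sinusoidal pattern
    shifted by the camera's x offset, so camera motion produces blur."""
    h, w = resolution
    x = np.arange(w, dtype=np.float64) + 10.0 * cam_pos[0]
    return np.tile(np.sin(0.3 * x), (h, 1))

def random_trajectory(n_steps, step_scale=0.05, rng=None):
    """Random-walk camera positions, one simple way to generate the
    random motion trajectories the summary mentions."""
    rng = np.random.default_rng(rng)
    steps = step_scale * rng.standard_normal((n_steps, 3))
    return np.cumsum(steps, axis=0)

def synthesize_blur(n_steps=16, rng=0):
    """Average the frames rendered along the trajectory (long exposure)."""
    traj = random_trajectory(n_steps, rng=rng)
    frames = np.stack([render_frame(p) for p in traj])
    return frames.mean(axis=0)

blurry = synthesize_blur()
```

Varying the trajectory shape and length then yields the diverse blur types the dataset aims for.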
arXiv Detail & Related papers (2024-10-31T06:17:16Z)
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- Learning from Synthetic Data for Visual Grounding [55.21937116752679]
We show that SynGround can improve the localization capabilities of off-the-shelf vision-and-language models.
Data generated with SynGround improves the pointing game accuracy of pretrained ALBEF and BLIP models by 4.81% and 17.11% absolute percentage points, respectively.
arXiv Detail & Related papers (2024-03-20T17:59:43Z)
- DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experimental results achieve state-of-the-art performance on both synthetic data and real-world data tracking.
arXiv Detail & Related papers (2023-11-30T21:34:44Z)
- A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation [42.2398858786125]
Deep learning in computer vision has achieved great success with the price of large-scale labeled training data.
The uncontrollable data collection process produces non-IID training and test data, where undesired duplication may exist.
To circumvent them, an alternative is to generate synthetic data via 3D rendering with domain randomization.
arXiv Detail & Related papers (2023-03-16T09:03:52Z)
- FSID: Fully Synthetic Image Denoising via Procedural Scene Generation [12.277286575812441]
We develop a procedural synthetic data generation pipeline and dataset tailored to low-level vision tasks.
Our Unreal engine-based synthetic data pipeline populates large scenes algorithmically with a combination of random 3D objects, materials, and geometric transformations.
We then trained and validated a CNN-based denoising model, and demonstrated that the model trained on this synthetic data alone can achieve competitive denoising results.
arXiv Detail & Related papers (2022-12-07T21:21:55Z)
- CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data [2.554905387213586]
We present a visual localization system that learns to estimate camera poses in the real world with the help of synthetic data.
To mitigate the data scarcity issue, we introduce TOPO-DataGen, a versatile synthetic data generation tool.
We also introduce CrossLoc, a cross-modal visual representation learning approach to pose estimation.
arXiv Detail & Related papers (2021-12-16T18:05:48Z)
- Semi-synthesis: A fast way to produce effective datasets for stereo matching [16.602343511350252]
Close-to-real-scene texture rendering is a key factor to boost up stereo matching performance.
We propose semi-synthesis, a fast and effective way to synthesize a large amount of data with close-to-real-scene texture.
With further fine-tuning on the real dataset, we also achieve SOTA performance on Middlebury and competitive results on KITTI and ETH3D datasets.
arXiv Detail & Related papers (2021-01-26T14:34:49Z)
- CycleISP: Real Image Restoration via Improved Data Synthesis [166.17296369600774]
We present a framework that models the camera imaging pipeline in both forward and reverse directions.
By training a new image denoising network on realistic synthetic data, we achieve the state-of-the-art performance on real camera benchmark datasets.
arXiv Detail & Related papers (2020-03-17T15:20:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.