NeRFmentation: NeRF-based Augmentation for Monocular Depth Estimation
- URL: http://arxiv.org/abs/2401.03771v1
- Date: Mon, 8 Jan 2024 09:50:54 GMT
- Title: NeRFmentation: NeRF-based Augmentation for Monocular Depth Estimation
- Authors: Casimir Feldmann, Niall Siegenheim, Nikolas Hars, Lovro Rabuzin, Mert
Ertugrul, Luca Wolfart, Marc Pollefeys, Zuria Bauer, Martin R. Oswald
- Abstract summary: We propose a NeRF-based data augmentation pipeline to introduce synthetic data with more diverse viewing directions into training datasets.
We apply our technique in conjunction with three state-of-the-art MDE architectures on the popular autonomous driving dataset KITTI.
- Score: 45.88995941857111
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The capabilities of monocular depth estimation (MDE) models are limited by
the availability of sufficient and diverse datasets. In the case of MDE models
for autonomous driving, this issue is exacerbated by the linearity of the
captured data trajectories. We propose a NeRF-based data augmentation pipeline
to introduce synthetic data with more diverse viewing directions into training
datasets and demonstrate the benefits of our approach to model performance and
robustness. Our data augmentation pipeline, which we call "NeRFmentation",
trains NeRFs on each scene in the dataset, filters out subpar NeRFs based on
relevant metrics, and uses them to generate synthetic RGB-D images captured
from new viewing directions. In this work, we apply our technique in
conjunction with three state-of-the-art MDE architectures on the popular
autonomous driving dataset KITTI, augmenting its training set of the Eigen
split. We evaluate the resulting performance gain on the original test set, a
separate popular driving set, and our own synthetic test set.
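The pipeline described above (per-scene NeRF training, quality-based filtering, novel-view RGB-D rendering) can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: the function names (`train_nerf`, `render_rgbd`, `perturb_poses`), the PSNR-based filter, and the lateral/yaw pose perturbation scheme are all assumptions standing in for the paper's actual components.

```python
# Hypothetical sketch of the NeRFmentation pipeline from the abstract.
# train_nerf / render_rgbd are placeholders for real NeRF training and
# rendering; the filtering metric and pose perturbations are assumptions.

def train_nerf(scene):
    """Stand-in for per-scene NeRF training; returns a model record
    with a reconstruction-quality proxy (here, a precomputed PSNR)."""
    return {"scene": scene["id"], "psnr": scene["fit_psnr"]}

def render_rgbd(model, pose):
    """Stand-in for rendering a synthetic RGB-D frame from a novel pose."""
    return {"scene": model["scene"], "pose": pose, "rgb": None, "depth": None}

def perturb_poses(base_poses, lateral_offset=0.5, yaw_deg=5.0):
    """Generate novel viewpoints by offsetting each pose on the original
    (largely linear) driving trajectory sideways and rotating its yaw."""
    novel = []
    for x, y, yaw in base_poses:
        for sign in (-1, 1):
            novel.append((x + sign * lateral_offset, y, yaw + sign * yaw_deg))
    return novel

def nerfmentation(scenes, psnr_threshold=20.0):
    """Train a NeRF per scene, drop subpar ones, and render augmented
    RGB-D frames from perturbed viewpoints."""
    augmented = []
    for scene in scenes:
        model = train_nerf(scene)
        if model["psnr"] < psnr_threshold:  # filter out subpar NeRFs
            continue
        for pose in perturb_poses(scene["poses"]):
            augmented.append(render_rgbd(model, pose))
    return augmented
```

The synthetic frames returned by `nerfmentation` would then be mixed into the original training set before training the MDE model; the threshold value and perturbation magnitudes here are placeholders.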
Related papers
- Explicit-NeRF-QA: A Quality Assessment Database for Explicit NeRF Model Compression [10.469092315640696]
We construct a new dataset, called Explicit-NeRF-QA, to address the challenge of the NeRF compression study.
We use 22 3D objects with diverse geometries, textures, and material complexities to train four typical explicit NeRF models.
A subjective experiment conducted in a lab environment collected subjective scores from 21 viewers.
arXiv Detail & Related papers (2024-07-11T04:02:05Z)
- Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion [20.352548473293993]
Face Recognition (FR) models are trained on large-scale datasets, which have privacy and ethical concerns.
Lately, the use of synthetic data to complement or replace genuine data for the training of FR models has been proposed.
We introduce a new method, inspired by the physical motion of soft particles subjected to Brownian forces, allowing us to sample identities in a latent space under various constraints.
With this in hand, we generate several face datasets and benchmark them by training FR models, showing that data generated with our method exceeds the performance of previous GAN-based datasets and achieves competitive performance with the state of the art.
arXiv Detail & Related papers (2024-04-04T17:58:02Z)
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance [68.18779562801762]
Multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance.
Our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found.
arXiv Detail & Related papers (2024-04-04T17:58:02Z)
- Private Synthetic Data Meets Ensemble Learning [15.425653946755025]
When machine learning models are trained on synthetic data and then deployed on real data, there is often a performance drop.
We introduce a new ensemble strategy for training downstream models, with the goal of enhancing their performance when used on real data.
arXiv Detail & Related papers (2023-10-15T04:24:42Z)
- LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models [1.1965844936801797]
Generative modeling of 3D LiDAR data is an emerging task with promising applications for autonomous mobile robots.
We present R2DM, a novel generative model for LiDAR data that can generate diverse and high-fidelity 3D scene point clouds.
Our method is built upon denoising diffusion probabilistic models (DDPMs), which have shown impressive results among generative model frameworks.
arXiv Detail & Related papers (2023-09-17T12:26:57Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- Quantifying Overfitting: Introducing the Overfitting Index [0.0]
Overfitting occurs when a model exhibits superior performance on training data but falters on unseen data.
This paper introduces the Overfitting Index (OI), a novel metric devised to quantitatively assess a model's tendency to overfit.
Our results underscore the variable overfitting behaviors across architectures and highlight the mitigative impact of data augmentation.
arXiv Detail & Related papers (2023-08-16T21:32:57Z)
- MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training [58.07391711548269]
We propose the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training.
arXiv Detail & Related papers (2023-03-23T17:59:02Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Lower Bounds for ME-NODE and develop efficient training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.