Related papers: NeRFmentation: NeRF-based Augmentation for Monocular Depth Estimation

NeRFmentation: NeRF-based Augmentation for Monocular Depth Estimation

URL: http://arxiv.org/abs/2401.03771v2
Date: Mon, 16 Sep 2024 00:53:50 GMT
Title: NeRFmentation: NeRF-based Augmentation for Monocular Depth Estimation
Authors: Casimir Feldmann, Niall Siegenheim, Nikolas Hars, Lovro Rabuzin, Mert Ertugrul, Luca Wolfart, Marc Pollefeys, Zuria Bauer, Martin R. Oswald,
Abstract summary: We propose a NeRF-based data augmentation pipeline to introduce synthetic data with more diverse viewing directions into training datasets. We apply our technique in conjunction with three state-of-the-art MDE architectures on the popular autonomous driving dataset, KITTI.
Score: 44.22677259411607
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The capabilities of monocular depth estimation (MDE) models are limited by the availability of sufficient and diverse datasets. In the case of MDE models for autonomous driving, this issue is exacerbated by the linearity of the captured data trajectories. We propose a NeRF-based data augmentation pipeline to introduce synthetic data with more diverse viewing directions into training datasets and demonstrate the benefits of our approach to model performance and robustness. Our data augmentation pipeline, which we call \textit{NeRFmentation}, trains NeRFs on each scene in a dataset, filters out subpar NeRFs based on relevant metrics, and uses them to generate synthetic RGB-D images captured from new viewing directions. In this work, we apply our technique in conjunction with three state-of-the-art MDE architectures on the popular autonomous driving dataset, KITTI, augmenting its training set of the Eigen split. We evaluate the resulting performance gain on the original test set, a separate popular driving dataset, and our own synthetic test set.

Related papers

Visual Autoregressive Modelling for Monocular Depth Estimation [69.01449528371916]
We propose a monocular depth estimation method based on visual autoregressive ( VAR) priors.<n>Our method adapts a large-scale text-to-image VAR model and introduces a scale-wise conditional upsampling mechanism.<n>We report state-of-the-art performance in indoor benchmarks under constrained training conditions, and strong performance when applied to outdoor datasets.
arXiv Detail & Related papers (2025-12-27T17:08:03Z)
Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities.<n>Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark.<n>We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z)
Point Cloud Segmentation of Agricultural Vehicles using 3D Gaussian Splatting [12.323236593352698]
This work aims to introduce a novel pipeline for generating realistic synthetic data.<n>We generate 3D assets of multiple agricultural vehicles instead of using generic models.<n>We evaluate the impact of synthetic data on segmentation models such as PointNet++, Point Transformer V3, and OACNN, by training and validating the models only on synthetic data.
arXiv Detail & Related papers (2025-06-05T13:19:27Z)
Taming Diffusion for Dataset Distillation with High Representativeness [49.3818035378669]
D3HR is a novel diffusion-based framework to generate distilled datasets with high representativeness.<n>Our experiments demonstrate that D3HR can achieve higher accuracy across different model architectures.
arXiv Detail & Related papers (2025-05-23T22:05:59Z)
Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion [20.352548473293993]
Face Recognition (FR) models are trained on large-scale datasets, which have privacy and ethical concerns. Lately, the use of synthetic data to complement or replace genuine data for the training of FR models has been proposed. We introduce a new method, inspired by the physical motion of soft particles subjected to Brownian forces, allowing us to sample identities in a latent space under various constraints. With this in hands, we generate several face datasets and benchmark them by training FR models, showing that data generated with our method exceeds the performance of previously GAN-based datasets and achieves competitive performance with state-of-the-
arXiv Detail & Related papers (2024-04-30T22:32:02Z)
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance [68.18779562801762]
multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance. Our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found.
arXiv Detail & Related papers (2024-04-04T17:58:02Z)
Private Synthetic Data Meets Ensemble Learning [15.425653946755025]
When machine learning models are trained on synthetic data and then deployed on real data, there is often a performance drop. We introduce a new ensemble strategy for training downstream models, with the goal of enhancing their performance when used on real data.
arXiv Detail & Related papers (2023-10-15T04:24:42Z)
LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models [1.1965844936801797]
Generative modeling of 3D LiDAR data is an emerging task with promising applications for autonomous mobile robots. We present R2DM, a novel generative model for LiDAR data that can generate diverse and high-fidelity 3D scene point clouds. Our method is built upon denoising diffusion probabilistic models (DDPMs), which have shown impressive results among generative model frameworks.
arXiv Detail & Related papers (2023-09-17T12:26:57Z)
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning. This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models. Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
Quantifying Overfitting: Introducing the Overfitting Index [0.0]
Overfitting is where a model exhibits superior performance on training data but falters on unseen data. This paper introduces the Overfitting Index (OI), a novel metric devised to quantitatively assess a model's tendency to overfit. Our results underscore the variable overfitting behaviors across architectures and highlight the mitigative impact of data augmentation.
arXiv Detail & Related papers (2023-08-16T21:32:57Z)
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training [58.07391711548269]
Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training. Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training.
arXiv Detail & Related papers (2023-03-23T17:59:02Z)
TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets. We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data. We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem. We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model [36.993630058695345]
We propose a shared and specific feature learning (S2FL) model to decomposing multimodal RS data into modality-shared and modality-specific components. To better assess multimodal baselines and the newly-proposed S2FL model, three multimodal RS benchmark datasets, i.e., Houston2013 -- hyperspectral and multispectral data, Berlin -- hyperspectral and synthetic aperture radar (SAR) data, Augsburg -- hyperspectral, SAR, and digital surface model (DSM) data, are released and used for land cover classification.
arXiv Detail & Related papers (2021-05-21T08:14:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.