NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods
- URL: http://arxiv.org/abs/2406.17345v2
- Date: Wed, 29 Oct 2025 19:47:47 GMT
- Title: NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods
- Authors: Jonas Kulhanek, Torsten Sattler
- Abstract summary: Novel view synthesis is an important problem with many applications, including AR/VR, gaming, and robotic simulations. It is becoming difficult to keep track of the current state of the art due to methods using different evaluation protocols, codebases being difficult to install and use, and methods not generalizing well to novel 3D scenes. In our experiments, we show that even tiny differences in the evaluation protocols of various methods can artificially boost the performance of these methods.
- Score: 27.110978556614246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel view synthesis is an important problem with many applications, including AR/VR, gaming, and robotic simulations. With the recent rapid development of Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) methods, it is becoming difficult to keep track of the current state of the art (SoTA) due to methods using different evaluation protocols, codebases being difficult to install and use, and methods not generalizing well to novel 3D scenes. In our experiments, we show that even tiny differences in the evaluation protocols of various methods can artificially boost the performance of these methods. This raises questions about the validity of quantitative comparisons performed in the literature. To address these questions, we propose NerfBaselines, an evaluation framework which provides consistent benchmarking tools, ensures reproducibility, and simplifies the installation and use of various methods. We validate our implementation experimentally by reproducing the numbers reported in the original papers. For improved accessibility, we release a web platform that compares commonly used methods on standard benchmarks. We strongly believe NerfBaselines is a valuable contribution to the community as it ensures that quantitative results are comparable and thus truly measure progress in the field of novel view synthesis.
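The abstract's central claim is that tiny protocol differences can artificially boost reported numbers. A minimal illustrative sketch (not NerfBaselines' actual code; the images and noise levels below are invented for illustration) shows one such pitfall: averaging per-image PSNRs versus computing PSNR from the average MSE yields different scores on the exact same predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical test "images" with different error magnitudes.
gt = [rng.random((8, 8)) for _ in range(2)]
pred = [im + rng.normal(0, s, im.shape) for im, s in zip(gt, (0.01, 0.1))]

# Per-image mean squared errors against ground truth.
mses = [np.mean((g - p) ** 2) for g, p in zip(gt, pred)]

# Protocol A: average the per-image PSNRs.
psnr_a = np.mean([-10 * np.log10(m) for m in mses])
# Protocol B: PSNR of the averaged MSE.
psnr_b = -10 * np.log10(np.mean(mses))

print(round(psnr_a, 2), round(psnr_b, 2))
```

By Jensen's inequality, protocol A always reports a score at least as high as protocol B, so a method evaluated under A appears stronger than one evaluated under B even with identical renderings, which is exactly the kind of discrepancy a consistent evaluation framework eliminates.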
Related papers
- InfoSynth: Information-Guided Benchmark Synthesis for LLMs [69.80981631587501]
Large language models (LLMs) have demonstrated significant advancements in reasoning and code generation. Traditional benchmark creation relies on manual human effort, a process that is both expensive and time-consuming. This work introduces InfoSynth, a novel framework for automatically generating and evaluating reasoning benchmarks.
arXiv Detail & Related papers (2026-01-02T05:26:27Z) - A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility [29.437125712259046]
Reasoning has emerged as the next major frontier for language models (LMs)
We conduct a comprehensive empirical study and find that current mathematical reasoning benchmarks are highly sensitive to subtle implementation choices.
We propose a standardized evaluation framework with clearly defined best practices and reporting standards.
arXiv Detail & Related papers (2025-04-09T17:58:17Z) - OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection [86.30994231610651]
Temporal action detection (TAD) is a fundamental video understanding task that aims to identify human actions and localize their temporal boundaries in videos.
We propose OpenTAD, a unified TAD framework consolidating 16 different TAD methods and 9 standard datasets into a modular framework.
Minimal effort is required to replace one module with a different design, train a feature-based TAD model in end-to-end mode, or switch between the two.
arXiv Detail & Related papers (2025-02-27T18:32:27Z) - Multimodal Instruction Disassembly with Covariate Shift Adaptation and Real-time Implementation [3.70729078195191]
We introduce a new miniature platform, RASCv3, that can simultaneously collect power and EM measurements from a target device.
We devise a new approach to combine and select features from power and EM traces using information theory.
The recognition rates of offline and real-time instruction disassemblers are compared for single- and multi-modal cases.
arXiv Detail & Related papers (2024-12-10T17:00:23Z) - AIM 2024 Sparse Neural Rendering Challenge: Dataset and Benchmark [43.76981659253837]
Differentiable rendering relies on a dense viewpoint coverage of the scene.
Many challenges arise when only a few input views are available.
A recurring problem in the sparse rendering literature is the lack of a homogeneous, up-to-date dataset and evaluation protocol.
We introduce a new dataset that follows the setup of the DTU MVS dataset.
arXiv Detail & Related papers (2024-09-23T14:10:06Z) - TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers [14.708092244093665]
We develop a strategy that utilizes a predicted depth confidence map to guide accurate local feature matching.
We present a novel G-3DGS method named TranSplat, which obtains the best performance on both the RealEstate10K and ACID benchmarks.
arXiv Detail & Related papers (2024-08-25T08:37:57Z) - DOCE: Finding the Sweet Spot for Execution-Based Code Generation [69.5305729627198]
We propose a comprehensive framework that includes candidate generation, $n$-best reranking, minimum Bayes risk (MBR) decoding, and self-debugging as the core components.
Our findings highlight the importance of execution-based methods and the gap between execution-based and execution-free methods.
arXiv Detail & Related papers (2024-08-25T07:10:36Z) - Evaluating Modern Approaches in 3D Scene Reconstruction: NeRF vs Gaussian-Based Methods [4.6836510920448715]
This study explores the capabilities of Neural Radiance Fields (NeRF) and Gaussian-based methods in the context of 3D scene reconstruction.
We assess performance based on tracking accuracy, mapping fidelity, and view synthesis.
Findings reveal that NeRF excels in view synthesis, offering unique capabilities in generating new perspectives from existing data.
arXiv Detail & Related papers (2024-08-08T07:11:57Z) - Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification is a key element of machine learning applications.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines.
We conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z) - Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
The adversarial vulnerability of deep neural networks (DNNs) has drawn great attention.
An increasing number of transfer-based methods have been developed to fool black-box DNN models.
We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z) - Benchopt: Reproducible, efficient and collaborative optimization benchmarks [67.29240500171532]
Benchopt is a framework to automate, reproduce and publish optimization benchmarks in machine learning.
Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments.
arXiv Detail & Related papers (2022-06-27T16:19:24Z) - SimpleTrack: Understanding and Rethinking 3D Multi-object Tracking [17.351635242415703]
3D multi-object tracking (MOT) has witnessed numerous novel benchmarks and approaches in recent years.
Despite their progress and usefulness, an in-depth analysis of their strengths and weaknesses is not yet available.
We summarize current 3D MOT methods into a unified framework by decomposing them into four constituent parts.
arXiv Detail & Related papers (2021-11-18T10:57:57Z) - FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding [89.92513889132825]
We introduce an evaluation framework that improves previous evaluation procedures in three key aspects, i.e., test performance, dev-test correlation, and stability.
We open-source our toolkit, FewNLU, that implements our evaluation framework along with a number of state-of-the-art methods.
arXiv Detail & Related papers (2021-09-27T00:57:30Z) - Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning [66.59455427102152]
We introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks.
Each baseline is a self-contained experiment pipeline with easily reusable and extendable components.
We provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results.
arXiv Detail & Related papers (2021-06-07T23:57:32Z) - Revisiting Deep Local Descriptor for Improved Few-Shot Classification [56.74552164206737]
We show how one can improve the quality of embeddings by leveraging Dense Classification and Attentive Pooling.
We suggest to pool feature maps by applying attentive pooling instead of the widely used global average pooling (GAP) to prepare embeddings for few-shot classification.
arXiv Detail & Related papers (2021-03-30T00:48:28Z) - Mimicry: Towards the Reproducibility of GAN Research [0.0]
We introduce Mimicry, a lightweight PyTorch library that provides implementations of popular state-of-the-art GANs and evaluation metrics.
We provide comprehensive baseline performances of different GANs on seven widely-used datasets by training these GANs under the same conditions, and evaluating them across three popular GAN metrics using the same procedures.
arXiv Detail & Related papers (2020-05-05T21:07:26Z) - Frustratingly Simple Few-Shot Object Detection [98.42824677627581]
We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task.
Such a simple approach outperforms the meta-learning methods by roughly 2-20 points on current benchmarks.
arXiv Detail & Related papers (2020-03-16T00:29:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.