Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View
Synthesis?
- URL: http://arxiv.org/abs/2403.06092v1
- Date: Sun, 10 Mar 2024 04:27:06 GMT
- Title: Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View
Synthesis?
- Authors: Hanxin Zhu, Tianyu He, Xin Li, Bingchen Li, Zhibo Chen
- Abstract summary: NeRF has achieved superior performance for novel view synthesis by modeling the scene with a Multi-Layer Perceptron (MLP) and a volume rendering procedure.
When fewer known views are given, the model is prone to overfitting the given views.
- Score: 19.34823662319042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Radiance Field (NeRF) has achieved superior performance for novel view
synthesis by modeling the scene with a Multi-Layer Perceptron (MLP) and a
volume rendering procedure. However, when fewer known views are given (i.e.,
few-shot view synthesis), the model is prone to overfitting the given views. To
handle this issue, previous efforts have been made towards leveraging learned
priors or introducing additional regularizations. In contrast, in this paper,
we for the first time provide an orthogonal method from the perspective of
network structure. Given the observation that trivially reducing the number of
model parameters alleviates the overfitting issue, but at the cost of missing
details, we propose the multi-input MLP (mi-MLP) that incorporates the inputs
(i.e., location and viewing direction) of the vanilla MLP into each layer to
prevent the overfitting issue without harming detailed synthesis. To further
reduce the artifacts, we propose to model colors and volume density separately
and present two regularization terms. Extensive experiments on multiple
datasets demonstrate that 1) although the proposed mi-MLP is easy to
implement, it is surprisingly effective, boosting the PSNR of the baseline
from $14.73$ to $24.23$, and 2) the overall framework achieves state-of-the-art
results on a wide range of benchmarks. We will release the code upon
publication.
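The multi-input idea described in the abstract can be illustrated with a minimal sketch. The abstract only states that the inputs (location and viewing direction) are incorporated into each layer; the use of concatenation, the ReLU activations, the layer widths, the single 4-value output head, and all function names below (`init_mi_mlp`, `mi_mlp_forward`, and the helpers) are assumptions made for illustration and are not taken from the authors' code. The separate modeling of colors and volume density and the two regularization terms are not shown.

```python
import random


def linear(weights, bias, x):
    """Affine map; `weights` is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(n_in, n_out, rng):
    """Small uniform random weights and zero biases (illustrative choice)."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_mi_mlp(n_input, width, depth, n_output, seed=0):
    """Hypothetical multi-input MLP: every layer after the first also receives the input."""
    rng = random.Random(seed)
    layers = [init_layer(n_input, width, rng)]
    for _ in range(depth - 1):
        # Assumption: the input is concatenated with the previous activations.
        layers.append(init_layer(width + n_input, width, rng))
    head = init_layer(width + n_input, n_output, rng)
    return layers, head


def mi_mlp_forward(params, x):
    layers, head = params
    h = relu(linear(*layers[0], x))
    for weights, bias in layers[1:]:
        h = relu(linear(weights, bias, h + x))  # list concatenation re-injects x
    return linear(*head, h + x)


# Usage: a 3D location followed by a 3D viewing direction (values are arbitrary).
params = init_mi_mlp(n_input=6, width=8, depth=3, n_output=4)
output = mi_mlp_forward(params, [0.1, -0.2, 0.3, 0.0, 0.0, 1.0])
print(len(output))
```

In a vanilla MLP the input enters only at the first layer; here each later layer has `width + n_input` incoming connections, which is the only structural difference this sketch is meant to show.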
Related papers
- FSMLP: Modelling Channel Dependencies With Simplex Theory Based Multi-Layer Perceptions In Frequency Domain [16.693117400535833]
Time series forecasting (TSF) plays a crucial role in various domains, including web data analysis, energy consumption prediction, and weather forecasting.
While Multi-Layer Perceptrons (MLPs) are lightweight and effective for capturing temporal dependencies, they are prone to overfitting when used to model inter-channel dependencies.
We introduce a novel Simplex-MLP layer, where the weights are constrained within a standard simplex. This strategy encourages the model to learn simpler patterns, thereby reducing overfitting to extreme values.
arXiv Detail & Related papers (2024-12-02T16:04:15Z)
- Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused [44.37155553647802]
Large Language Models (LLMs) have demonstrated exceptional performance across various natural language processing tasks.
They occasionally yield content that is factually inaccurate or discordant with the expected output.
Recent works have investigated contrastive decoding between the original model and an amateur model with induced hallucination.
We introduce a novel contrastive decoding framework termed LOL (LOwer Layer Matters).
arXiv Detail & Related papers (2024-08-16T14:23:59Z)
- MLP Can Be A Good Transformer Learner [73.01739251050076]
The self-attention mechanism is key to the Transformer but is often criticized for its computational demands.
This paper introduces a novel strategy that simplifies vision transformers and reduces computational load through the selective removal of non-essential attention layers.
arXiv Detail & Related papers (2024-04-08T16:40:15Z)
- Aligning Modalities in Vision Large Language Models via Preference Fine-tuning [67.62925151837675]
In this work, we frame the hallucination problem as an alignment issue and tackle it with preference tuning.
Specifically, we propose POVID to generate feedback data with AI models.
We use ground-truth instructions as the preferred response and a two-stage approach to generate dispreferred data.
In experiments across broad benchmarks, we show that we can not only reduce hallucinations but also improve model performance across standard benchmarks, outperforming prior approaches.
arXiv Detail & Related papers (2024-02-18T00:56:16Z)
- VaLID: Variable-Length Input Diffusion for Novel View Synthesis [36.57742242154048]
Novel View Synthesis (NVS), which tries to produce a realistic image at the target view given source view images and their corresponding poses, is a fundamental problem in 3D Vision.
We process each pose-image pair separately and then fuse them into a unified visual representation, which is injected into the model.
We propose the Multi-view Cross Former module, which maps variable-length input data to fixed-size output data.
arXiv Detail & Related papers (2023-12-14T12:52:53Z)
- Self-improving Multiplane-to-layer Images for Novel View Synthesis [3.9901365062418312]
We present a new method for lightweight novel-view synthesis that generalizes to an arbitrary forward-facing scene.
We start by representing the scene with a set of fronto-parallel semitransparent planes and afterward convert them to deformable layers in an end-to-end manner.
Our method does not require fine-tuning when a new scene is processed and can handle an arbitrary number of views without restrictions.
arXiv Detail & Related papers (2022-10-04T13:27:14Z)
- Generalizable Patch-Based Neural Rendering [46.41746536545268]
We propose a new paradigm for learning models that can synthesize novel views of unseen scenes.
Our method is capable of predicting the color of a target ray in a novel scene directly, just from a collection of patches sampled from the scene.
We show that our approach outperforms the state-of-the-art on novel view synthesis of unseen scenes even when being trained with considerably less data than prior work.
arXiv Detail & Related papers (2022-07-21T17:57:04Z)
- ReLU Fields: The Little Non-linearity That Could [62.228229880658404]
We investigate the smallest change to grid-based representations that allows for retaining the high-fidelity results of MLPs.
We show that such an approach becomes competitive with the state-of-the-art.
arXiv Detail & Related papers (2022-05-22T13:42:31Z)
- RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality [113.1414517605892]
We propose a methodology, Locality Injection, to incorporate local priors into an FC layer.
RepMLPNet is the first to transfer seamlessly to Cityscapes semantic segmentation.
arXiv Detail & Related papers (2021-12-21T10:28:17Z)
- RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs [79.00855490550367]
NeRF can produce photorealistic renderings of unseen viewpoints when many input views are available, but its performance drops significantly when the inputs are sparse.
We address this by regularizing the geometry and appearance of patches rendered from unobserved viewpoints.
Our model outperforms not only other methods that optimize over a single scene, but also conditional models that are extensively pre-trained on large multi-view datasets.
arXiv Detail & Related papers (2021-12-01T18:59:46Z)
- Portrait Neural Radiance Fields from a Single Image [68.66958204066721]
We present a method for estimating Neural Radiance Fields (NeRF) from a single portrait.
We propose to pretrain the weights of a multilayer perceptron (MLP), which implicitly models the volumetric density.
To improve the generalization to unseen faces, we train the canonical coordinate space approximated by 3D face morphable models.
We quantitatively evaluate the method using controlled captures and demonstrate the generalization to real portrait images, showing favorable results against state-of-the-art methods.
arXiv Detail & Related papers (2020-12-10T18:59:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.