Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer
with Mixture-of-View-Experts
- URL: http://arxiv.org/abs/2308.11793v1
- Date: Tue, 22 Aug 2023 21:18:54 GMT
- Authors: Wenyan Cong, Hanxue Liang, Peihao Wang, Zhiwen Fan, Tianlong Chen,
Mukund Varma, Yi Wang, Zhangyang Wang
- Abstract summary: Cross-scene generalizable NeRF models have become a new focus of the NeRF field.
We bridge "neuralized" architectures with the powerful Mixture-of-Experts (MoE) idea from large language models.
Our proposed model, dubbed GNT with Mixture-of-View-Experts (GNT-MOVE), has experimentally shown state-of-the-art results when transferring to unseen scenes.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-scene generalizable NeRF models, which can directly synthesize novel
views of unseen scenes, have become a new focus of the NeRF field. Several
existing attempts rely on increasingly end-to-end "neuralized" architectures,
i.e., replacing scene representation and/or rendering modules with performant
neural networks such as transformers, and turning novel view synthesis into a
feed-forward inference pipeline. Since those feed-forward "neuralized"
architectures still do not fit diverse scenes well out of the box, we propose
to bridge them with the powerful Mixture-of-Experts (MoE) idea from large
language models (LLMs), which has demonstrated superior generalization ability
by balancing between larger overall model capacity and flexible per-instance
specialization. Starting from a recent generalizable NeRF architecture called
GNT, we first demonstrate that MoE can be neatly plugged in to enhance the
model. We further customize a shared permanent expert and a geometry-aware
consistency loss to enforce cross-scene consistency and spatial smoothness,
respectively, both of which are essential for generalizable view synthesis. Our
proposed model, dubbed GNT with Mixture-of-View-Experts (GNT-MOVE), has
experimentally shown state-of-the-art results when transferring to unseen
scenes, indicating remarkably better cross-scene generalization in both
zero-shot and few-shot settings. Our code is available at
https://github.com/VITA-Group/GNT-MOVE.
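To make the MoE idea concrete, below is a minimal, hypothetical sketch of a transformer FFN layer with routed view experts plus a shared permanent expert that is always active. All names (e.g. `MoVELayer`), sizes, and the top-1 routing rule are illustrative assumptions; the authors' actual implementation lives in the repository linked above.

```python
# Hypothetical sketch of a Mixture-of-View-Experts FFN layer; the real
# GNT-MOVE code is at https://github.com/VITA-Group/GNT-MOVE.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoVELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 256):
        super().__init__()
        make_ffn = lambda: nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
        self.experts = nn.ModuleList(make_ffn() for _ in range(num_experts))
        # Shared permanent expert: always active, a common pathway meant to
        # counteract per-scene overspecialization of the routed experts.
        self.shared_expert = make_ffn()
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim); send each token to its top-1 routed expert.
        gates = F.softmax(self.router(x), dim=-1)  # (num_tokens, num_experts)
        weight, idx = gates.max(dim=-1)            # top-1 gate value and index
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = weight[mask, None] * expert(x[mask])
        # The permanent expert contributes for every token.
        return out + self.shared_expert(x)

layer = MoVELayer(dim=64)
tokens = torch.randn(1024, 64)
print(layer(tokens).shape)  # torch.Size([1024, 64])
```

Top-1 routing keeps per-token compute close to a single FFN while the expert pool grows overall capacity; the always-on shared expert reflects the abstract's stated motivation of enforcing cross-scene consistency.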
Related papers
- Novel View Synthesis with Pixel-Space Diffusion Models (arXiv, 2024-11-12)
Generative models are increasingly being employed in novel view synthesis (NVS).
We adapt a modern diffusion model architecture for end-to-end NVS in pixel space.
We introduce a novel NVS training scheme that utilizes single-view datasets, capitalizing on their relative abundance.
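As a rough illustration of the entry above, here is a generic DDPM-style training step for pixel-space NVS that conditions the denoiser on a source view. The `denoiser` signature, noise schedule, and function name are assumptions, not the paper's actual scheme.

```python
import torch
import torch.nn as nn

def nvs_diffusion_loss(denoiser: nn.Module, target_view: torch.Tensor,
                       source_view: torch.Tensor, t: torch.Tensor,
                       alpha_bar: torch.Tensor) -> torch.Tensor:
    """Noise the target view, condition the denoiser on the source view,
    and regress the injected noise (standard epsilon-prediction DDPM loss).
    target/source views: (batch, 3, H, W); t: (batch,) integer timesteps;
    alpha_bar: (T,) cumulative noise schedule."""
    eps = torch.randn_like(target_view)
    a = alpha_bar[t].view(-1, 1, 1, 1)
    noisy = a.sqrt() * target_view + (1.0 - a).sqrt() * eps
    return nn.functional.mse_loss(denoiser(noisy, source_view, t), eps)
```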
- On the Adversarial Transferability of Generalized "Skip Connections" (arXiv, 2024-10-11)
Skip connections are an essential ingredient that allows modern deep models to grow deeper and more powerful.
We find that using more gradients from the skip connections rather than the residual modules during backpropagation allows one to craft adversarial examples with high transferability.
We conduct comprehensive transfer attacks against various models including ResNets, Transformers, Inceptions, Neural Architecture Search, and Large Language Models.
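This observation is in the spirit of the earlier Skip Gradient Method: damp the gradient flowing through the residual branch so that the skip path dominates backpropagation. A minimal sketch, assuming a decay factor gamma < 1 (the wrapper and all names are illustrative):

```python
import torch
import torch.nn as nn

class ScaledResidualGrad(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient flowing back
    through the residual (non-skip) branch by gamma."""
    @staticmethod
    def forward(ctx, x, gamma):
        ctx.gamma = gamma
        return x

    @staticmethod
    def backward(ctx, grad_out):
        return ctx.gamma * grad_out, None  # no gradient for gamma itself

def residual_forward_sgm(block: nn.Module, x: torch.Tensor,
                         gamma: float = 0.5) -> torch.Tensor:
    # y = x + F(x), but the gradient w.r.t. the F(x) path is damped by
    # gamma, so the skip connection dominates when crafting adversarial
    # examples against the surrogate model.
    return x + ScaledResidualGrad.apply(block(x), gamma)
```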
- InterNeRF: Scaling Radiance Fields via Parameter Interpolation (arXiv, 2024-06-17)
We propose InterNeRF, a novel architecture for rendering a target view using a subset of the model's parameters.
We demonstrate significant improvements in multi-room scenes while remaining competitive on standard benchmarks.
- GenS: Generalizable Neural Surface Reconstruction from Multi-View Images (arXiv, 2024-06-04)
GenS is an end-to-end generalizable neural surface reconstruction model.
Our representation is more powerful, recovering high-frequency details while maintaining global smoothness.
Experiments on popular benchmarks show that our model can generalize well to new scenes.
- GiT: Towards Generalist Vision Transformer through Universal Language Interface (arXiv, 2024-03-14)
This paper proposes a simple yet effective framework, called GiT, that is simultaneously applicable to various vision tasks using only a vanilla ViT.
GiT is a multi-task visual model, jointly trained across five representative benchmarks without task-specific fine-tuning.
- MuRF: Multi-Baseline Radiance Fields (arXiv, 2023-12-07)
We present Multi-Baseline Radiance Fields (MuRF), a feed-forward approach to sparse view synthesis.
MuRF achieves state-of-the-art performance across multiple different baseline settings.
We also show promising zero-shot generalization abilities on the Mip-NeRF 360 dataset.
- Mask-Based Modeling for Neural Radiance Fields (arXiv, 2023-04-11)
In this work, we show that 3D implicit representation learning can be significantly improved by mask-based modeling.
We propose MRVM-NeRF, a self-supervised pretraining objective that predicts complete scene representations from partially masked features along each ray.
This objective lets the model better exploit correlations across different points and views as geometry priors.
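A generic masked-modeling sketch of the pretraining objective described above (not the exact MRVM-NeRF recipe; the predictor, regression target, and mask ratio are assumptions):

```python
import torch
import torch.nn as nn

def masked_ray_pretraining_loss(ray_feats: torch.Tensor,
                                predictor: nn.Module,
                                mask_ratio: float = 0.5) -> torch.Tensor:
    """ray_feats: (rays, samples, dim) per-point features along each ray.
    Randomly mask a fraction of the samples, ask the predictor to
    reconstruct the full feature sequence, and regress only masked slots."""
    mask = torch.rand(ray_feats.shape[:2], device=ray_feats.device) < mask_ratio
    corrupted = ray_feats.masked_fill(mask.unsqueeze(-1), 0.0)
    pred = predictor(corrupted)          # (rays, samples, dim)
    target = ray_feats.detach()          # e.g. features from a frozen teacher
    return ((pred - target) ** 2)[mask].mean()
```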
- Is Attention All NeRF Needs? (arXiv, 2022-07-27)
Generalizable NeRF Transformer (GNT) is a pure, unified transformer-based architecture that efficiently reconstructs Neural Radiance Fields (NeRFs) on the fly from source views.
GNT achieves generalizable neural scene representation and rendering by encapsulating two transformer-based stages.
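Schematically, GNT's two stages can be pictured as attention across source views for each 3D sample, followed by attention along the ray; the sketch below uses stock PyTorch layers and mean pooling purely for illustration (the actual model uses its own learned aggregation):

```python
import torch
import torch.nn as nn

class TwoStageGNTSketch(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.view_attn = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.ray_attn = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.to_rgb = nn.Linear(dim, 3)

    def forward(self, epipolar_feats: torch.Tensor) -> torch.Tensor:
        # epipolar_feats: (rays, samples, views, dim) image features sampled
        # where each ray sample projects into each source view.
        r, s, v, d = epipolar_feats.shape
        x = self.view_attn(epipolar_feats.reshape(r * s, v, d)).mean(dim=1)
        x = self.ray_attn(x.reshape(r, s, d)).mean(dim=1)  # pool along the ray
        return self.to_rgb(x)  # (rays, 3)
```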
- Global Filter Networks for Image Classification (arXiv, 2021-07-01)
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
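The core GFNet operation is straightforward to sketch: FFT the token grid, multiply element-wise by a learnable frequency-domain filter, and inverse-FFT back. Grid size, channel count, and initialization below are illustrative:

```python
import torch
import torch.nn as nn

class GlobalFilterLayer(nn.Module):
    """Token mixing via an element-wise product with a learnable filter in
    the 2D Fourier domain; the FFT gives the log-linear complexity."""
    def __init__(self, h: int = 14, w: int = 14, dim: int = 64):
        super().__init__()
        # rfft2 keeps w // 2 + 1 frequencies along the last spatial axis;
        # the trailing 2 stores (real, imag) parts of the complex filter.
        self.filter = nn.Parameter(torch.randn(h, w // 2 + 1, dim, 2) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, h, w, dim) grid of tokens.
        freq = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")
        freq = freq * torch.view_as_complex(self.filter)
        return torch.fft.irfft2(freq, s=x.shape[1:3], dim=(1, 2), norm="ortho")

layer = GlobalFilterLayer()
print(layer(torch.randn(2, 14, 14, 64)).shape)  # torch.Size([2, 14, 14, 64])
```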