Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization
- URL: http://arxiv.org/abs/2303.13232v1
- Date: Thu, 23 Mar 2023 13:05:57 GMT
- Title: Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization
- Authors: Zicheng Zhang, Yinglu Liu, Congying Han, Yingwei Pan, Tiande Guo, Ting Yao
- Abstract summary: LipRF is a framework for transforming the appearance representation of a pre-trained NeRF with a Lipschitz mapping.
We conduct extensive experiments to show the high quality and robust performance of LipRF on both photorealistic 3D stylization and object appearance editing.
- Score: 56.94435595045005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in 3D scene representation and novel view synthesis have
witnessed the rise of Neural Radiance Fields (NeRFs). Nevertheless, it is not
trivial to exploit NeRF for the photorealistic 3D scene stylization task, which
aims to generate visually consistent and photorealistic stylized scenes from
novel views. Simply coupling NeRF with photorealistic style transfer (PST) will
result in cross-view inconsistency and degradation of stylized view syntheses.
Through a thorough analysis, we demonstrate that this non-trivial task can be viewed in a new light: when the appearance representation of a pre-trained NeRF is transformed by a Lipschitz mapping, the consistency and photorealism across source views are seamlessly encoded into the syntheses. This motivates us to build a concise and flexible learning framework, LipRF, which upgrades arbitrary 2D PST methods with a Lipschitz mapping tailored to the 3D scene. Technically, LipRF first pre-trains a radiance field to reconstruct the 3D scene, then stylizes each training view with a 2D PST method and uses the results as priors to learn a Lipschitz network that transforms the pre-trained appearance. Because the Lipschitz condition strongly constrains the expressivity of the neural network, we devise an adaptive regularization to balance reconstruction and stylization. A gradual gradient aggregation strategy is further introduced to optimize LipRF in a cost-efficient manner. We conduct extensive experiments to show the high quality and robust performance of LipRF on both photorealistic 3D stylization and object appearance editing.
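
To make the core idea concrete, the sketch below shows a tiny Lipschitz-bounded MLP of the kind the abstract describes: each linear layer carries a learnable Lipschitz bound, the product of per-layer bounds is penalized to keep the appearance mapping smooth, and gradients are accumulated chunk by chunk in the spirit of gradual gradient aggregation. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation; the names (LipschitzLinear, LipschitzColorMLP, lipschitz_loss) and the hand-tuned regularization schedule are illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LipschitzLinear(nn.Module):
        """Linear layer whose Lipschitz constant (in the l-infinity sense)
        is capped by a learnable scalar bound, softplus(c)."""

        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
            self.bias = nn.Parameter(torch.zeros(out_features))
            # Learnable bound; real implementations often initialize it from
            # the weight's row norms, a constant keeps this sketch simple.
            self.c = nn.Parameter(torch.ones(()))

        def lipschitz_bound(self):
            return F.softplus(self.c)

        def forward(self, x):
            # Shrink any weight row whose absolute sum exceeds the bound, so
            # the layer's l-infinity operator norm stays below softplus(c).
            bound = self.lipschitz_bound()
            row_norm = self.weight.abs().sum(dim=1, keepdim=True)
            scale = torch.clamp(bound / (row_norm + 1e-12), max=1.0)
            return F.linear(x, self.weight * scale, self.bias)

    class LipschitzColorMLP(nn.Module):
        """Small MLP mapping colors of a frozen, pre-trained radiance field
        to stylized colors. ReLU is 1-Lipschitz, so the product of layer
        bounds upper-bounds the whole network's Lipschitz constant."""

        def __init__(self, dim=3, hidden=32, depth=3):
            super().__init__()
            self.layers = nn.ModuleList(
                LipschitzLinear(dim if i == 0 else hidden,
                                dim if i == depth - 1 else hidden)
                for i in range(depth))

        def forward(self, x):
            for i, layer in enumerate(self.layers):
                x = layer(x)
                if i < len(self.layers) - 1:
                    x = torch.relu(x)
            return x

        def lipschitz_loss(self):
            # Penalize the product of per-layer bounds to keep the
            # appearance transformation smooth.
            loss = self.layers[0].lipschitz_bound()
            for layer in self.layers[1:]:
                loss = loss * layer.lipschitz_bound()
            return loss

    net = LipschitzColorMLP()
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
    raw_colors = torch.rand(4096, 3)     # stand-in for pre-trained appearance
    target_colors = torch.rand(4096, 3)  # stand-in for 2D-PST-stylized priors

    for step in range(100):
        optimizer.zero_grad()
        # Gradient accumulation over chunks, a simplified stand-in for the
        # paper's gradual gradient aggregation: backpropagate chunk by chunk
        # so an entire view never has to sit in memory at once.
        for raw, tgt in zip(raw_colors.split(1024), target_colors.split(1024)):
            recon = F.mse_loss(net(raw), tgt)
            lam = 1e-4 * min(1.0, step / 50)  # placeholder adaptive weight
            loss = (recon + lam * net.lipschitz_loss()) / 4  # 4 chunks
            loss.backward()
        optimizer.step()

In the actual method the transformed colors would be composed through volume rendering before being compared to the PST-stylized views, and the regularization weight is adapted rather than scheduled by hand; the sketch only illustrates the Lipschitz mechanics.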
Related papers
- Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z)
- ReconFusion: 3D Reconstruction with Diffusion Priors [104.73604630145847]
We present ReconFusion to reconstruct real-world scenes using only a few photos.
Our approach leverages a diffusion prior for novel view synthesis, trained on synthetic and multiview datasets.
Our method synthesizes realistic geometry and texture in underconstrained regions while preserving the appearance of observed regions.
arXiv Detail & Related papers (2023-12-05T18:59:58Z)
- Improving Neural Radiance Fields with Depth-aware Optimization for Novel View Synthesis [12.3338393483795]
We propose SfMNeRF, a method to better synthesize novel views as well as reconstruct the 3D-scene geometry.
SfMNeRF employs the epipolar, photometric consistency, depth smoothness, and position-of-matches constraints to explicitly reconstruct the 3D-scene structure.
Experiments on two public datasets demonstrate that SfMNeRF surpasses state-of-the-art approaches.
arXiv Detail & Related papers (2023-04-11T13:37:17Z)
- SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields [19.740018132105757]
SceneRF is a self-supervised monocular scene reconstruction method using only posed image sequences for training.
At inference, a single input image suffices to hallucinate novel depth views, which are fused to obtain a 3D scene reconstruction.
arXiv Detail & Related papers (2022-12-05T18:59:57Z)
- TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition [39.312567993736025]
We propose TANGO, which transfers the appearance style of a given 3D shape according to a text prompt in a photorealistic manner.
We show that TANGO outperforms existing methods of text-driven 3D style transfer in terms of photorealistic quality, consistency of 3D geometry, and robustness when stylizing low-quality meshes.
arXiv Detail & Related papers (2022-10-20T13:52:18Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Fast-GANFIT: Generative Adversarial Network for High Fidelity 3D Face Reconstruction [76.1612334630256]
We harness the power of Generative Adversarial Networks (GANs) and Deep Convolutional Neural Networks (DCNNs) to reconstruct the facial texture and shape from single images.
We demonstrate excellent results in photorealistic and identity-preserving 3D face reconstruction and achieve, for the first time, facial texture reconstruction with high-frequency details.
arXiv Detail & Related papers (2021-05-16T16:35:44Z)
- BARF: Bundle-Adjusting Neural Radiance Fields [104.97810696435766]
We propose Bundle-Adjusting Neural Radiance Fields (BARF) for training NeRF from imperfect camera poses.
BARF can effectively optimize the neural scene representations and resolve large camera pose misalignment at the same time.
This enables view synthesis and localization of video sequences from unknown camera poses, opening up new avenues for visual localization systems.
arXiv Detail & Related papers (2021-04-13T17:59:51Z)
- pixelNeRF: Neural Radiance Fields from One or Few Images [20.607712035278315]
pixelNeRF is a learning framework that predicts a continuous neural scene representation conditioned on one or few input images.
We conduct experiments on ShapeNet benchmarks for single image novel view synthesis tasks with held-out objects.
In all cases, pixelNeRF outperforms current state-of-the-art baselines for novel view synthesis and single image 3D reconstruction.
arXiv Detail & Related papers (2020-12-03T18:59:54Z)