AE-NeRF: Auto-Encoding Neural Radiance Fields for 3D-Aware Object
Manipulation
- URL: http://arxiv.org/abs/2204.13426v1
- Date: Thu, 28 Apr 2022 11:50:18 GMT
- Title: AE-NeRF: Auto-Encoding Neural Radiance Fields for 3D-Aware Object
Manipulation
- Authors: Mira Kim, Jaehoon Ko, Kyusun Cho, Junmyeong Choi, Daewon Choi,
Seungryong Kim
- Abstract summary: We propose a novel framework for 3D-aware object manipulation, called Auto-Encoding Neural Radiance Fields (AE-NeRF).
Our model, formulated in an auto-encoder architecture, extracts disentangled 3D attributes such as 3D shape, appearance, and camera pose from an image.
A high-quality image is rendered from the attributes through disentangled generative Neural Radiance Fields (NeRF).
- Score: 24.65896451569795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel framework for 3D-aware object manipulation, called
Auto-Encoding Neural Radiance Fields (AE-NeRF). Our model, which is formulated
in an auto-encoder architecture, extracts disentangled 3D attributes such as 3D
shape, appearance, and camera pose from an image, and a high-quality image is
rendered from the attributes through disentangled generative Neural Radiance
Fields (NeRF). To improve the disentanglement ability, we present two losses,
global-local attribute consistency loss defined between input and output, and
swapped-attribute classification loss. Since training such auto-encoding
networks from scratch without ground-truth shape and appearance information is
non-trivial, we present a stage-wise training scheme, which dramatically helps
to boost the performance. We conduct experiments to demonstrate the
effectiveness of the proposed model over the latest methods and provide
extensive ablation studies.
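The abstract describes an encoder that separates an image into shape, appearance, and camera pose, which is what makes attribute-level manipulation possible. The snippet below is only a minimal sketch of that idea, not the authors' implementation: the `Attributes` container, the `swap_attribute` helper, and the toy code values are all hypothetical names introduced for illustration, and the encoder and NeRF renderer themselves are left out.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Attributes:
    """Hypothetical container for the disentangled codes of one image."""
    shape: Tuple[float, ...]
    appearance: Tuple[float, ...]
    pose: Tuple[float, ...]


def swap_attribute(a: Attributes, b: Attributes, name: str) -> Tuple[Attributes, Attributes]:
    """Exchange one named code between two attribute sets; the other codes are untouched."""
    if name not in ("shape", "appearance", "pose"):
        raise ValueError(f"unknown attribute: {name}")
    return (
        replace(a, **{name: getattr(b, name)}),
        replace(b, **{name: getattr(a, name)}),
    )


# Toy codes standing in for what an encoder might produce for two images (assumed values).
car_a = Attributes(shape=(0.1, 0.2), appearance=(0.9,), pose=(30.0, 0.0))
car_b = Attributes(shape=(0.7, 0.4), appearance=(0.3,), pose=(120.0, 10.0))

# Swap only the appearance code; shape and pose stay with their original image.
a_swapped, b_swapped = swap_attribute(car_a, car_b, "appearance")
```

In a full pipeline, each swapped attribute set would be passed to the generative NeRF decoder to render the edited image; how well the edit isolates one factor depends on how disentangled the codes actually are.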
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs [65.80187860906115]
We propose a novel approach to improve NeRF's performance with sparse inputs.
We first adopt a voxel-based ray sampling strategy to ensure that the sampled rays intersect with a certain voxel in 3D space.
We then randomly sample additional points within the voxel and apply a Transformer to infer the properties of other points on each ray, which are then incorporated into the volume rendering.
arXiv Detail & Related papers (2024-03-25T15:56:17Z)
- NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions [97.27105725738016]
The integration of Neural Radiance Fields (NeRFs) and generative models, such as Generative Adversarial Networks (GANs), has transformed 3D-aware generation from single-view images.
We propose a simple and effective method, based on re-using the well-disentangled latent space of a pre-trained NeRF-GAN in a pose-conditioned convolutional network to directly generate 3D-consistent images corresponding to the underlying 3D representations.
arXiv Detail & Related papers (2023-03-22T18:59:48Z)
- NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion [107.67277084886929]
Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input.
We propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test time.
We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and finetunes the NeRF based on the improved virtual views.
arXiv Detail & Related papers (2023-02-20T17:12:00Z)
- Generative Deformable Radiance Fields for Disentangled Image Synthesis of Topology-Varying Objects [52.46838926521572]
3D-aware generative models have demonstrated superb performance in generating 3D neural radiance fields (NeRF) from a collection of monocular 2D images.
We propose a generative model for synthesizing radiance fields of topology-varying objects with disentangled shape and appearance variations.
arXiv Detail & Related papers (2022-09-09T08:44:06Z)
- Training and Tuning Generative Neural Radiance Fields for Attribute-Conditional 3D-Aware Face Generation [66.21121745446345]
We propose a conditional GNeRF model that integrates specific attribute labels as input, thus amplifying the controllability and disentanglement capabilities of 3D-aware generative models.
Our approach builds upon a pre-trained 3D-aware face model, and we introduce a Training as Init and fidelity for Tuning (TRIOT) method to train a conditional normalized flow module.
Our experiments substantiate the efficacy of our model, showcasing its ability to generate high-quality edits with enhanced view consistency.
arXiv Detail & Related papers (2022-08-26T10:05:39Z)
- AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance Fields [23.92262483956057]
Fully unsupervised 3D representation learning has gained attention owing to its advantages in data collection.
We propose an aperture rendering NeRF (AR-NeRF) which can utilize viewpoint and defocus cues in a unified manner.
We demonstrate the utility of AR-NeRF for unsupervised learning of the depth and defocus effects.
arXiv Detail & Related papers (2022-06-13T12:41:59Z)
- Points2NeRF: Generating Neural Radiance Fields from 3D point cloud [0.0]
We propose representing 3D objects as Neural Radiance Fields (NeRFs).
We leverage a hypernetwork paradigm and train the model to take a 3D point cloud with the associated color values.
Our method provides efficient 3D object representation and offers several advantages over the existing approaches.
arXiv Detail & Related papers (2022-06-02T20:23:33Z)
- 3D-aware Image Synthesis via Learning Structural and Textural Representations [39.681030539374994]
We propose VolumeGAN, for high-fidelity 3D-aware image synthesis, through explicitly learning a structural representation and a textural representation.
Our approach achieves sufficiently higher image quality and better 3D control than the previous methods.
arXiv Detail & Related papers (2021-12-20T18:59:40Z)
- Using Adaptive Gradient for Texture Learning in Single-View 3D Reconstruction [0.0]
Learning-based approaches for 3D model reconstruction have attracted attention owing to their modern applications.
We present a novel sampling algorithm by optimizing the gradient of predicted coordinates based on the variance on the sampling image.
We also adopt Frechet Inception Distance (FID) to form a loss function in learning, which helps bridge the gap between rendered images and input images.
arXiv Detail & Related papers (2021-04-29T07:52:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.