Shape, Pose, and Appearance from a Single Image via Bootstrapped
Radiance Field Inversion
- URL: http://arxiv.org/abs/2211.11674v2
- Date: Mon, 20 Mar 2023 11:33:18 GMT
- Title: Shape, Pose, and Appearance from a Single Image via Bootstrapped
Radiance Field Inversion
- Authors: Dario Pavllo, David Joseph Tan, Marie-Julie Rakotosaona, Federico
Tombari
- Abstract summary: We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available.
We leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme where a model produces a first guess of the solution, which is then refined via optimization.
Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios.
- Score: 54.151979979158085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Radiance Fields (NeRF) coupled with GANs represent a promising
direction in the area of 3D reconstruction from a single view, owing to their
ability to efficiently model arbitrary topologies. Recent work in this area,
however, has mostly focused on synthetic datasets where exact ground-truth
poses are known, and has overlooked pose estimation, which is important for
certain downstream applications such as augmented reality (AR) and robotics. We
introduce a principled end-to-end reconstruction framework for natural images,
where accurate ground-truth poses are not available. Our approach recovers an
SDF-parameterized 3D shape, pose, and appearance from a single image of an
object, without exploiting multiple views during training. More specifically,
we leverage an unconditional 3D-aware generator, to which we apply a hybrid
inversion scheme where a model produces a first guess of the solution which is
then refined via optimization. Our framework can de-render an image in as few
as 10 steps, enabling its use in practical scenarios. We demonstrate
state-of-the-art results on a variety of real and synthetic benchmarks.
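The hybrid inversion scheme admits a compact reading: an encoder amortizes a first guess of the latent code and camera pose, and a short optimization loop refines both against a reconstruction loss. The PyTorch sketch below illustrates this idea under stated assumptions; `encoder`, `generator`, the pose parameterization, and the plain photometric loss are placeholders, not the authors' implementation.

```python
import torch

def hybrid_invert(image, encoder, generator, n_steps=10, lr=1e-2):
    """Hybrid GAN inversion: encoder guess + short optimization refinement.

    A minimal sketch (not the authors' code): `encoder` maps an image to an
    initial latent code and camera pose; `generator` differentiably renders
    an image from (latent, pose) via its radiance field.
    """
    # 1) Amortized first guess of the solution.
    with torch.no_grad():
        z, pose = encoder(image)          # latent code and pose estimate
    z = z.clone().requires_grad_(True)
    pose = pose.clone().requires_grad_(True)

    # 2) Refine the guess with a few gradient steps on a photometric loss.
    opt = torch.optim.Adam([z, pose], lr=lr)
    for _ in range(n_steps):              # "as few as 10 steps"
        opt.zero_grad()
        rendered = generator(z, pose)     # differentiable rendering
        loss = torch.nn.functional.mse_loss(rendered, image)
        loss.backward()
        opt.step()
    return z.detach(), pose.detach()
```

The 10-step budget mirrors the claim in the abstract; how few refinement steps suffice in practice depends on the quality of the encoder's first guess.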
Related papers
- SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization [16.460851701725392]
We present a novel approach that optimizes radiance fields with scene graphs to mitigate the influence of outlier poses.
Our method incorporates an adaptive inlier-outlier confidence estimation scheme based on scene graphs.
We also introduce an effective intersection-over-union (IoU) loss to optimize the camera pose and surface geometry (a sketch of such a loss follows below).
arXiv Detail & Related papers (2024-07-17T15:50:17Z)
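An IoU loss of the kind the SG-NeRF entry describes can be realized as a differentiable soft intersection-over-union between a rendered opacity mask and an observed mask. The sketch below is a generic formulation under that assumption, not necessarily the paper's exact loss.

```python
import torch

def soft_iou_loss(pred_mask, gt_mask, eps=1e-6):
    """Differentiable IoU loss between soft masks with values in [0, 1].

    Generic sketch (SG-NeRF's exact formulation may differ): minimizing
    1 - IoU pushes the rendered silhouette, which is a function of camera
    pose and surface geometry, toward the observed one.
    """
    inter = (pred_mask * gt_mask).sum(dim=(-2, -1))
    union = (pred_mask + gt_mask - pred_mask * gt_mask).sum(dim=(-2, -1))
    return (1.0 - inter / (union + eps)).mean()
```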
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives [70.32817882783608]
We present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives.
Unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images.
We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points.
arXiv Detail & Related papers (2023-07-11T17:58:31Z)
- Zero-1-to-3: Zero-shot One Image to 3D Object [30.455300183998247]
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.
Our conditional diffusion model uses a synthetic dataset to learn control of the relative camera viewpoint (see the viewpoint-encoding sketch below).
Our method significantly outperforms state-of-the-art single-view 3D reconstruction and novel view synthesis models by leveraging Internet-scale pre-training.
arXiv Detail & Related papers (2023-03-20T17:59:50Z)
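Zero-1-to-3 conditions its diffusion model on the relative camera transformation between the input and target views. A common encoding of that transformation is the change in spherical camera coordinates; the helper below is a hedged sketch of this idea, and the exact embedding used in the paper may differ.

```python
import math

def relative_view_condition(polar1, azimuth1, radius1,
                            polar2, azimuth2, radius2):
    """Encode the relative camera viewpoint between two views.

    Sketch of spherical-coordinate conditioning for viewpoint-conditioned
    diffusion models such as Zero-1-to-3 (details are assumptions).
    Angles are in radians.
    """
    d_polar = polar2 - polar1
    d_azimuth = azimuth2 - azimuth1
    d_radius = radius2 - radius1
    # Encode the azimuth delta with sin/cos to avoid the 2*pi discontinuity.
    return [d_polar, math.sin(d_azimuth), math.cos(d_azimuth), d_radius]
```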
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering (sketched below).
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
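The volume-rendering step in the entry above is the standard NeRF compositing rule: a network maps each sample along a ray to a density and color, which are alpha-composited by transmittance. The sketch below assumes a generic `mlp` that is conditioned on the learned 3D representation and returns per-sample (density, rgb); it is not the paper's code.

```python
import torch

def render_ray(mlp, feats, points, deltas):
    """Standard NeRF-style volume rendering for a batch of rays.

    points: (n_rays, n_samples, 3) sample locations; deltas: (n_rays,
    n_samples) distances between consecutive samples. `mlp(feats, points)`
    is assumed to return density (n_rays, n_samples) and rgb
    (n_rays, n_samples, 3), conditioned on the learned representation.
    """
    density, rgb = mlp(feats, points)
    # Per-sample opacity from density and step size.
    alpha = 1.0 - torch.exp(-torch.relu(density) * deltas)
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1),
        dim=1)[:, :-1]
    weights = alpha * trans                           # (n_rays, n_samples)
    return (weights.unsqueeze(-1) * rgb).sum(dim=1)   # composited color
```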
- ERF: Explicit Radiance Field Reconstruction From Scratch [12.254150867994163]
We propose a novel explicit dense 3D reconstruction approach that processes a set of images of a scene with sensor poses and calibrations and estimates a photo-real digital model.
One of the key innovations is that the underlying volumetric representation is completely explicit (see the voxel-grid sketch below).
We show that our method is general and practical. It does not require a highly controlled lab setup for capturing, but allows for reconstructing scenes with a vast variety of objects.
arXiv Detail & Related papers (2022-02-28T19:37:12Z)
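A "completely explicit" volumetric representation, as in the ERF entry above, stores density and color directly in a dense grid rather than in network weights, so queries reduce to trilinear interpolation. The class below illustrates that idea with `torch.nn.functional.grid_sample`; it is a sketch, not ERF's actual code.

```python
import torch
import torch.nn.functional as F

class ExplicitRadianceGrid(torch.nn.Module):
    """Dense voxel grid of radiance values, queried by trilinear interpolation.

    Illustrative sketch of an explicit volumetric representation (not the
    ERF authors' code): 4 channels = (density, r, g, b), optimized directly
    as parameters instead of being predicted by a network.
    """
    def __init__(self, resolution=128):
        super().__init__()
        self.grid = torch.nn.Parameter(
            torch.zeros(1, 4, resolution, resolution, resolution))

    def forward(self, xyz):
        # xyz: (n, 3) points in [-1, 1]^3 -> (n, 4) interpolated values.
        coords = xyz.view(1, -1, 1, 1, 3)          # grid_sample layout
        out = F.grid_sample(self.grid, coords, align_corners=True)
        return out.view(4, -1).t()                 # (n, 4)
```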
- AvatarMe++: Facial Shape and BRDF Inference with Photorealistic Rendering-Aware GANs [119.23922747230193]
We introduce the first method that is able to reconstruct render-ready 3D facial geometry and BRDF from a single "in-the-wild" image.
Our method outperforms the existing arts by a significant margin and reconstructs high-resolution 3D faces from a single low-resolution image.
arXiv Detail & Related papers (2021-12-11T11:36:30Z)
- Inverting Generative Adversarial Renderer for Face Reconstruction [58.45125455811038]
In this work, we introduce a novel Generative Adversarial Renderer (GAR).
Instead of relying on graphics rules, GAR learns to model complicated real-world images and is capable of producing realistic results.
Our method achieves state-of-the-art performance on multiple face reconstruction benchmarks.
arXiv Detail & Related papers (2021-05-06T04:16:06Z)
- GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis [43.4859484191223]
We propose a generative model for radiance fields which have recently proven successful for novel view synthesis of a single scene.
By introducing a multi-scale patch-based discriminator, we demonstrate synthesis of high-resolution images while training our model from unposed 2D images alone (a patch-sampling sketch follows below).
arXiv Detail & Related papers (2020-07-05T20:37:39Z)
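GRAF's multi-scale patch-based discriminator never sees full-resolution images; it instead compares K x K patches sampled at random scales and positions from real and generated images, which keeps memory constant while covering both coarse structure and fine detail. Below is a hedged sketch of that patch-sampling step; the function name and details are assumptions, not the official code.

```python
import torch
import torch.nn.functional as F

def sample_patch(image, k=16, min_scale=0.125, max_scale=1.0):
    """Sample a k x k patch at a random scale/position via bilinear sampling.

    Sketch of GRAF-style multi-scale patch extraction for the discriminator
    (details differ from the official implementation). image: (B, C, H, W);
    coordinates follow grid_sample's normalized [-1, 1] convention.
    """
    b = image.shape[0]
    scale = torch.empty(b, 1, 1).uniform_(min_scale, max_scale)
    # Random patch center such that the scaled patch stays inside the image.
    center = torch.empty(b, 1, 2).uniform_(-1.0, 1.0) * (1.0 - scale)
    lin = torch.linspace(-1.0, 1.0, k)
    gy, gx = torch.meshgrid(lin, lin, indexing="ij")
    grid = torch.stack([gx, gy], dim=-1).view(1, k * k, 2)  # base k*k grid
    grid = grid * scale + center                            # scale + shift
    return F.grid_sample(image, grid.view(b, k, k, 2), align_corners=True)
```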