VoxGRAF: Fast 3D-Aware Image Synthesis with Sparse Voxel Grids
- URL: http://arxiv.org/abs/2206.07695v2
- Date: Fri, 17 Jun 2022 15:24:00 GMT
- Title: VoxGRAF: Fast 3D-Aware Image Synthesis with Sparse Voxel Grids
- Authors: Katja Schwarz and Axel Sauer and Michael Niemeyer and Yiyi Liao and
Andreas Geiger
- Abstract summary: State-of-the-art 3D-aware generative models rely on coordinate-based MLPs to parameterize 3D radiance fields.
Existing approaches often render low-resolution feature maps and process them with an upsampling network to obtain the final image.
In contrast to existing approaches, our method requires only a single forward pass to generate a full 3D scene.
- Score: 42.74658047803192
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: State-of-the-art 3D-aware generative models rely on coordinate-based MLPs to
parameterize 3D radiance fields. While demonstrating impressive results,
querying an MLP for every sample along each ray leads to slow rendering.
Therefore, existing approaches often render low-resolution feature maps and
process them with an upsampling network to obtain the final image. Albeit
efficient, neural rendering often entangles viewpoint and content such that
changing the camera pose results in unwanted changes of geometry or appearance.
Motivated by recent results in voxel-based novel view synthesis, we investigate
the utility of sparse voxel grid representations for fast and 3D-consistent
generative modeling in this paper. Our results demonstrate that monolithic MLPs
can indeed be replaced by 3D convolutions when combining sparse voxel grids
with progressive growing, free space pruning and appropriate regularization. To
obtain a compact representation of the scene and allow for scaling to higher
voxel resolutions, our model disentangles the foreground object (modeled in 3D)
from the background (modeled in 2D). In contrast to existing approaches, our
method requires only a single forward pass to generate a full 3D scene. It
hence allows for efficient rendering from arbitrary viewpoints while yielding
3D consistent results with high visual fidelity.
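The abstract's foreground/background disentanglement amounts to volume-rendering the 3D foreground along each ray and alpha-compositing the result over a 2D background color. A minimal NumPy sketch of that compositing step is below; the function name, array shapes, and the assumption that foreground samples come pre-queried from a voxel grid are illustrative, not the paper's actual implementation.

```python
import numpy as np

def composite_fg_bg(colors, densities, deltas, background):
    """Alpha-composite foreground samples over a 2D background per ray.

    colors:     (n_rays, n_samples, 3) RGB of foreground samples along each ray
    densities:  (n_rays, n_samples)    volume density at each sample
    deltas:     (n_rays, n_samples)    spacing between adjacent samples
    background: (n_rays, 3)            2D background color per ray
    """
    # Standard volume-rendering quadrature: per-sample opacity from density.
    alpha = 1.0 - np.exp(-densities * deltas)
    # Transmittance after each sample (cumulative product of 1 - alpha).
    trans = np.cumprod(1.0 - alpha, axis=-1)
    # Transmittance *before* each sample: shift right, first sample sees T = 1.
    trans_before = np.concatenate(
        [np.ones_like(trans[:, :1]), trans[:, :-1]], axis=-1)
    weights = alpha * trans_before                 # rendering weights per sample
    fg = (weights[..., None] * colors).sum(axis=1) # foreground contribution
    # Residual transmittance lets the 2D background show through empty space.
    return fg + trans[:, -1:] * background
```

With zero density everywhere the ray is fully transparent and the output equals the background; a single opaque sample returns that sample's color, which is the behavior the disentanglement relies on.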
Related papers
- F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent Gaussian Splatting [35.625593119642424]
This paper tackles the problem of generalizable 3D-aware generation from monocular datasets.
We propose a novel feed-forward pipeline based on pixel-aligned Gaussian Splatting.
We also introduce a self-supervised cycle-consistent constraint to enforce cross-view consistency in the learned 3D representation.
arXiv Detail & Related papers (2025-01-12T04:44:44Z)
- Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors [17.544733016978928]
3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild.
Recent advancements in 3D object generation have introduced techniques that reconstruct an object's 3D shape and texture.
We propose bridging the gap between 2D and 3D diffusion models to address this limitation.
arXiv Detail & Related papers (2024-10-12T10:14:11Z)
- Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image [28.759158325097093]
Unique3D is a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images.
Our framework features state-of-the-art generation fidelity and strong generalizability.
arXiv Detail & Related papers (2024-05-30T17:59:54Z)
- Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z)
- 3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models [102.75875255071246]
3D content creation via text-driven stylization poses a fundamental challenge to the multimedia and graphics community.
We propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D Diffusion models.
arXiv Detail & Related papers (2023-11-09T15:51:27Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars [36.4402388864691]
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery.
Recent efforts incorporate 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields either explicitly or implicitly.
We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images.
arXiv Detail & Related papers (2022-11-21T06:40:46Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis [163.96778522283967]
We propose a shading-guided generative implicit model that is able to learn a starkly improved shape representation.
An accurate 3D shape should also yield a realistic rendering under different lighting conditions.
Our experiments on multiple datasets show that the proposed approach achieves photorealistic 3D-aware image synthesis.
arXiv Detail & Related papers (2021-10-29T10:53:12Z)
- PixelSynth: Generating a 3D-Consistent Experience from a Single Image [30.64117903216323]
We present an approach that fuses 3D reasoning with autoregressive modeling to outpaint large view changes in a 3D-consistent manner.
We demonstrate considerable improvement in single image large-angle view synthesis results compared to a variety of methods and possible variants.
arXiv Detail & Related papers (2021-08-12T17:59:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.