StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis
- URL: http://arxiv.org/abs/2110.08985v1
- Date: Mon, 18 Oct 2021 02:37:01 GMT
- Title: StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis
- Authors: Jiatao Gu, Lingjie Liu, Peng Wang and Christian Theobalt
- Abstract summary: StyleNeRF is a 3D-aware generative model for high-resolution image synthesis with high multi-view consistency.
It integrates the neural radiance field (NeRF) into a style-based generator.
It can synthesize high-resolution images at interactive rates while preserving 3D consistency at high quality.
- Score: 92.25145204543904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose StyleNeRF, a 3D-aware generative model for photo-realistic
high-resolution image synthesis with high multi-view consistency, which can be
trained on unstructured 2D images. Existing approaches either cannot synthesize
high-resolution images with fine details or yield noticeable 3D-inconsistent
artifacts. In addition, many of them lack control over style attributes and
explicit 3D camera poses. StyleNeRF integrates the neural radiance field (NeRF)
into a style-based generator to tackle the aforementioned challenges, i.e.,
improving rendering efficiency and 3D consistency for high-resolution image
generation. We perform volume rendering only to produce a low-resolution
feature map and progressively apply upsampling in 2D to address the first
issue. To mitigate the inconsistencies caused by 2D upsampling, we propose
multiple designs, including a better upsampler and a new regularization loss.
With these designs, StyleNeRF can synthesize high-resolution images at
interactive rates while preserving 3D consistency at high quality. StyleNeRF
also enables control of camera poses and different levels of styles, which can
generalize to unseen views. It also supports challenging tasks, including
zoom-in and zoom-out, style mixing, inversion, and semantic editing.
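
The two-stage design described in the abstract (volume rendering at low resolution, then style-modulated 2D upsampling) can be made concrete with a minimal sketch. The module names, layer shapes, and the plain bilinear-plus-conv upsampler below are illustrative assumptions, not the authors' released implementation; the paper's "better upsampler" and its consistency regularization loss are omitted.

```python
# Minimal sketch: volume rendering produces only a low-resolution feature map,
# and progressive 2D upsampling brings it to the target resolution.
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class StyleNeRFSketch(nn.Module):
    def __init__(self, feat_dim=256, style_dim=512, low_res=32, high_res=256):
        super().__init__()
        self.low_res = low_res
        # Mapping network: latent z -> style vector w, as in style-based GANs.
        self.mapping = nn.Sequential(
            nn.Linear(style_dim, style_dim), nn.LeakyReLU(0.2),
            nn.Linear(style_dim, style_dim),
        )
        # Stand-in for the NeRF MLP: a 3D point plus the style vector maps to
        # a feature and a density (a real model would add positional encoding).
        self.nerf_mlp = nn.Sequential(
            nn.Linear(3 + style_dim, feat_dim), nn.LeakyReLU(0.2),
            nn.Linear(feat_dim, feat_dim + 1),
        )
        # 2D head: each stage doubles resolution until high_res is reached.
        n_stages = int(math.log2(high_res // low_res))
        self.up_blocks = nn.ModuleList(
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1) for _ in range(n_stages)
        )
        self.to_rgb = nn.Conv2d(feat_dim, 3, 1)

    def volume_render(self, rays_o, rays_d, w, n_samples=24):
        # Composite per-point features along each ray with the standard
        # alpha-compositing weights (uniform depth samples, no jitter).
        B, N, _ = rays_o.shape  # N = low_res * low_res rays per image
        t = torch.linspace(0.1, 4.0, n_samples, device=rays_o.device)
        pts = rays_o[:, :, None, :] + rays_d[:, :, None, :] * t[None, None, :, None]
        w_exp = w[:, None, None, :].expand(B, N, n_samples, -1)
        out = self.nerf_mlp(torch.cat([pts, w_exp], dim=-1))
        feat, sigma = out[..., :-1], F.relu(out[..., -1])
        alpha = 1.0 - torch.exp(-sigma * (t[1] - t[0]))
        trans = torch.cumprod(
            torch.cat([torch.ones_like(alpha[..., :1]), 1 - alpha[..., :-1]],
                      dim=-1), dim=-1)
        weights = alpha * trans                      # (B, N, n_samples)
        return (weights[..., None] * feat).sum(-2)   # (B, N, feat_dim)

    def forward(self, z, rays_o, rays_d):
        w = self.mapping(z)
        feat = self.volume_render(rays_o, rays_d, w)
        x = feat.transpose(1, 2).reshape(z.shape[0], -1, self.low_res, self.low_res)
        for conv in self.up_blocks:  # progressive 2D upsampling
            x = F.interpolate(x, scale_factor=2, mode="bilinear",
                              align_corners=False)
            x = F.leaky_relu(conv(x), 0.2)
        return self.to_rgb(x)  # (B, 3, high_res, high_res)
```

With the default shapes, z is (batch, 512) and rays_o, rays_d are (batch, 1024, 3) camera rays for the 32x32 feature grid; rendering 1,024 rays per image instead of 65,536 is what makes interactive-rate synthesis at 256x256 plausible.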
Related papers
- SuperNeRF-GAN: A Universal 3D-Consistent Super-Resolution Framework for Efficient and Enhanced 3D-Aware Image Synthesis [59.73403876485574]
We propose SuperNeRF-GAN, a universal framework for 3D-consistent super-resolution.
A key highlight of SuperNeRF-GAN is its seamless integration with NeRF-based 3D-aware image synthesis methods.
Experimental results demonstrate the superior efficiency, 3D consistency, and quality of our approach.
arXiv Detail & Related papers (2025-01-12T10:31:33Z)
- LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors [107.83398512719981]
Single-image 3D reconstruction remains a fundamental challenge in computer vision.
Recent advances in Latent Video Diffusion Models offer promising 3D priors learned from large-scale video data.
We propose LiftImage3D, a framework that effectively releases LVDMs' generative priors while ensuring 3D consistency.
arXiv Detail & Related papers (2024-12-12T18:58:42Z)
- Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors [17.544733016978928]
3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild.
Recent advancements in 3D object generation have introduced techniques that reconstruct an object's 3D shape and texture.
We propose bridging the gap between 2D and 3D diffusion models to address this limitation.
arXiv Detail & Related papers (2024-10-12T10:14:11Z)
- SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing [58.22339174221563]
We propose SyncNoise, a novel geometry-guided multi-view consistent noise editing approach for high-fidelity 3D scene editing.
SyncNoise synchronously edits multiple views with 2D diffusion models while enforcing multi-view noise predictions to be geometrically consistent.
Our method achieves high-quality 3D editing results that respect the textual instructions, especially in scenes with complex textures.
arXiv Detail & Related papers (2024-06-25T09:17:35Z)
- Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields [29.573344213110172]
We propose a framework called Omni-Recon, which is capable of (1) generalizable 3D reconstruction and zero-shot multitask scene understanding, and (2) adaptability to diverse downstream 3D applications such as real-time rendering and scene editing.
Specifically, our Omni-Recon features a general-purpose NeRF model using image-based rendering with two decoupled branches.
This design achieves state-of-the-art (SOTA) generalizable 3D surface reconstruction quality with blending weights reusable across diverse tasks for zero-shot multitask scene understanding.
arXiv Detail & Related papers (2024-03-17T07:47:26Z)
- What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs [82.3936309001633]
3D-aware Generative Adversarial Networks (GANs) have shown remarkable progress in learning to generate multi-view-consistent images and 3D geometries.
Yet, the significant memory and computational costs of dense sampling in volume rendering have forced 3D GANs to adopt patch-based training or employ low-resolution rendering with post-processing 2D super-resolution.
We propose techniques to scale neural volume rendering to the much higher resolution of native 2D images, thereby resolving fine-grained 3D geometry with unprecedented detail.
arXiv Detail & Related papers (2024-01-04T18:50:38Z)
- 3D-aware Image Synthesis via Learning Structural and Textural Representations [39.681030539374994]
We propose VolumeGAN for high-fidelity 3D-aware image synthesis, which explicitly learns a structural representation and a textural representation.
Our approach achieves substantially higher image quality and better 3D control than previous methods.
arXiv Detail & Related papers (2021-12-20T18:59:40Z)
- Efficient Geometry-aware 3D Generative Adversarial Networks [50.68436093869381]
Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent.
In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations.
We introduce an expressive hybrid explicit-implicit network architecture that synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry.
arXiv Detail & Related papers (2021-12-15T08:01:43Z)
- GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis [43.4859484191223]
We propose a generative model for radiance fields, which have recently proven successful for novel view synthesis of a single scene.
By introducing a multi-scale patch-based discriminator, we demonstrate synthesis of high-resolution images while training our model from unposed 2D images alone (see the patch-sampling sketch after this list).
arXiv Detail & Related papers (2020-07-05T20:37:39Z)
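
As a concrete illustration of the multi-scale patch-based discriminator idea from the GRAF entry above, the sketch below samples K x K patches at random scales and offsets, so per-step training cost is independent of output image resolution. The function names and the scale distribution are assumptions for illustration, not GRAF's released code.

```python
# Illustrative sketch of multi-scale patch sampling: the discriminator only
# ever sees K x K patches, drawn at a random scale (large s: coarse,
# whole-image context) and offset (small s: zoomed-in, fine detail).
import torch
import torch.nn.functional as F


def sample_patch_coords(batch, k=32, min_scale=0.25, device="cpu"):
    """Normalized (x, y) coordinates of one K x K patch per image."""
    # Scale s in [min_scale, 1]: s = 1 spans the full image coarsely.
    s = min_scale + torch.rand(batch, 1, 1, 1, device=device) * (1 - min_scale)
    # Random center, constrained so the scaled patch stays inside [-1, 1].
    center = (torch.rand(batch, 1, 1, 2, device=device) * 2 - 1) * (1 - s)
    lin = torch.linspace(-1, 1, k, device=device)
    gy, gx = torch.meshgrid(lin, lin, indexing="ij")
    base = torch.stack([gx, gy], dim=-1)[None]   # (1, K, K, 2), (x, y) order
    return base * s + center                     # (batch, K, K, 2)


def real_patches(images, coords):
    """Bilinearly sample the same patches from real images for the D step."""
    return F.grid_sample(images, coords, align_corners=False)

# In a GRAF-style step, the generator queries its radiance field only along
# the rays corresponding to `coords`, and the discriminator compares the
# rendered patch against real_patches(real_images, coords); cost depends on
# K, not on the target image resolution.
```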