Towards a Neural Graphics Pipeline for Controllable Image Generation
- URL: http://arxiv.org/abs/2006.10569v2
- Date: Mon, 22 Feb 2021 09:18:55 GMT
- Title: Towards a Neural Graphics Pipeline for Controllable Image Generation
- Authors: Xuelin Chen, Daniel Cohen-Or, Baoquan Chen and Niloy J. Mitra
- Abstract summary: We present Neural Graphics Pipeline (NGP), a hybrid generative model that brings together neural and traditional image formation models.
NGP decomposes the image into a set of interpretable appearance feature maps, uncovering direct control handles for controllable image generation.
We demonstrate the effectiveness of our approach on controllable image generation of single-object scenes.
- Score: 96.11791992084551
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we leverage advances in neural networks towards forming a
neural rendering for controllable image generation, and thereby bypassing the
need for detailed modeling in the conventional graphics pipeline. To this end, we
present Neural Graphics Pipeline (NGP), a hybrid generative model that brings
together neural and traditional image formation models. NGP decomposes the
image into a set of interpretable appearance feature maps, uncovering direct
control handles for controllable image generation. To form an image, NGP
generates coarse 3D models that are fed into neural rendering modules to
produce view-specific interpretable 2D maps, which are then composited into the
final output image using a traditional image formation model. Our approach
offers control over image generation by providing direct handles controlling
illumination and camera parameters, in addition to control over shape and
appearance variations. The key challenge is to learn these controls through
unsupervised training that links generated coarse 3D models with unpaired real
images via neural and traditional (e.g., Blinn-Phong) rendering functions,
without establishing an explicit correspondence between them. We demonstrate
the effectiveness of our approach on controllable image generation of
single-object scenes. We evaluate our hybrid modeling framework, compare with
neural-only generation methods (namely, DCGAN, LSGAN, WGAN-GP, VON, and SRNs),
report improvement in FID scores against real images, and demonstrate that NGP
supports direct controls common in traditional forward rendering. Code is
available at http://geometry.cs.ucl.ac.uk/projects/2021/ngp.
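As a point of reference for the traditional image formation stage described above, the sketch below composites hypothetical per-pixel maps (normals, albedo, specular reflectance, shininess) under a single directional light using the standard Blinn-Phong shading model. It is a minimal illustration of the general technique, not the authors' implementation; the function name, map names, and signature are assumptions.
```python
# Minimal sketch (not the NGP code): Blinn-Phong compositing of per-pixel maps,
#   I = albedo * (ambient + max(n.l, 0)) + specular * max(n.h, 0)^shininess
# All names and the signature below are illustrative assumptions.
import numpy as np

def blinn_phong_composite(normals, albedo, specular, shininess,
                          light_dir, view_dir, ambient=0.1):
    """normals: HxWx3 unit vectors; albedo, specular: HxWx3 in [0, 1];
    shininess: scalar exponent; light_dir, view_dir: 3-vectors."""
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    h = (l + v) / np.linalg.norm(l + v)                      # half vector
    n_dot_l = np.clip(normals @ l, 0.0, None)[..., None]     # diffuse term, HxWx1
    n_dot_h = np.clip(normals @ h, 0.0, None)[..., None]     # specular term, HxWx1
    image = albedo * (ambient + n_dot_l) + specular * n_dot_h ** shininess
    return np.clip(image, 0.0, 1.0)

# Example with random maps standing in for the network-predicted 2D maps.
H, W = 64, 64
rng = np.random.default_rng(0)
normals = rng.normal(size=(H, W, 3))
normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
img = blinn_phong_composite(normals, albedo=rng.random((H, W, 3)),
                            specular=rng.random((H, W, 3)), shininess=32.0,
                            light_dir=np.array([0.0, 1.0, 1.0]),
                            view_dir=np.array([0.0, 0.0, 1.0]))
```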
Related papers
- PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models [55.080748327139176]
PerlDiff is a method for effective street view image generation that fully leverages perspective 3D geometric information.
Our results show that PerlDiff markedly enhances the precision of generation on the NuScenes and KITTI datasets.
arXiv Detail & Related papers (2024-07-08T16:46:47Z)
- Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting [9.383423119196408]
We introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing multi-view diffusion models.
MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation.
In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations.
arXiv Detail & Related papers (2024-03-15T02:57:20Z)
- Controlling the Output of a Generative Model by Latent Feature Vector Shifting [0.0]
We present a novel latent-vector-shifting method for controlled modification of generated images.
Our approach uses a pre-trained StyleGAN3 model that generates images of realistic human faces.
Our latent feature shifter is a neural network trained to shift the latent vectors of a generative model along a specified feature direction (a generic sketch of this idea follows the list below).
arXiv Detail & Related papers (2023-11-15T10:42:06Z)
- Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model [39.64952340472541]
We propose a text-to-3D avatar generation method whose facial expression is controllable.
Our main strategy is to construct the 3D avatar in Neural Radiance Fields (NeRF) optimized with a set of controlled viewpoint-aware images.
We present empirical results and discuss the effectiveness of our method.
arXiv Detail & Related papers (2023-09-07T08:14:46Z)
- Training and Tuning Generative Neural Radiance Fields for Attribute-Conditional 3D-Aware Face Generation [66.21121745446345]
We propose a conditional GNeRF model that integrates specific attribute labels as input, thus amplifying the controllability and disentanglement capabilities of 3D-aware generative models.
Our approach builds upon a pre-trained 3D-aware face model, and we introduce a Training as Init and Optimizing for Tuning (TRIOT) method to train a conditional normalizing flow module.
Our experiments substantiate the efficacy of our model, showcasing its ability to generate high-quality edits with enhanced view consistency.
arXiv Detail & Related papers (2022-08-26T10:05:39Z)
- Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [54.079327030892244]
Free-HeadGAN is a person-generic neural talking head synthesis system.
We show that modeling faces with sparse 3D facial landmarks is sufficient for achieving state-of-the-art generative performance.
arXiv Detail & Related papers (2022-08-03T16:46:08Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve shape quality by leveraging cross-view information with a graph convolution network.
Our model is robust to the quality of the initial mesh and to errors in camera pose, and can be combined with a differentiable renderer for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- SMPLpix: Neural Avatars from 3D Human Models [56.85115800735619]
We bridge the gap between classic rendering and the latest generative networks operating in pixel space.
We train a network that directly converts a sparse set of 3D mesh vertices into photorealistic images.
We show the advantage over conventional differentiable renderers both in terms of photorealism and rendering efficiency.
arXiv Detail & Related papers (2020-08-16T10:22:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.