3D-Aware Multi-Class Image-to-Image Translation with NeRFs
- URL: http://arxiv.org/abs/2303.15012v1
- Date: Mon, 27 Mar 2023 08:54:51 GMT
- Title: 3D-Aware Multi-Class Image-to-Image Translation with NeRFs
- Authors: Senmao Li, Joost van de Weijer, Yaxing Wang, Fahad Shahbaz Khan,
Meiqin Liu, Jian Yang
- Abstract summary: We investigate 3D-aware GANs for 3D consistent multi-class image-to-image (3D-aware I2I) translation.
We decouple this learning process into a multi-class 3D-aware GAN step and a 3D-aware I2I translation step.
In extensive experiments on two datasets, we successfully perform 3D-aware I2I translation with multi-view consistency.
- Score: 82.27932197385748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in 3D-aware generative models (3D-aware GANs) combined with
Neural Radiance Fields (NeRF) have achieved impressive results. However, no
prior work investigates 3D-aware GANs for 3D-consistent multi-class
image-to-image (3D-aware I2I) translation. Naively applying 2D I2I translation
methods leads to unrealistic shape/identity changes. To perform 3D-aware
multi-class I2I translation, we decouple this learning process into a
multi-class 3D-aware GAN step and a 3D-aware I2I translation step. In the first
step, we propose two novel techniques: a new conditional architecture and an
effective training strategy. In the second step, building on the well-trained
multi-class 3D-aware GAN architecture, which preserves view-consistency, we
construct a 3D-aware I2I translation system. To further reduce
view-consistency problems, we propose several new techniques, including a
U-net-like adaptor network design, a hierarchical representation constraint, and
a relative regularization loss. In extensive experiments on two datasets,
quantitative and qualitative results demonstrate that we successfully perform
3D-aware I2I translation with multi-view consistency.
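The abstract describes the second step as fitting a U-net-like adaptor to a frozen, well-trained multi-class 3D-aware GAN and regularizing it with a hierarchical representation constraint and a relative regularization loss. The sketch below is a minimal, hypothetical PyTorch rendering of that wiring, not the authors' implementation: the adaptor architecture, the exact form of both losses, and all names, shapes, and weights are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the second, 3D-aware I2I step:
# a U-net-like adaptor maps the source image into a feature hierarchy that
# would condition a frozen, pre-trained multi-class 3D-aware GAN.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UNetAdaptor(nn.Module):
    """U-net-like adaptor (assumed design): encodes the image and returns a
    coarse-to-fine pyramid of features with a skip connection."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, ch, 3, 2, 1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, 2, 1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.ReLU())

    def forward(self, x):
        e1 = self.enc1(x)        # 1/2 resolution
        e2 = self.enc2(e1)       # 1/4 resolution
        d1 = self.dec1(e2) + e1  # skip connection back to 1/2 resolution
        return [e2, d1]          # coarse-to-fine feature hierarchy

def hierarchical_constraint(feats_src, feats_trg):
    """Hierarchical representation constraint (assumed form): match the
    adaptor features of source and translated images at every level."""
    return sum(F.l1_loss(s, t) for s, t in zip(feats_src, feats_trg))

def relative_regularization(views_src, views_trg):
    """Relative regularization loss (assumed form): the change between two
    rendered views should be similar before and after translation, which
    discourages view-inconsistent edits."""
    delta_src = views_src[0] - views_src[1]
    delta_trg = views_trg[0] - views_trg[1]
    return F.l1_loss(delta_src, delta_trg)

if __name__ == "__main__":
    adaptor = UNetAdaptor()
    src = torch.randn(1, 3, 64, 64)  # source-class image
    trg = torch.randn(1, 3, 64, 64)  # stand-in for the translated render
    views_src = [torch.randn(1, 3, 64, 64) for _ in range(2)]
    views_trg = [torch.randn(1, 3, 64, 64) for _ in range(2)]
    loss = (hierarchical_constraint(adaptor(src), adaptor(trg))
            + relative_regularization(views_src, views_trg))
    loss.backward()
    print(float(loss))
```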
Related papers
- MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration [29.657854912416038]
We introduce a generic AI system, namely MUSES, for 3D-controllable image generation from user queries.
By mimicking the collaboration of human professionals, this multi-modal agent pipeline facilitates the effective and automatic creation of images with 3D-controllable objects.
We construct a new benchmark, T2I-3DisBench (3D image scene), which describes diverse 3D image scenes with 50 detailed prompts.
arXiv Detail & Related papers (2024-08-20T07:37:23Z) - Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D
Prior [52.44678180286886]
Distillation from 2D diffusion models achieves excellent generalization and rich details without any 3D data.
We propose Sherpa3D, a new text-to-3D framework that achieves high-fidelity, generalizability, and geometric consistency simultaneously.
arXiv Detail & Related papers (2023-12-11T18:59:18Z) - Multi-CLIP: Contrastive Vision-Language Pre-training for Question
Answering tasks in 3D Scenes [68.61199623705096]
Training models to apply common-sense linguistic knowledge and visual concepts from 2D images to 3D scene understanding is a promising direction that researchers have only recently started to explore.
We propose a novel 3D pre-training Vision-Language method, namely Multi-CLIP, that enables a model to learn language-grounded and transferable 3D scene point cloud representations.
arXiv Detail & Related papers (2023-06-04T11:08:53Z) - Mimic3D: Thriving 3D-Aware GANs via 3D-to-2D Imitation [29.959223778769513]
We propose a novel learning strategy, namely 3D-to-2D imitation, which enables a 3D-aware GAN to generate high-quality images.
We also introduce 3D-aware convolutions into the generator for better 3D representation learning.
With the above strategies, our method reaches FID scores of 5.4 and 4.3 on FFHQ and AFHQ-v2 Cats, respectively, at 512x512 resolution.
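As a rough illustration of the 3D-to-2D imitation idea named above (not the Mimic3D code), the imitation term can be written as an image-space loss in which the strictly 3D-consistent render is pulled toward a detached, higher-fidelity 2D branch; the paper's actual formulation may differ.

```python
# Hypothetical sketch of a 3D-to-2D imitation loss; branch names and the
# exact loss form are assumptions, not the Mimic3D implementation.
import torch
import torch.nn.functional as F

def imitation_loss(img_3d_branch, img_2d_branch):
    """The 3D-rendered image imitates the 2D branch, whose gradients are
    stopped so it acts only as a teacher."""
    return F.l1_loss(img_3d_branch, img_2d_branch.detach())

# Toy usage with random stand-ins for the two branches' outputs.
img_3d = torch.randn(1, 3, 512, 512, requires_grad=True)
img_2d = torch.randn(1, 3, 512, 512)
imitation_loss(img_3d, img_2d).backward()
```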
arXiv Detail & Related papers (2023-03-16T02:18:41Z) - XDGAN: Multi-Modal 3D Shape Generation in 2D Space [60.46777591995821]
We propose a novel method to convert 3D shapes into compact 1-channel geometry images and leverage StyleGAN3 and image-to-image translation networks to generate 3D objects in 2D space.
The generated geometry images are quick to convert to 3D meshes, enabling real-time 3D object synthesis, visualization and interactive editing.
We show both quantitatively and qualitatively that our method is highly effective at various tasks such as 3D shape generation, single view reconstruction and shape manipulation, while being significantly faster and more flexible compared to recent 3D generative models.
arXiv Detail & Related papers (2022-10-06T15:54:01Z) - 3D-to-2D Distillation for Indoor Scene Parsing [78.36781565047656]
We present a new approach that enables us to leverage 3D features extracted from a large-scale 3D data repository to enhance 2D features extracted from RGB images.
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during training.
Second, we design a two-stage dimension normalization scheme to calibrate the 2D and 3D features for better integration.
Third, we design a semantic-aware adversarial training model to extend our framework for training with unpaired 3D data.
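A minimal sketch of the first two ingredients above: distilling frozen 3D features into a 2D network's "simulated 3D" features after normalization. The standardize-then-L2-normalize scheme and the MSE distillation loss are assumptions; the paper's actual two-stage dimension normalization and loss may differ.

```python
# Illustrative 3D-to-2D feature distillation (assumed forms throughout).
import torch
import torch.nn.functional as F

def normalize_features(f, eps=1e-6):
    """Assumed two-stage normalization: per-channel standardization
    followed by L2 normalization along the channel dimension."""
    f = (f - f.mean(dim=(0, 2), keepdim=True)) / (f.std(dim=(0, 2), keepdim=True) + eps)
    return F.normalize(f, dim=1)

def distillation_loss(feat_2d, feat_3d):
    """Push the 2D network's features toward the frozen 3D teacher features."""
    return F.mse_loss(normalize_features(feat_2d), normalize_features(feat_3d).detach())

# Toy usage: a batch of point-wise features, shape (batch, channels, points).
feat_2d = torch.randn(4, 64, 1024, requires_grad=True)
feat_3d = torch.randn(4, 64, 1024)
distillation_loss(feat_2d, feat_3d).backward()
```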
arXiv Detail & Related papers (2021-04-06T02:22:24Z) - Lifting 2D StyleGAN for 3D-Aware Face Generation [52.8152883980813]
We propose a framework, called LiftedGAN, that disentangles and lifts a pre-trained StyleGAN2 for 3D-aware face generation.
Our model is "3D-aware" in the sense that it is able to (1) disentangle the latent space of StyleGAN2 into texture, shape, viewpoint, lighting and (2) generate 3D components for synthetic images.
arXiv Detail & Related papers (2020-11-26T05:02:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.