3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with
2D Diffusion Models
- URL: http://arxiv.org/abs/2311.05464v1
- Date: Thu, 9 Nov 2023 15:51:27 GMT
- Title: 3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with
2D Diffusion Models
- Authors: Haibo Yang and Yang Chen and Yingwei Pan and Ting Yao and Zhineng Chen
and Tao Mei
- Abstract summary: 3D content creation via text-driven stylization has posed a fundamental challenge to the multimedia and graphics community.
We propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D Diffusion models.
- Score: 102.75875255071246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D content creation via text-driven stylization has posed a fundamental
challenge to the multimedia and graphics community. Recent advances in cross-modal
foundation models (e.g., CLIP) have made this problem feasible. These
approaches commonly leverage CLIP to align the holistic semantics of the stylized
mesh with the given text prompt. Nevertheless, it is not trivial to enable more
controllable stylization of fine-grained details in 3D meshes solely based on
such semantic-level cross-modal supervision. In this work, we propose a new
3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes
with additional controllable appearance and geometric guidance from 2D
Diffusion models. Technically, 3DStyle-Diffusion first parameterizes the
texture of the 3D mesh into reflectance properties and scene lighting using
implicit MLP networks. Meanwhile, an accurate depth map of each sampled view is
obtained, conditioned on the 3D mesh. Then, 3DStyle-Diffusion leverages a
pre-trained controllable 2D Diffusion model to guide the learning of rendered
images, encouraging the synthesized image of each view to be semantically aligned
with the text prompt and geometrically consistent with the depth map. This design
elegantly integrates both image rendering via implicit MLP networks and
the diffusion process of image synthesis in an end-to-end fashion, enabling
high-quality, fine-grained stylization of 3D meshes. We also build a new dataset
derived from Objaverse and an evaluation protocol for this task. Through both
qualitative and quantitative experiments, we validate the capability of our
3DStyle-Diffusion. Source code and data are available at
\url{https://github.com/yanghb22-fdu/3DStyle-Diffusion-Official}.
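The pipeline described in the abstract (an implicit MLP parameterizing reflectance and scene lighting, per-view depth rendering from the mesh, and guidance on rendered views from a depth-conditioned 2D diffusion model) can be pictured with the minimal sketch below. This is a hypothetical illustration, not the authors' released code: ReflectanceField, render_view, and depth_guided_sds_loss are placeholder names, the renderer and the diffusion guidance are stubbed to keep the snippet self-contained, and reading the "controllable 2D Diffusion model" as a depth-conditioned (ControlNet-style) model is an assumption based on the abstract.
```python
import torch
import torch.nn as nn

class ReflectanceField(nn.Module):
    """Implicit MLP mapping surface points to albedo, plus a learnable
    scene-lighting code (a crude stand-in for the abstract's reflectance
    and lighting parameterization)."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),                     # RGB albedo per point
        )
        self.lighting = nn.Parameter(torch.zeros(9))  # SH-like lighting coefficients

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.mlp(xyz))

def render_view(field: ReflectanceField, surface_pts: torch.Tensor):
    """Placeholder differentiable renderer: shades sampled surface points
    with predicted albedo and a global lighting term. A real pipeline would
    rasterize the mesh and also return the exact depth map of the view."""
    albedo = field(surface_pts)                            # (N, 3)
    shade = torch.sigmoid(field.lighting.sum())            # crude global shading
    image = (albedo * shade).mean(dim=0, keepdim=True)     # stand-in rendered image
    depth = surface_pts[:, 2:3].mean(dim=0, keepdim=True)  # stand-in depth map
    return image, depth

def depth_guided_sds_loss(image, depth, prompt_emb):
    """Stub for score-distillation guidance from a depth-conditioned 2D
    diffusion model: the real loss would noise the rendering and use the
    model's predicted noise residual, conditioned on the depth map and the
    text prompt, as the gradient signal for the rendered image."""
    return ((image - prompt_emb.mean()) ** 2).mean() + 0.0 * depth.sum()

field = ReflectanceField()
optim = torch.optim.Adam(field.parameters(), lr=1e-3)
surface_pts = torch.rand(1024, 3)   # placeholder sampled mesh surface points
prompt_emb = torch.randn(77, 768)   # placeholder text embedding

for step in range(100):             # iterate over sampled views
    image, depth = render_view(field, surface_pts)
    loss = depth_guided_sds_loss(image, depth, prompt_emb)
    optim.zero_grad()
    loss.backward()
    optim.step()
```
The design point the abstract emphasizes is that the rendering MLP and the diffusion guidance sit in one end-to-end loop, so gradients from the depth- and text-conditioned guidance flow directly into the reflectance and lighting parameters.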
Related papers
- Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding [16.50466940644004]
We present Isotropic3D, an image-to-3D generation pipeline that takes only an image CLIP embedding as input.
Isotropic3D allows the optimization to be isotropic w.r.t. the azimuth angle by solely resting on the SDS loss.
arXiv Detail & Related papers (2024-03-15T15:27:58Z) - ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z) - ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image
Collections [71.46546520120162]
Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging.
We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild.
We produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations.
arXiv Detail & Related papers (2023-06-07T17:47:50Z) - HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks [101.36230756743106]
This paper is inspired by the success of 3D-aware GANs that bridge 2D and 3D domains with 3D fields as the intermediate representation for rendering 2D images.
We propose a novel method, dubbed HyperStyle3D, based on 3D-aware GANs for 3D portrait stylization.
arXiv Detail & Related papers (2023-04-19T07:22:05Z) - Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models [21.622420436349245]
We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.
We leverage pre-trained 2D text-to-image models to synthesize a sequence of images from different poses.
In order to lift these outputs into a consistent 3D scene representation, we combine monocular depth estimation with a text-conditioned inpainting model.
arXiv Detail & Related papers (2023-03-21T16:21:02Z) - Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion [115.82306502822412]
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing.
A corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing.
We study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures.
arXiv Detail & Related papers (2022-12-14T18:49:50Z) - Next3D: Generative Neural Texture Rasterization for 3D-Aware Head
Avatars [36.4402388864691]
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery.
Recent efforts incorporate 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields either explicitly or implicitly.
We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images.
arXiv Detail & Related papers (2022-11-21T06:40:46Z) - RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and
Generation [68.06991943974195]
We present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision.
We evaluate RenderDiffusion on FFHQ, AFHQ, ShapeNet and CLEVR datasets, showing competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images.
arXiv Detail & Related papers (2022-11-17T20:17:04Z)