Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text
Aligned Latent Representation
- URL: http://arxiv.org/abs/2306.17115v2
- Date: Mon, 3 Jul 2023 12:29:03 GMT
- Title: Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text
Aligned Latent Representation
- Authors: Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin
Fu, Tao Chen, Gang Yu and Shenghua Gao
- Abstract summary: We present a novel alignment-before-generation approach to generate 3D shapes based on 2D images or texts.
Our framework comprises two models: a Shape-Image-Text-Aligned Variational Auto-Encoder (SITA-VAE) and a conditional Aligned Shape Latent Diffusion Model (ASLDM).
- Score: 47.945556996219295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel alignment-before-generation approach to tackle the
challenging task of generating general 3D shapes based on 2D images or texts.
Directly learning a conditional generative model from images or texts to 3D
shapes is prone to producing results inconsistent with the conditions because
3D shapes have an additional dimension whose distribution significantly differs
from that of 2D images and texts. To bridge the domain gap among the three
modalities and facilitate multi-modal-conditioned 3D shape generation, we
explore representing 3D shapes in a shape-image-text-aligned space. Our
framework comprises two models: a Shape-Image-Text-Aligned Variational
Auto-Encoder (SITA-VAE) and a conditional Aligned Shape Latent Diffusion Model
(ASLDM). The former model encodes the 3D shapes into the shape latent space
aligned to the image and text and reconstructs the fine-grained 3D neural
fields corresponding to given shape embeddings via the transformer-based
decoder. The latter model learns a probabilistic mapping function from the
image or text space to the latent shape space. Our extensive experiments
demonstrate that our proposed approach can generate higher-quality and more
diverse 3D shapes that better semantically conform to the visual or textual
conditional inputs, validating the effectiveness of the
shape-image-text-aligned space for cross-modality 3D shape generation.
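To make the alignment-before-generation idea concrete, below is a minimal sketch of the first stage: a shape encoder trained with a CLIP-style contrastive objective so that shape latents land in the same embedding space as paired image and text embeddings. The encoder architecture, dimensions, and loss details are illustrative assumptions, not the authors' SITA-VAE implementation.

```python
# Minimal sketch of "alignment before generation": train a shape encoder so that
# its latent lives in the same embedding space as frozen image/text encoders
# (CLIP-style). Module names, dimensions, and the encoder itself are illustrative
# assumptions, not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShapeEncoder(nn.Module):
    """Toy point-cloud encoder: per-point MLP followed by max pooling."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) -> global shape embedding (B, embed_dim)
        return self.point_mlp(points).max(dim=1).values

def alignment_loss(shape_emb, image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE pulling each shape toward its paired image and caption."""
    shape_emb = F.normalize(shape_emb, dim=-1)
    loss = 0.0
    for other in (F.normalize(image_emb, dim=-1), F.normalize(text_emb, dim=-1)):
        logits = shape_emb @ other.t() / temperature          # (B, B) similarities
        targets = torch.arange(len(logits), device=logits.device)
        loss = loss + 0.5 * (F.cross_entropy(logits, targets)
                             + F.cross_entropy(logits.t(), targets))
    return loss / 2

# Usage with random stand-ins for a batch of paired shape/image/text data.
encoder = ShapeEncoder()
points = torch.randn(4, 2048, 3)                 # 4 shapes, 2048 points each
image_emb = torch.randn(4, 512)                  # e.g. from a frozen CLIP image tower
text_emb = torch.randn(4, 512)                   # e.g. from a frozen CLIP text tower
loss = alignment_loss(encoder(points), image_emb, text_emb)
loss.backward()
```

Once the three modalities share one space, the second model (the ASLDM in the paper) only has to learn a diffusion prior over shape latents conditioned on an image or text embedding, and the aligned decoder turns sampled latents back into neural fields.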
Related papers
- NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation [52.772319840580074]
3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints.
Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation.
We introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling.
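The summary only hints at how the 2D plane representation enters the model; the sketch below shows a generic triplane-style decoder (project a 3D query point onto three orthogonal feature planes, sample and fuse features, decode to a signed distance), which is one common way to realize such a representation and may differ from the paper's exact construction.

```python
# Generic triplane-style decoder: a 3D query point is projected onto three
# orthogonal 2D feature planes, features are bilinearly sampled and fused, and a
# small MLP predicts a signed distance. This is a common 2D-plane representation
# of 3D shapes; the paper's exact plane layout and fusion may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriplaneSDF(nn.Module):
    def __init__(self, channels: int = 32, resolution: int = 64):
        super().__init__()
        # Three learnable feature planes: XY, XZ, YZ.
        self.planes = nn.Parameter(torch.randn(3, channels, resolution, resolution) * 0.01)
        self.mlp = nn.Sequential(nn.Linear(channels, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) query points in [-1, 1]^3 -> (N, 1) signed distances.
        coords = [xyz[:, [0, 1]], xyz[:, [0, 2]], xyz[:, [1, 2]]]  # plane projections
        feats = 0.0
        for plane, uv in zip(self.planes, coords):
            grid = uv.view(1, -1, 1, 2)                        # (1, N, 1, 2) sample grid
            sampled = F.grid_sample(plane.unsqueeze(0), grid,
                                    align_corners=True)         # (1, C, N, 1)
            feats = feats + sampled.squeeze(0).squeeze(-1).t()  # (N, C)
        return self.mlp(feats)

sdf = TriplaneSDF()
print(sdf(torch.rand(1024, 3) * 2 - 1).shape)  # torch.Size([1024, 1])
```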
arXiv Detail & Related papers (2024-03-27T04:09:34Z)
- Explorable Mesh Deformation Subspaces from Unstructured Generative Models [53.23510438769862]
Deep generative models of 3D shapes often feature continuous latent spaces that can be used to explore potential variations.
We present a method to explore variations among a given set of landmark shapes by constructing a mapping from an easily-navigable 2D exploration space to a subspace of a pre-trained generative model.
arXiv Detail & Related papers (2023-10-11T18:53:57Z)
- Locally Attentional SDF Diffusion for Controllable 3D Shape Generation [24.83724829092307]
We propose a diffusion-based 3D generation framework that models plausible 3D shapes from 2D sketch image input.
Our method is built on a two-stage diffusion model. The first stage, named occupancy-diffusion, aims to generate a low-resolution occupancy field to approximate the shape shell.
The second stage, named SDF-diffusion, synthesizes a high-resolution signed distance field within the occupied voxels determined by the first stage to extract fine geometry.
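A schematic of the two-stage generation described above: a coarse occupancy grid is produced by one reverse-diffusion pass, then a higher-resolution SDF is produced by a second pass restricted to the occupied region. The denoisers are untrained stand-ins and the sketch-image conditioning is omitted, so this illustrates only the control flow, not the paper's networks or noise schedule.

```python
# Schematic of the two-stage pipeline: stage 1 denoises a coarse occupancy grid,
# stage 2 denoises a finer SDF grid only inside the occupied region. The
# denoisers are untrained stand-ins and conditioning is omitted.
import torch
import torch.nn as nn

def ddpm_reverse(denoiser, x, betas, cond=None):
    """Plain DDPM ancestral sampling loop (epsilon-prediction parameterization)."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    for t in reversed(range(len(betas))):
        eps = denoiser(x, torch.tensor([t]), cond)
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        x = mean + torch.sqrt(betas[t]) * torch.randn_like(x) if t > 0 else mean
    return x

class TinyDenoiser(nn.Module):
    """Stand-in epsilon-predictor: a single 3D conv, ignoring t and cond."""
    def __init__(self, ch):
        super().__init__()
        self.net = nn.Conv3d(ch, ch, 3, padding=1)
    def forward(self, x, t, cond=None):
        return self.net(x)

betas = torch.linspace(1e-4, 0.02, steps=50)

# Stage 1: occupancy-diffusion on a low-resolution grid (here 16^3).
occ_logits = ddpm_reverse(TinyDenoiser(1), torch.randn(1, 1, 16, 16, 16), betas)
occupancy = (occ_logits > 0).float()

# Stage 2: SDF-diffusion on a higher-resolution grid (here 64^3), masked to the
# voxels that stage 1 marked as occupied (upsampled to the fine resolution).
mask = nn.functional.interpolate(occupancy, scale_factor=4, mode="nearest")
sdf = ddpm_reverse(TinyDenoiser(1), torch.randn(1, 1, 64, 64, 64), betas) * mask
print(occupancy.mean().item(), sdf.shape)
```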
arXiv Detail & Related papers (2023-05-08T05:07:23Z)
- 3D-LDM: Neural Implicit 3D Shape Generation with Latent Diffusion Models [8.583859530633417]
We propose a diffusion model for neural implicit representations of 3D shapes that operates in the latent space of an auto-decoder.
This allows us to generate diverse and high quality 3D surfaces.
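A rough sketch of the auto-decoder half of that setup, in the spirit of DeepSDF: each training shape owns a learnable latent code, and a shared MLP maps a (latent, query point) pair to a signed distance; the diffusion model is then trained over those latents (a Gaussian draw stands in for it below). Layer sizes and names are assumptions, not the paper's implementation.

```python
# Sketch of an auto-decoder for neural implicit shapes: every training shape owns
# a learnable latent code, and a shared MLP maps (latent, query point) -> signed
# distance. A latent diffusion model is then trained over these codes; a Gaussian
# draw stands in for it here. Names and sizes are illustrative.
import torch
import torch.nn as nn

class AutoDecoderSDF(nn.Module):
    def __init__(self, num_shapes: int, latent_dim: int = 128):
        super().__init__()
        self.codes = nn.Embedding(num_shapes, latent_dim)   # one latent per shape
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + 3, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, shape_ids, xyz):
        # shape_ids: (B,), xyz: (B, N, 3) -> predicted SDF (B, N, 1)
        z = self.codes(shape_ids).unsqueeze(1).expand(-1, xyz.shape[1], -1)
        return self.decoder(torch.cat([z, xyz], dim=-1))

model = AutoDecoderSDF(num_shapes=100)
sdf = model(torch.tensor([0, 1]), torch.rand(2, 512, 3))  # fit codes + decoder on data
new_latent = torch.randn(1, 128)  # placeholder: the latent diffusion model samples this
```

Sampling a new shape then amounts to drawing a latent from the trained diffusion model, evaluating the decoder on a dense grid, and extracting a surface (e.g. with marching cubes).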
arXiv Detail & Related papers (2022-12-01T20:00:00Z)
- Beyond 3DMM: Learning to Capture High-fidelity 3D Face Shape [77.95154911528365]
3D Morphable Model (3DMM) fitting has widely benefited face analysis due to its strong 3D prior.
Previous reconstructed 3D faces suffer from degraded visual verisimilitude due to the loss of fine-grained geometry.
This paper proposes a complete solution to capture the personalized shape so that the reconstructed shape looks identical to the corresponding person.
arXiv Detail & Related papers (2022-04-09T03:46:18Z)
- Learning Canonical 3D Object Representation for Fine-Grained Recognition [77.33501114409036]
We propose a novel framework for fine-grained object recognition that learns to recover object variation in 3D space from a single image.
We represent an object as a composition of 3D shape and its appearance, while eliminating the effect of camera viewpoint.
By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object.
arXiv Detail & Related papers (2021-08-10T12:19:34Z)
- Deformed Implicit Field: Modeling 3D Shapes with Learned Dense Correspondence [30.849927968528238]
We propose a novel Deformed Implicit Field representation for modeling 3D shapes of a category.
Our neural network, dubbed DIF-Net, jointly learns a shape latent space and these fields for 3D objects belonging to a category.
Experiments show that DIF-Net not only produces high-fidelity 3D shapes but also builds high-quality dense correspondences across different shapes.
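A minimal sketch of the deformed-implicit-field idea as described: a template SDF shared by the category is queried at points warped by a per-shape deformation field (plus a small scalar correction), so dense correspondences follow from points that warp to the same template location. The layer sizes and the exact form of the correction term are assumptions, not the released DIF-Net code.

```python
# Minimal sketch of a deformed implicit field: a shared template SDF is queried
# at points warped by a per-shape deformation field, so shapes in a category are
# modeled as deformations of one template and corresponding points map to the
# same template location. Architecture details here are illustrative guesses.
import torch
import torch.nn as nn

class DeformedImplicitField(nn.Module):
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        # Template SDF shared by the whole category.
        self.template_sdf = nn.Sequential(
            nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, 1))
        # Deformation field: (shape latent, point) -> 3D offset into template space
        # plus a small scalar correction for shape-specific details.
        self.deform = nn.Sequential(
            nn.Linear(latent_dim + 3, 256), nn.ReLU(), nn.Linear(256, 4))

    def forward(self, z, xyz):
        # z: (B, latent_dim), xyz: (B, N, 3) -> SDF values (B, N, 1)
        z = z.unsqueeze(1).expand(-1, xyz.shape[1], -1)
        out = self.deform(torch.cat([z, xyz], dim=-1))
        offset, correction = out[..., :3], out[..., 3:]
        warped = xyz + offset                     # per-shape warp into template space
        return self.template_sdf(warped) + correction

field = DeformedImplicitField()
sdf = field(torch.randn(2, 128), torch.rand(2, 1024, 3))
print(sdf.shape)  # torch.Size([2, 1024, 1])
```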
arXiv Detail & Related papers (2020-11-27T10:45:26Z)
- Self-Supervised 2D Image to 3D Shape Translation with Disentangled Representations [92.89846887298852]
We present a framework to translate between 2D image views and 3D object shapes.
We propose SIST, a Self-supervised Image to Shape Translation framework.
arXiv Detail & Related papers (2020-03-22T22:44:02Z)