SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
- URL: http://arxiv.org/abs/2212.04493v2
- Date: Wed, 22 Mar 2023 00:30:56 GMT
- Title: SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
- Authors: Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexander Schwing and
Liangyan Gui
- Abstract summary: We present a novel framework built to simplify 3D asset generation for amateur users.
Our method supports a variety of input modalities that can be easily provided by a human.
Our model can combine all these tasks into one Swiss-army-knife tool.
- Score: 89.47132156950194
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this work, we present a novel framework built to simplify 3D asset
generation for amateur users. To enable interactive generation, our method
supports a variety of input modalities that can be easily provided by a human,
including images, text, partially observed shapes and combinations of these,
further allowing the user to adjust the strength of each input. At the core of our
approach is an encoder-decoder, compressing 3D shapes into a compact latent
representation, upon which a diffusion model is learned. To enable a variety of
multi-modal inputs, we employ task-specific encoders with dropout followed by a
cross-attention mechanism. Due to its flexibility, our model naturally supports
a variety of tasks, outperforming prior works on shape completion, image-based
3D reconstruction, and text-to-3D. Most interestingly, our model can combine
all these tasks into one Swiss-army-knife tool, enabling the user to perform
shape generation using incomplete shapes, images, and textual descriptions at
the same time, providing the relative weights for each input and facilitating
interactivity. Despite our approach being shape-only, we further show an
efficient method to texture the generated shape using large-scale text-to-image
models.
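The abstract describes the pipeline only in prose, so below is a minimal PyTorch sketch of that description: a 3D encoder-decoder compresses an SDF volume into a compact latent grid, a denoiser predicts the noise added to that latent, and condition tokens from task-specific encoders are injected through cross-attention with conditioning dropout. All class names, layer widths, and the 64^3/8^3 resolutions are illustrative assumptions rather than the authors' actual architecture or API; timestep embeddings and the task-specific encoders themselves are omitted for brevity.
```python
# Illustrative sketch only; names, channel widths, and resolutions are assumptions,
# not SDFusion's actual implementation.
import torch
import torch.nn as nn


class ShapeAutoencoder(nn.Module):
    """Compress a 64^3 SDF volume into an 8^3 latent grid and decode it back."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv3d(64, latent_dim, 4, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(latent_dim, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, sdf):
        z = self.encoder(sdf)            # (B, latent_dim, 8, 8, 8)
        return self.decoder(z), z


class CrossAttnDenoiser(nn.Module):
    """Toy denoiser: predicts the noise added to the latent, conditioned on
    multimodal tokens via cross-attention (timestep embedding omitted)."""
    def __init__(self, latent_dim=16, ctx_dim=128, width=128, drop_prob=0.1):
        super().__init__()
        self.drop_prob = drop_prob
        self.proj_in = nn.Linear(latent_dim, width)
        self.ctx_proj = nn.Linear(ctx_dim, width)
        self.attn = nn.MultiheadAttention(width, num_heads=4, batch_first=True)
        self.proj_out = nn.Linear(width, latent_dim)

    def forward(self, z_noisy, cond_tokens):
        B, C, D, H, W = z_noisy.shape
        tokens = self.proj_in(z_noisy.flatten(2).transpose(1, 2))   # (B, DHW, width)
        # Conditioning dropout: occasionally replace the condition with zeros so the
        # model also learns an unconditional prediction, which later allows weighting
        # each modality at sampling time.
        if self.training and torch.rand(()) < self.drop_prob:
            cond_tokens = torch.zeros_like(cond_tokens)
        ctx = self.ctx_proj(cond_tokens)                            # (B, T, width)
        attended, _ = self.attn(tokens, ctx, ctx)                   # cross-attention
        eps = self.proj_out(tokens + attended).transpose(1, 2)
        return eps.reshape(B, C, D, H, W)


if __name__ == "__main__":
    ae = ShapeAutoencoder()
    denoiser = CrossAttnDenoiser()
    sdf = torch.randn(2, 1, 64, 64, 64)             # toy truncated-SDF volumes
    recon, z = ae(sdf)                              # latent z: (2, 16, 8, 8, 8)
    cond = torch.randn(2, 77, 128)                  # e.g. tokens from a text encoder
    eps_hat = denoiser(z + torch.randn_like(z), cond)
    print(recon.shape, eps_hat.shape)
```
At sampling time, the abstract's "relative weights for each input" could be realized in a classifier-free-guidance style combination, e.g. eps = eps_uncond + w_img * (eps_img - eps_uncond) + w_txt * (eps_txt - eps_uncond); the exact combination rule used by the authors may differ.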
Related papers
- Any-to-3D Generation via Hybrid Diffusion Supervision [67.54197818071464]
XBind is a unified framework for any-to-3D generation using cross-modal pre-alignment techniques.
XBind integrates a multimodal-aligned encoder with pre-trained diffusion models to generate 3D objects from any modality.
arXiv Detail & Related papers (2024-11-22T03:52:37Z)
- GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space.
Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information.
The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs.
arXiv Detail & Related papers (2024-11-12T18:59:32Z)
- StdGEN: Semantic-Decomposed 3D Character Generation from Single Images [28.302030751098354]
StdGEN is an innovative pipeline for generating semantically high-quality 3D characters from single images.
It generates intricately detailed 3D characters with separated semantic components such as the body, clothes, and hair, in three minutes.
StdGEN offers ready-to-use semantic-decomposed 3D characters and enables flexible customization for a wide range of applications.
arXiv Detail & Related papers (2024-11-08T17:54:18Z)
- CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization [27.55341255800119]
We present CharacterGen, a framework developed to efficiently generate 3D characters.
A transformer-based, generalizable sparse-view reconstruction model is another core component of our approach.
We have curated a dataset of anime characters, rendered in multiple poses and views, to train and evaluate our model.
arXiv Detail & Related papers (2024-02-27T05:10:59Z)
- Make-A-Shape: a Ten-Million-scale 3D Shape Model [52.701745578415796]
This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training on a vast scale.
We first introduce a wavelet-tree representation that compactly encodes shapes via a subband coefficient filtering scheme.
We then derive a subband-adaptive training strategy that trains our model to effectively generate both coarse and detail wavelet coefficients.
arXiv Detail & Related papers (2024-01-20T00:21:58Z)
- Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation [14.064983137553353]
We aim to enhance the quality and functionality of generative diffusion models for the task of creating controllable, photorealistic human avatars.
We achieve this by integrating a 3D morphable model into the state-of-the-art multi-view-consistent diffusion approach.
Our proposed framework is the first diffusion model to enable the creation of fully 3D-consistent, animatable, and photorealistic human avatars.
arXiv Detail & Related papers (2024-01-09T18:59:04Z)
- VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder [56.59814904526965]
This paper introduces a pioneering 3D encoder designed for text-to-3D generation.
A lightweight network is developed to efficiently acquire feature volumes from multi-view images.
A diffusion model with a 3D U-Net is then trained on these feature volumes for text-to-3D generation.
arXiv Detail & Related papers (2023-12-18T18:59:05Z)
- DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation [105.97545053660619]
We present a new text-guided 3D shape generation approach DreamStone.
It uses images as a stepping stone to bridge the gap between text and shape modalities for generating 3D shapes without requiring paired text and 3D data.
Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity.
arXiv Detail & Related papers (2023-03-24T03:56:23Z)
- Learning Generative Models of Shape Handles [43.41382075567803]
We present a generative model to synthesize 3D shapes as sets of handles.
Our model can generate handle sets with varying cardinality and different types of handles.
We show that the resulting shape representations are intuitive and achieve higher quality than previous state-of-the-art methods.
arXiv Detail & Related papers (2020-04-06T22:35:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.