SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
- URL: http://arxiv.org/abs/2212.04493v2
- Date: Wed, 22 Mar 2023 00:30:56 GMT
- Title: SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
- Authors: Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexander Schwing and
Liangyan Gui
- Abstract summary: We present a novel framework built to simplify 3D asset generation for amateur users.
Our method supports a variety of input modalities that can be easily provided by a human.
Our model can combine all these tasks into one swiss-army-knife tool.
- Score: 89.47132156950194
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this work, we present a novel framework built to simplify 3D asset
generation for amateur users. To enable interactive generation, our method
supports a variety of input modalities that can be easily provided by a human,
including images, text, partially observed shapes and combinations of these,
further allowing users to adjust the strength of each input. At the core of our
approach is an encoder-decoder, compressing 3D shapes into a compact latent
representation, upon which a diffusion model is learned. To enable a variety of
multi-modal inputs, we employ task-specific encoders with dropout followed by a
cross-attention mechanism. Due to its flexibility, our model naturally supports
a variety of tasks, outperforming prior works on shape completion, image-based
3D reconstruction, and text-to-3D. Most interestingly, our model can combine
all these tasks into one swiss-army-knife tool, enabling the user to perform
shape generation using incomplete shapes, images, and textual descriptions at
the same time, providing the relative weights for each input and facilitating
interactivity. Despite our approach being shape-only, we further show an
efficient method to texture the generated shape using large-scale text-to-image
models.
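To make the conditioning described in the abstract concrete, below is a minimal, illustrative PyTorch sketch (not the authors' released code) of how task-specific encoders, per-modality dropout, and cross-attention could feed a latent diffusion step. The module names, feature dimensions, token counts, and dropout rate are assumptions for illustration only.
```python
# Hypothetical sketch: multimodal conditioning of a shape latent via
# task-specific encoders, per-modality dropout, and cross-attention.
import torch
import torch.nn as nn

class MultimodalCondition(nn.Module):
    """Encode each modality separately, randomly drop modalities during
    training, and fuse the noisy shape latent with the surviving
    conditions through cross-attention (illustrative dimensions)."""
    def __init__(self, dim=256, n_heads=4, p_drop=0.3):
        super().__init__()
        # Stand-ins for task-specific encoders (image / text / partial shape).
        self.image_enc = nn.Linear(512, dim)   # e.g. pooled image features
        self.text_enc = nn.Linear(512, dim)    # e.g. pooled text features
        self.shape_enc = nn.Linear(1024, dim)  # e.g. flattened partial-shape latent
        self.p_drop = p_drop
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, z, image=None, text=None, partial=None):
        # z: noisy shape latent tokens, shape (B, N, dim)
        tokens = []
        for feat, enc in ((image, self.image_enc),
                          (text, self.text_enc),
                          (partial, self.shape_enc)):
            if feat is None:
                continue
            # Per-modality dropout: sometimes ignore this condition entirely,
            # which later allows weighting each input at sampling time.
            if self.training and torch.rand(()) < self.p_drop:
                continue
            tokens.append(enc(feat).unsqueeze(1))   # (B, 1, dim)
        if not tokens:
            return z                                # unconditional branch
        cond = torch.cat(tokens, dim=1)             # (B, M, dim)
        fused, _ = self.attn(query=z, key=cond, value=cond)
        return z + fused                            # residual conditioning

# Usage: one denoising step would consume the fused latent.
cond_block = MultimodalCondition()
z = torch.randn(2, 8, 256)                          # 8 latent tokens per shape
img_feat, txt_feat = torch.randn(2, 512), torch.randn(2, 512)
out = cond_block(z, image=img_feat, text=txt_feat)
print(out.shape)                                    # torch.Size([2, 8, 256])
```
Randomly dropping conditions during training is what later lets the sampler weight each modality independently at generation time, in the spirit of classifier-free guidance.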
Related papers
- CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization [27.55341255800119]
We present CharacterGen, a framework developed to efficiently generate 3D characters.
A transformer-based, generalizable sparse-view reconstruction model is a core component of our approach.
We have curated a dataset of anime characters, rendered in multiple poses and views, to train and evaluate our model.
arXiv Detail & Related papers (2024-02-27T05:10:59Z)
- Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to the 3D domain and pursue stronger 3D shape generation by improving auto-regressive models in both capacity and scalability.
arXiv Detail & Related papers (2024-02-19T15:33:09Z)
- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation [51.19871052619077]
We introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images.
We maintain fast generation, producing 3D objects within 5 seconds, while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.
arXiv Detail & Related papers (2024-02-07T17:57:03Z)
- Make-A-Shape: a Ten-Million-scale 3D Shape Model [52.701745578415796]
This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training on a vast scale.
We first introduce a wavelet-tree representation that compactly encodes shapes through a subband coefficient filtering scheme.
We then derive a subband-adaptive training strategy so that our model effectively learns to generate both coarse and detail wavelet coefficients.
arXiv Detail & Related papers (2024-01-20T00:21:58Z)
- Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation [14.064983137553353]
We aim to enhance the quality and functionality of generative diffusion models for the task of creating controllable, photorealistic human avatars.
We achieve this by integrating a 3D morphable model into the state-of-the-art multi-view-consistent diffusion approach.
Our proposed framework is the first diffusion model to enable the creation of fully 3D-consistent, animatable, and photorealistic human avatars.
arXiv Detail & Related papers (2024-01-09T18:59:04Z)
- VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder [56.59814904526965]
This paper introduces a pioneering 3D encoder designed for text-to-3D generation.
A lightweight network is developed to efficiently acquire feature volumes from multi-view images.
A diffusion model with a 3D U-Net is then trained on these feature volumes for text-to-3D generation.
arXiv Detail & Related papers (2023-12-18T18:59:05Z)
- DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation [105.97545053660619]
We present DreamStone, a new text-guided 3D shape generation approach.
It uses images as a stepping stone to bridge the gap between the text and shape modalities, generating 3D shapes without requiring paired text and 3D data.
Our approach is generic, flexible, and scalable, and it can be easily integrated with various single-view reconstruction (SVR) models to expand the generative space and improve generative fidelity.
arXiv Detail & Related papers (2023-03-24T03:56:23Z)
- Learning Generative Models of Shape Handles [43.41382075567803]
We present a generative model to synthesize 3D shapes as sets of handles.
Our model can generate handle sets with varying cardinality and different types of handles.
We show that the resulting shape representations are intuitive and achieve higher quality than the previous state of the art.
arXiv Detail & Related papers (2020-04-06T22:35:55Z)