IC3D: Image-Conditioned 3D Diffusion for Shape Generation
- URL: http://arxiv.org/abs/2211.10865v3
- Date: Wed, 13 Sep 2023 12:37:51 GMT
- Title: IC3D: Image-Conditioned 3D Diffusion for Shape Generation
- Authors: Cristian Sbrolli, Paolo Cudrano, Matteo Frosi, Matteo Matteucci
- Abstract summary: Denoising Diffusion Probabilistic Models (DDPMs) have demonstrated exceptional performance in various 2D generative tasks.
We introduce CISP (Contrastive Image-Shape Pre-training) to obtain a well-structured image-shape joint embedding space.
We then introduce IC3D, a DDPM that harnesses CISP's guidance for 3D shape generation from single-view images.
- Score: 4.470499157873342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Denoising Diffusion Probabilistic Models (DDPMs) have
demonstrated exceptional performance in various 2D generative tasks. Following
this success, DDPMs have been extended to 3D shape generation, surpassing
previous methodologies in this domain. While many of these models are
unconditional, some have explored the potential of using guidance from
different modalities. In particular, image guidance for 3D generation has been
explored through the utilization of CLIP embeddings. However, these embeddings
are designed to align images and text, and do not necessarily capture the
specific details needed for shape generation. To address this limitation and
enhance image-guided 3D DDPMs with augmented 3D understanding, we introduce
CISP (Contrastive Image-Shape Pre-training), which yields a well-structured
image-shape joint embedding space. Building upon CISP, we then introduce IC3D,
a DDPM that harnesses CISP's guidance for 3D shape generation from single-view
images. This generative diffusion model outperforms existing baselines in both
the quality and diversity of the generated 3D shapes. Moreover, despite IC3D's
generative nature, its generated shapes are preferred by human evaluators over
a competitive single-view 3D reconstruction model. These properties contribute
to a coherent embedding space, enabling latent interpolation and conditioned
generation even from out-of-distribution images. We find IC3D able to generate
coherent and diverse completions even when presented with occluded views,
rendering it applicable in controlled real-world scenarios.
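To make the pre-training objective concrete, the following is a minimal PyTorch sketch of a CLIP-style symmetric contrastive (InfoNCE) loss over paired image and shape embeddings, the kind of objective a joint image-shape embedding space like CISP's suggests. The encoder outputs, the embedding dimension, and the fixed temperature are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cisp_style_contrastive_loss(image_emb: torch.Tensor,
                                shape_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired (image, shape) embeddings.

    image_emb, shape_emb: (B, D) outputs of an image encoder and a 3D shape
    encoder, where row i of each tensor describes the same object.
    """
    # L2-normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    shape_emb = F.normalize(shape_emb, dim=-1)

    # (B, B) similarity matrix; diagonal entries are the matched pairs.
    logits = image_emb @ shape_emb.t() / temperature

    # Each image should match its own shape, and vice versa.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2s = F.cross_entropy(logits, targets)
    loss_s2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2s + loss_s2i)
```

In CLIP-like setups the temperature is usually a learned parameter rather than a constant; it is fixed here for brevity. A companion sketch of how such an embedding can steer DDPM sampling appears after the related-papers list below.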
Related papers
- GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space.
Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information.
The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs.
arXiv Detail & Related papers (2024-11-12T18:59:32Z)
- Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors [17.544733016978928]
3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild.
Recent advancements in 3D object generation have introduced techniques that reconstruct an object's 3D shape and texture.
We propose bridging the gap between 2D and 3D diffusion models to address this limitation.
arXiv Detail & Related papers (2024-10-12T10:14:11Z)
- GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach that can predict high-quality assets with 512k Gaussians from 21 input images in only 11 GB of GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not exploit the explicit geometric relationships between 3D structures and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z)
- LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image [64.94932577552458]
Large Reconstruction Models have made significant strides in the realm of automated 3D content generation from single or multiple input images.
Despite their success, these models often produce 3D meshes with geometric inaccuracies, stemming from the inherent challenges of deducing 3D shapes solely from image data.
We introduce a novel framework, the Large Image and Point Cloud Alignment Model (LAM3D), which utilizes 3D point cloud data to enhance the fidelity of generated 3D meshes.
arXiv Detail & Related papers (2024-05-24T15:09:12Z)
- NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation [52.772319840580074]
3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints.
Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation.
We introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling.
arXiv Detail & Related papers (2024-03-27T04:09:34Z)
- ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance [76.7746870349809]
We present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models.
Compared with standard score distillation sampling, our proposed framework emphasizes the spatial alignment of objects.
arXiv Detail & Related papers (2024-03-19T03:39:43Z)
- Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion? [5.0243930429558885]
This study introduces CISP (Contrastive Image-Shape Pre-training), designed to enhance 3D shape synthesis guided by 2D images.
CISP aims to enrich the CLIP framework by aligning 2D images with 3D shapes in a shared embedding space.
We find that, while matching CLIP in generation quality and diversity, CISP substantially improves coherence with input images.
arXiv Detail & Related papers (2024-02-02T09:09:23Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- 3D-LDM: Neural Implicit 3D Shape Generation with Latent Diffusion Models [8.583859530633417]
We propose a diffusion model for neural implicit representations of 3D shapes that operates in the latent space of an auto-decoder.
This allows us to generate diverse and high-quality 3D surfaces.
arXiv Detail & Related papers (2022-12-01T20:00:00Z)
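As referenced after the abstract, here is a minimal sketch of one guided reverse-diffusion step for a shape DDPM conditioned on a CISP-like image embedding. It uses generic classifier-free guidance with a deterministic DDIM-style update; the `denoiser` interface, the `alphas_cumprod` schedule tensor, the learned `null_emb`, and the guidance scale are all illustrative assumptions and may differ from IC3D's actual guidance mechanism.

```python
import torch

@torch.no_grad()
def guided_shape_ddpm_step(denoiser, x_t, t, image_emb, null_emb,
                           alphas_cumprod, guidance_scale=3.0):
    """One deterministic (DDIM-style, eta = 0) reverse step at integer
    timestep t. denoiser(x, t, emb) is assumed to predict the noise that
    was added to the clean shape."""
    # Classifier-free guidance: extrapolate the conditional noise
    # prediction away from the unconditional one.
    eps_cond = denoiser(x_t, t, image_emb)
    eps_uncond = denoiser(x_t, t, null_emb)
    eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # Recover the clean-shape estimate, then re-noise it to the
    # previous timestep's noise level.
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - 1] if t > 0 else alphas_cumprod.new_tensor(1.0)
    x0_pred = (x_t - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_prev.sqrt() * x0_pred + (1.0 - a_prev).sqrt() * eps
```

Replacing the unconditional branch with the gradient of a CISP image-shape similarity score would yield classifier-style guidance instead; both are plausible readings of "harnessing CISP's guidance".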