Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion?
- URL: http://arxiv.org/abs/2402.01241v1
- Date: Fri, 2 Feb 2024 09:09:23 GMT
- Title: Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion?
- Authors: Cristian Sbrolli, Paolo Cudrano, Matteo Matteucci
- Abstract summary: This study introduces CISP (Contrastive Image-Shape Pre-training), designed to enhance 3D shape synthesis guided by 2D images.
CISP aims to enrich the CLIP framework by aligning 2D images with 3D shapes in a shared embedding space.
We find that, while matching CLIP in generation quality and diversity, CISP substantially improves coherence with input images.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in deep generative models, particularly with the
application of CLIP (Contrastive Language-Image Pre-training) to Denoising
Diffusion Probabilistic Models (DDPMs), have demonstrated remarkable
effectiveness in text-to-image generation. The well-structured embedding space
of CLIP has also been extended to image-to-shape generation with DDPMs,
yielding notable results. Despite these successes, some fundamental questions
arise: Does CLIP ensure the best results in shape generation from images? Can
we leverage conditioning to bring explicit 3D knowledge into the generative
process and obtain better quality? This study introduces CISP (Contrastive
Image-Shape Pre-training), designed to enhance 3D shape synthesis guided by 2D
images. CISP aims to enrich the CLIP framework by aligning 2D images with 3D
shapes in a shared embedding space, specifically capturing 3D characteristics
potentially overlooked by CLIP's text-image focus. Our comprehensive analysis
assesses CISP's guidance performance against CLIP-guided models, focusing on
generation quality, diversity, and coherence of the produced shapes with the
conditioning image. We find that, while matching CLIP in generation quality and
diversity, CISP substantially improves coherence with input images,
underscoring the value of incorporating 3D knowledge into generative models.
These findings suggest a promising direction for advancing the synthesis of 3D
visual content by integrating multimodal systems with 3D representations.
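
CISP follows the CLIP recipe but pairs an image encoder with a 3D shape encoder, training both with a symmetric contrastive objective so that an image and its corresponding shape land near each other in the joint embedding space. The following is a minimal PyTorch sketch of such an objective; the function and argument names and the temperature value are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss
# between image and shape embeddings. Names are illustrative assumptions,
# not taken from the paper's code.
import torch
import torch.nn.functional as F

def contrastive_image_shape_loss(image_emb: torch.Tensor,
                                 shape_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """image_emb, shape_emb: (B, D) batches of paired embeddings from an
    image encoder and a 3D shape encoder; row i of each is a positive
    pair, and every other row in the batch serves as a negative."""
    # Normalize so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    shape_emb = F.normalize(shape_emb, dim=-1)

    # (B, B) similarity matrix: entry (i, j) compares image i to shape j.
    logits = image_emb @ shape_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy: image-to-shape and shape-to-image.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Once pretraining converges, only the image encoder is needed at generation time: its embedding of the conditioning view can then guide a 3D diffusion model.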
Related papers
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models [97.58685709663287]
Generative pre-training can boost the performance of fundamental models in 2D vision.
In 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training.
We propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model.
arXiv Detail & Related papers (2023-07-27T16:07:03Z)
- Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.
We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
arXiv Detail & Related papers (2023-03-26T12:03:18Z)
- DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation [105.97545053660619]
We present DreamStone, a new text-guided 3D shape generation approach.
It uses images as a stepping stone to bridge the gap between text and shape modalities for generating 3D shapes without requiring paired text and 3D data.
Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity.
arXiv Detail & Related papers (2023-03-24T03:56:23Z)
- IC3D: Image-Conditioned 3D Diffusion for Shape Generation [4.470499157873342]
Denoising Diffusion Probabilistic Models (DDPMs) have demonstrated exceptional performance in various 2D generative tasks.
We introduce CISP (Contrastive Image-Shape Pre-training), obtaining a well-structured image-shape joint embedding space.
We then introduce IC3D, a DDPM that harnesses CISP's guidance for 3D shape generation from single-view images (see the guided-sampling sketch after this list).
arXiv Detail & Related papers (2022-11-20T04:21:42Z)
- RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method that represents the self-occlusions of 3D foreground objects as a 2D self-occlusion map.
We show that our representation map not only enhances image quality but also models temporally coherent, complex shadow effects.
arXiv Detail & Related papers (2022-05-14T05:35:35Z)
- Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis [48.33860286920389]
3D-aware image synthesis aims to generate images of objects from multiple views by learning a 3D representation.
Existing approaches lack geometry constraints, hence usually fail to generate multi-view consistent images.
We propose Multi-View Consistent Generative Adversarial Networks (MVCGAN) for high-quality 3D-aware image synthesis with geometry constraints.
arXiv Detail & Related papers (2022-04-13T11:23:09Z)
- Efficient Geometry-aware 3D Generative Adversarial Networks [50.68436093869381]
Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent.
In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations.
We introduce an expressive hybrid explicit-implicit network architecture that not only synthesizes high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry.
arXiv Detail & Related papers (2021-12-15T08:01:43Z)
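
To illustrate how such a joint embedding plugs into a diffusion model (as in the IC3D entry above), below is a hedged PyTorch sketch of embedding-conditioned DDPM sampling with classifier-free guidance. The denoiser interface, noise schedule handling, and guidance scale are assumptions for illustration and are not claimed to match the authors' implementation.

```python
# Hedged sketch of DDPM ancestral sampling steered by a conditioning
# embedding (e.g., a CISP or CLIP image embedding) via classifier-free
# guidance. All names and the interface are illustrative assumptions.
import torch

@torch.no_grad()
def guided_ddpm_sample(denoiser, cond_emb, null_emb, shape_dims, betas,
                       guidance_scale=3.0, device="cpu"):
    """denoiser(x_t, t, emb) -> predicted noise, same shape as x_t.
    cond_emb: embedding of the conditioning image; null_emb: a learned
    "unconditional" embedding. betas: (T,) tensor of noise variances.
    shape_dims: dimensions of one generated sample, e.g. a voxel grid."""
    alphas = 1.0 - betas
    alphas_bar = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, *shape_dims, device=device)  # start from pure noise
    for t in reversed(range(len(betas))):
        t_batch = torch.full((1,), t, device=device, dtype=torch.long)

        # Classifier-free guidance: extrapolate from the unconditional
        # noise prediction toward the conditional one.
        eps_uncond = denoiser(x, t_batch, null_emb)
        eps_cond = denoiser(x, t_batch, cond_emb)
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

        # Standard DDPM posterior mean for x_{t-1}.
        mean = (x - betas[t] / torch.sqrt(1 - alphas_bar[t]) * eps) \
               / torch.sqrt(alphas[t])
        if t > 0:
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
        else:
            x = mean
    return x
```

Here guidance_scale trades adherence to the conditioning image against sample diversity: a scale of 0 recovers unconditional sampling, and 1 recovers plain conditional sampling.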