Multi-view Image Prompted Multi-view Diffusion for Improved 3D Generation
- URL: http://arxiv.org/abs/2404.17419v1
- Date: Fri, 26 Apr 2024 13:55:39 GMT
- Title: Multi-view Image Prompted Multi-view Diffusion for Improved 3D Generation
- Authors: Seungwook Kim, Yichun Shi, Kejie Li, Minsu Cho, Peng Wang
- Abstract summary: We build on ImageDream, a novel image-prompt multi-view diffusion model, to support multi-view images as the input prompt.
Our method, dubbed MultiImageDream, reveals that transitioning from a single-image prompt to multiple-image prompts enhances the performance of multi-view and 3D object generation.
- Score: 48.595946437886774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Using images as prompts for 3D generation demonstrates particularly strong performance compared to using text prompts alone, as images provide more intuitive guidance for the 3D generation process. In this work, we delve into the potential of using multiple image prompts, instead of a single image prompt, for 3D generation. Specifically, we build on ImageDream, a novel image-prompt multi-view diffusion model, to support multi-view images as the input prompt. Our method, dubbed MultiImageDream, reveals that transitioning from a single-image prompt to multiple-image prompts enhances the performance of multi-view and 3D object generation according to various quantitative evaluation metrics and qualitative assessments. This advancement is achieved without the necessity of fine-tuning the pre-trained ImageDream multi-view diffusion model.
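The abstract's key claim is that several image prompts can condition the pre-trained diffusion model without any fine-tuning. One plausible training-free way to wire this up is to pool the per-view image-prompt embeddings into a single conditioning vector before sampling. The sketch below is purely illustrative: the function name, the mean-pooling rule, and the embedding shapes are assumptions, not the paper's actual mechanism.

```python
import numpy as np

def combine_image_prompts(prompt_embeddings: list) -> np.ndarray:
    """Pool per-view image-prompt embeddings into one conditioning vector.

    Hypothetical combination rule: MultiImageDream extends ImageDream's
    image-prompt pathway to multiple views without retraining, and simple
    mean pooling is one training-free way such a combination could work.
    """
    stacked = np.stack(prompt_embeddings, axis=0)  # shape: (n_views, dim)
    return stacked.mean(axis=0)                    # shape: (dim,)

# Toy usage: three "views", each a 4-dimensional embedding.
views = [np.ones(4), 2 * np.ones(4), 3 * np.ones(4)]
cond = combine_image_prompts(views)
print(cond)  # [2. 2. 2. 2.]
```

In a real pipeline, `cond` would replace the single-image conditioning passed to the frozen multi-view diffusion sampler; the paper's actual fusion strategy may differ.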
Related papers
- Multi-View Large Reconstruction Model via Geometry-Aware Positional Encoding and Attention [54.66152436050373]
We propose a Multi-view Large Reconstruction Model (M-LRM) to reconstruct high-quality 3D shapes from multi-view images in a 3D-aware manner.
Specifically, we introduce a multi-view consistent cross-attention scheme to enable M-LRM to accurately query information from the input images.
Compared to previous methods, the proposed M-LRM can generate 3D shapes of high fidelity.
arXiv Detail & Related papers (2024-06-11T18:29:13Z)
- Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data [80.92268916571712]
A critical bottleneck is the scarcity of high-quality 3D objects with detailed captions.
We propose Bootstrap3D, a novel framework that automatically generates an arbitrary quantity of multi-view images.
We have generated 1 million high-quality synthetic multi-view images with dense descriptive captions.
arXiv Detail & Related papers (2024-05-31T17:59:56Z)
- Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z)
- VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model [34.35449902855767]
Two fundamental questions are what data we use for training and how to ensure multi-view consistency.
We propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models.
Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-18T17:48:15Z)
- 3D-aware Image Generation and Editing with Multi-modal Conditions [6.444512435220748]
3D-consistent image generation from a single 2D semantic label is an important and challenging research topic in computer graphics and computer vision.
We propose a novel end-to-end 3D-aware image generation and editing model incorporating multiple types of conditional inputs.
Our method can generate diverse images with distinct noises, edit the attribute through a text description and conduct style transfer by giving a reference RGB image.
arXiv Detail & Related papers (2024-03-11T07:10:37Z)
- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation [51.19871052619077]
We introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images.
We maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.
arXiv Detail & Related papers (2024-02-07T17:57:03Z)
- SyncDreamer: Generating Multiview-consistent Images from a Single-view Image [59.75474518708409]
We present SyncDreamer, a novel diffusion model that generates multiview-consistent images from a single-view image.
Experiments show that SyncDreamer generates images with high consistency across different views.
arXiv Detail & Related papers (2023-09-07T02:28:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.