DreamComposer++: Empowering Diffusion Models with Multi-View Conditions for 3D Content Generation
- URL: http://arxiv.org/abs/2507.02299v1
- Date: Thu, 03 Jul 2025 04:12:11 GMT
- Title: DreamComposer++: Empowering Diffusion Models with Multi-View Conditions for 3D Content Generation
- Authors: Yunhan Yang, Shuo Chen, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Edmund Y. Lam, Hengshuang Zhao, Tong He, Xihui Liu
- Abstract summary: Recent advances in leveraging pre-trained 2D diffusion models enable the generation of high-quality novel views from a single in-the-wild image. However, existing works face challenges in producing controllable novel views due to the lack of information from multiple views. We present DreamComposer++, a flexible and scalable framework designed to improve current view-aware diffusion models by incorporating multi-view conditions.
- Score: 41.766792225000486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in leveraging pre-trained 2D diffusion models enable the generation of high-quality novel views from a single in-the-wild image. However, existing works struggle to produce controllable novel views because they lack information from multiple views. In this paper, we present DreamComposer++, a flexible and scalable framework designed to improve current view-aware diffusion models by incorporating multi-view conditions. Specifically, DreamComposer++ uses a view-aware 3D lifting module to extract 3D representations of an object from various views. These representations are then aggregated and rendered into latent features of the target view through the multi-view feature fusion module. Finally, the resulting target-view features are injected into pre-trained image or video diffusion models for novel view synthesis. Experimental results demonstrate that DreamComposer++ integrates seamlessly with cutting-edge view-aware diffusion models and enhances their ability to generate controllable novel views from multi-view conditions. This advancement facilitates controllable 3D object reconstruction and enables a wide range of applications.
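The abstract describes a three-part pipeline: view-aware 3D lifting, multi-view feature fusion, and injection into a frozen diffusion model. Below is a minimal PyTorch-style sketch of that data flow; every class name, tensor shape, and fusion rule here is an illustrative assumption, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewAware3DLifting(nn.Module):
    """Lifts a 2D feature map into a (hypothetical) voxel feature grid."""
    def __init__(self, in_ch=4, feat_ch=32, grid=16):
        super().__init__()
        self.grid = grid
        self.proj = nn.Conv2d(in_ch, feat_ch * grid, 1)  # predict depth-wise features

    def forward(self, img_feat):                  # (B, C, H, W)
        b, _, h, w = img_feat.shape
        vox = self.proj(img_feat)                 # (B, F*D, H, W)
        return vox.view(b, -1, self.grid, h, w)   # (B, F, D, H, W) voxel grid

class MultiViewFusion(nn.Module):
    """Renders each view's grid toward the target pose, then fuses views by
    angle-based softmax weights (one plausible choice, not the paper's)."""
    def forward(self, grids, rel_angles):         # list of (B,F,D,H,W), (B,V)
        rendered = [g.mean(dim=2) for g in grids]             # crude "render": average over depth
        feats = torch.stack(rendered, dim=1)                  # (B, V, F, H, W)
        w = F.softmax(-rel_angles.abs(), dim=1)               # nearer views weigh more
        return (feats * w[:, :, None, None, None]).sum(dim=1) # (B, F, H, W)

# Usage: fuse two conditioning views, then inject into a diffusion UNet's
# input by channel concatenation (ControlNet-style addition is an alternative).
lift, fuse = ViewAware3DLifting(), MultiViewFusion()
views = [torch.randn(1, 4, 32, 32) for _ in range(2)]         # two view latents
angles = torch.tensor([[0.5, -1.2]])                          # relative azimuths (rad)
target_feat = fuse([lift(v) for v in views], angles)          # (1, 32, 32, 32)
noisy_latent = torch.randn(1, 4, 32, 32)
unet_input = torch.cat([noisy_latent, target_feat], dim=1)    # fed to the frozen UNet
print(unet_input.shape)                                       # torch.Size([1, 36, 32, 32])
```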
Related papers
- Flex3D: Feed-Forward 3D Generation with Flexible Reconstruction Model and Input View Curation [61.040832373015014]
We propose Flex3D, a novel framework for generating high-quality 3D content from text, single images, or sparse-view images. In the first stage, we employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object. In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs (a toy illustration of this property follows the entry).
arXiv Detail & Related papers (2024-10-01T17:29:43Z)
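Flex3D's second stage hinges on a reconstructor that accepts an arbitrary number of curated views. The toy transformer below illustrates only that property; the class name and dimensions are invented, and the real FlexRM predicts a 3D representation rather than a single pooled vector.

```python
import torch
import torch.nn as nn

class FlexRMSketch(nn.Module):
    """Toy stand-in for a FlexRM-style reconstructor: a transformer encoder
    that accepts an arbitrary number of view tokens per call."""
    def __init__(self, d=64, heads=4, layers=2, out_dim=128):
        super().__init__()
        enc = nn.TransformerEncoderLayer(d_model=d, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.head = nn.Linear(d, out_dim)   # placeholder for 3D output parameters

    def forward(self, view_tokens):         # (B, V*T, d); V can vary per call
        return self.head(self.encoder(view_tokens).mean(dim=1))

# Stage 1 (not shown): diffusion models propose candidate views, which are
# curated before reconstruction. The same weights handle any view count:
model = FlexRMSketch()
for num_views in (1, 4, 9):
    tokens = torch.randn(2, num_views * 16, 64)   # 16 tokens per view (made up)
    print(num_views, model(tokens).shape)         # (2, 128) regardless of V
```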
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of video diffusion models and the coarse 3D clues offered by point-based representations to generate high-quality video frames (a minimal point-splat render is sketched after this entry).
arXiv Detail & Related papers (2024-09-03T16:53:19Z)
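ViewCrafter-style methods condition a video diffusion model on coarse renders of a point-based scene representation. The sketch below conveys only the crude guidance step, splatting a colored point cloud into a target camera; the diffusion backbone and the actual renderer are out of scope, and all names here are hypothetical.

```python
import torch

def splat_points(points, colors, K, R, t, hw=(64, 64)):
    """Crude point splatting into a target camera; a toy stand-in for the
    coarse point-cloud render that guides the video diffusion model."""
    h, w = hw
    cam = points @ R.T + t                      # world -> camera coords, (N, 3)
    mask = cam[:, 2] > 1e-3                     # keep points in front of camera
    cam, col = cam[mask], colors[mask]
    pix = cam @ K.T                             # pinhole projection
    u = (pix[:, 0] / pix[:, 2]).long().clamp(0, w - 1)
    v = (pix[:, 1] / pix[:, 2]).long().clamp(0, h - 1)
    img = torch.zeros(3, h, w)
    img[:, v, u] = col.T                        # later points overwrite earlier ones
    return img                                  # conditioning image for the diffusion model

# Hypothetical usage: one render per target pose, stacked into a clip and fed
# as per-frame conditioning to a video diffusion backbone (not shown).
pts = torch.randn(500, 3) + torch.tensor([0.0, 0.0, 3.0])
cols = torch.rand(500, 3)
K = torch.tensor([[60.0, 0, 32], [0, 60.0, 32], [0, 0, 1]])
guide = splat_points(pts, cols, K, torch.eye(3), torch.zeros(3))
print(guide.shape)                              # torch.Size([3, 64, 64])
```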
We aim to enhance the quality and functionality of generative diffusion models for the task of creating controllable, photorealistic human avatars.
We achieve this by integrating a 3D morphable model into the state-of-the-art multi-view-consistent diffusion approach (the underlying 3DMM forward pass is sketched after this entry).
Our proposed framework is the first diffusion model to enable the creation of fully 3D-consistent, animatable, and photorealistic human avatars.
arXiv Detail & Related papers (2024-01-09T18:59:04Z)
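The entry above integrates a 3D morphable model into a multi-view diffusion approach. As a reference point, this is the classic 3DMM forward pass (mean shape plus linear shape/expression offsets); the bases and coefficient sizes below are toy values, not any released face or body model.

```python
import torch

def morphable_vertices(mean, shape_basis, expr_basis, shape_c, expr_c):
    """Classic 3DMM forward: mean shape plus linear shape/expression offsets."""
    return mean + shape_basis @ shape_c + expr_basis @ expr_c   # (V*3,)

V = 100                                            # toy vertex count
mean = torch.randn(V * 3)
shape_basis, expr_basis = torch.randn(V * 3, 50), torch.randn(V * 3, 20)
verts = morphable_vertices(mean, shape_basis, expr_basis,
                           torch.randn(50), torch.randn(20)).view(V, 3)
# The posed mesh would then be rasterized into per-view conditioning maps and
# combined with the noisy latents of a multi-view diffusion UNet, which is
# what ties avatar controllability (pose/expression) to the generation.
print(verts.shape)                                 # torch.Size([100, 3])
```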
- DreamComposer: Controllable 3D Object Generation via Multi-View Conditions [45.4321454586475]
Recent works are capable of generating high-quality novel views from a single in-the-wild image.
Due to the lack of information from multiple views, these works encounter difficulties in generating controllable novel views.
We present DreamComposer, a flexible and scalable framework that can enhance existing view-aware diffusion models by injecting multi-view conditions.
arXiv Detail & Related papers (2023-12-06T16:55:53Z)
- Generative Novel View Synthesis with 3D-Aware Diffusion Models [96.78397108732233]
We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image.
Our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume (feature-volume sampling is sketched after this entry).
In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent sequences.
arXiv Detail & Related papers (2023-04-05T17:15:47Z)
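The 3D feature volume mentioned above acts as a geometry prior: features are sampled along target-view rays and rendered into a 2D feature image that conditions the diffusion model. Below is a toy version of that sampling step using grid_sample; volume construction and the paper's actual renderer are omitted, and all shapes are invented.

```python
import torch
import torch.nn.functional as F

def render_feature_volume(vol, ray_pts):
    """Sample a 3D feature volume at points along target-view rays and
    average over samples; a toy version of a volume-based geometry prior."""
    # vol: (1, C, D, H, W); ray_pts: (1, h, w, S, 3) in [-1, 1] volume coords
    b, h, w, s, _ = ray_pts.shape
    grid = ray_pts.view(b, h, w * s, 1, 3)                # 5D grid for grid_sample
    feats = F.grid_sample(vol, grid, align_corners=True)  # (1, C, h, w*s, 1)
    feats = feats.view(1, -1, h, w, s)
    return feats.mean(dim=-1)                             # (1, C, h, w) feature image

vol = torch.randn(1, 16, 8, 8, 8)                # learned 3D feature volume
rays = torch.rand(1, 32, 32, 6, 3) * 2 - 1       # 6 samples per target-view ray
cond = render_feature_volume(vol, rays)
print(cond.shape)   # torch.Size([1, 16, 32, 32]); conditions the 2D diffusion UNet
```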
- Novel View Synthesis with Diffusion Models [56.55571338854636]
We present 3DiM, a diffusion model for 3D novel view synthesis.
It is able to translate a single input view into consistent and sharp completions across many views.
3DiM can generate multiple views that are 3D-consistent using a novel technique called stochastic conditioning (sketched below).
arXiv Detail & Related papers (2022-10-06T16:59:56Z)
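Stochastic conditioning, as described in the 3DiM abstract, makes autoregressive sampling approximately consistent by re-drawing the conditioning frame at every denoising step, so information from all previously generated views leaks into each new one. A runnable toy version, with a stand-in denoiser in place of the real network:

```python
import random
import torch

def generate_views(denoise_step, input_view, poses, steps=50):
    """Toy autoregressive sampler in the spirit of stochastic conditioning.
    `denoise_step` is a hypothetical single-step denoiser, not 3DiM's model."""
    views = [input_view]                        # start from the given image
    for pose in poses:                          # one target pose at a time
        x = torch.randn_like(input_view)        # pure-noise initialization
        for t in reversed(range(steps)):
            cond = random.choice(views)         # fresh random condition per step
            x = denoise_step(x, cond, pose, t)
        views.append(x)                         # new view joins the condition pool
    return views[1:]

# Dummy denoiser so the sketch runs end to end; a real model replaces this.
fake = lambda x, cond, pose, t: 0.9 * x + 0.1 * cond
outs = generate_views(fake, torch.randn(3, 64, 64), poses=[0.3, 0.6, 0.9])
print(len(outs), outs[0].shape)                 # 3 torch.Size([3, 64, 64])
```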