MVControl: Adding Conditional Control to Multi-view Diffusion for
Controllable Text-to-3D Generation
- URL: http://arxiv.org/abs/2311.14494v2
- Date: Mon, 27 Nov 2023 12:39:42 GMT
- Title: MVControl: Adding Conditional Control to Multi-view Diffusion for
Controllable Text-to-3D Generation
- Authors: Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu
- Abstract summary: We introduce MVControl, a novel neural network architecture that enhances existing pre-trained multi-view 2D diffusion models.
Our approach enables the generation of controllable multi-view images and view-consistent 3D content.
- Score: 10.250715657201363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce MVControl, a novel neural network architecture that enhances
existing pre-trained multi-view 2D diffusion models by incorporating additional
input conditions, e.g. edge maps. Our approach enables the generation of
controllable multi-view images and view-consistent 3D content. To achieve
controllable multi-view image generation, we leverage MVDream as our base
model, and train a new neural network module as an additional plugin for
end-to-end task-specific condition learning. To precisely control the shapes
and views of generated images, we innovatively propose a new conditioning
mechanism that predicts an embedding encapsulating the input spatial and view
conditions, which is then injected into the network globally. Once MVControl is
trained, score distillation sampling (SDS) loss-based optimization can be performed
to generate 3D content, during which we propose to use a hybrid diffusion
prior. The hybrid prior relies on a pre-trained Stable Diffusion network and
our trained MVControl for additional guidance. Extensive experiments
demonstrate that our method achieves robust generalization and enables the
controllable generation of high-quality 3D content. Code available at
https://github.com/WU-CVGL/MVControl/.
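To make the conditioning mechanism concrete, here is a minimal PyTorch sketch of the idea described above: a single embedding is predicted from the spatial condition (e.g. an edge map) and the camera/view parameters, then fused with the timestep embedding that the diffusion UNet already broadcasts to every block, i.e. a global injection. All module and function names here are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class ConditionEmbedder(nn.Module):
    """Hypothetical sketch: one embedding for spatial + view conditions."""

    def __init__(self, embed_dim: int = 1280, camera_dim: int = 16):
        super().__init__()
        # Small CNN that pools the spatial condition (edge map) to a vector.
        self.spatial_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )
        # MLP for flattened camera/view parameters (e.g. a 4x4 pose matrix).
        self.view_encoder = nn.Sequential(
            nn.Linear(camera_dim, embed_dim), nn.SiLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, edge_map: torch.Tensor, camera: torch.Tensor) -> torch.Tensor:
        # A single embedding encapsulating both spatial and view conditions.
        return self.spatial_encoder(edge_map) + self.view_encoder(camera)

def fused_time_embedding(t_emb: torch.Tensor, cond_emb: torch.Tensor) -> torch.Tensor:
    # "Global" injection: adding to the timestep embedding reaches every
    # residual block, unlike per-layer feature injection in vanilla ControlNet.
    return t_emb + cond_emb
```

Adding the condition to the timestep embedding is one simple way to realize a global injection; the paper's exact embedding predictor and fusion point may differ.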
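The hybrid diffusion prior used during SDS optimization can likewise be sketched as a weighted blend of epsilon predictions from the two networks. The blend factor `lam`, the argument names, and the weighting schedule below are assumptions for illustration; only the idea of combining Stable Diffusion and MVControl guidance comes from the abstract.

```python
import torch

def hybrid_sds_grad(noise: torch.Tensor, t: int,
                    sd_eps: torch.Tensor, mvcontrol_eps: torch.Tensor,
                    w_t: float, lam: float = 0.5) -> torch.Tensor:
    """Score distillation gradient under a blended two-network prior.

    noise:          the Gaussian noise used to perturb the rendering at step t
    sd_eps:         epsilon predicted by pre-trained Stable Diffusion
    mvcontrol_eps:  epsilon predicted by the trained MVControl (condition-aware)
    w_t:            timestep-dependent SDS weight
    lam:            blend factor between the two priors (assumed value)
    """
    eps_pred = lam * sd_eps + (1.0 - lam) * mvcontrol_eps
    # Standard SDS form: w(t) * (eps_pred - eps), treated as a constant
    # gradient that is backpropagated only into the 3D representation.
    return w_t * (eps_pred - noise)
```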
Related papers
- BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations [82.94002870060045]
Existing video generation models struggle to follow complex text prompts and synthesize multiple objects.
We develop a blob-grounded video diffusion model named BlobGEN-Vid that allows users to control object motions and fine-grained object appearance.
We show that our framework is model-agnostic and build BlobGEN-Vid based on both U-Net and DiT-based video diffusion models.
arXiv Detail & Related papers (2025-01-13T19:17:06Z)
- MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model [87.71060849866093]
We introduce MVGenMaster, a multi-view diffusion model enhanced with 3D priors to address versatile Novel View Synthesis (NVS) tasks.
Our model features a simple yet effective pipeline that can generate up to 100 novel views conditioned on variable reference views and camera poses.
We present several training and model modifications to strengthen the model with scaled-up datasets.
arXiv Detail & Related papers (2024-11-25T07:34:23Z)
- EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation [73.80275802696815]
We propose a universal framework called EasyControl for video generation.
Our method enables users to control video generation with a single condition map.
Our model demonstrates powerful image retention ability, achieving strong FVD and IS scores on UCF101 and MSR-VTT.
arXiv Detail & Related papers (2024-08-23T11:48:29Z)
- VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We tame video diffusion transformers for 3D camera control using a ControlNet-like conditioning mechanism based on Plücker coordinates.
Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z)
- Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning [52.81032340916171]
Coin3D allows users to control the 3D generation using a coarse geometry proxy assembled from basic shapes.
Our method achieves superior controllability and flexibility in the 3D asset generation task.
arXiv Detail & Related papers (2024-05-13T17:56:13Z)
- Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting [9.383423119196408]
We introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing multi-view diffusion models.
MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation.
In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations.
arXiv Detail & Related papers (2024-03-15T02:57:20Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model [30.44339780026541]
Zero123++ is an image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view.
We develop various conditioning and training schemes to minimize the effort of finetuning from off-the-shelf image diffusion models.
arXiv Detail & Related papers (2023-10-23T17:18:59Z)
- Towards a Neural Graphics Pipeline for Controllable Image Generation [96.11791992084551]
We present Neural Graphics Pipeline (NGP), a hybrid generative model that brings together neural and traditional image formation models.
NGP decomposes the image into a set of interpretable appearance feature maps, uncovering direct control handles for controllable image generation.
We demonstrate the effectiveness of our approach on controllable image generation of single-object scenes.
arXiv Detail & Related papers (2020-06-18T14:22:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.