Related papers: MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation

MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation

URL: http://arxiv.org/abs/2311.14494v2
Date: Mon, 27 Nov 2023 12:39:42 GMT
Title: MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation
Authors: Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu
Abstract summary: We introduce MVControl, a novel neural network architecture that enhances existing pre-trained multi-view 2D diffusion models. Our approach enables the generation of controllable multi-view images and view-consistent 3D content.
Score: 10.250715657201363
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce MVControl, a novel neural network architecture that enhances existing pre-trained multi-view 2D diffusion models by incorporating additional input conditions, e.g. edge maps. Our approach enables the generation of controllable multi-view images and view-consistent 3D content. To achieve controllable multi-view image generation, we leverage MVDream as our base model, and train a new neural network module as additional plugin for end-to-end task-specific condition learning. To precisely control the shapes and views of generated images, we innovatively propose a new conditioning mechanism that predicts an embedding encapsulating the input spatial and view conditions, which is then injected to the network globally. Once MVControl is trained, score-distillation (SDS) loss based optimization can be performed to generate 3D content, in which process we propose to use a hybrid diffusion prior. The hybrid prior relies on a pre-trained Stable-Diffusion network and our trained MVControl for additional guidance. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content. Code available at https://github.com/WU-CVGL/MVControl/.

Related papers

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations [82.94002870060045]
Existing video generation models struggle to follow complex text prompts and synthesize multiple objects. We develop a blob-grounded video diffusion model named BlobGEN-Vid that allows users to control object motions and fine-grained object appearance. We show that our framework is model-agnostic and build BlobGEN-Vid based on both U-Net and DiT-based video diffusion models.
arXiv Detail & Related papers (2025-01-13T19:17:06Z)
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model [87.71060849866093]
We introduce MVGenMaster, a multi-view diffusion model enhanced with 3D priors to address versatile Novel View Synthesis (NVS) tasks. Our model features a simple yet effective pipeline that can generate up to 100 novel views conditioned on variable reference views and camera poses. We present several training and model modifications to strengthen the model with scaled-up datasets.
arXiv Detail & Related papers (2024-11-25T07:34:23Z)
EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation [73.80275802696815]
We propose a universal framework called EasyControl for video generation. Our method enables users to control video generation with a single condition map. Our model demonstrates powerful image retention ability, resulting in high FVD and IS in UCF101 and MSR-VTT.
arXiv Detail & Related papers (2024-08-23T11:48:29Z)
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We tame transformers video for 3D camera control using a ControlNet-like conditioning mechanism based on Plucker coordinates. Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z)
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning [52.81032340916171]
Coin3D allows users to control the 3D generation using a coarse geometry proxy assembled from basic shapes. Our method achieves superior controllability and flexibility in the 3D assets generation task.
arXiv Detail & Related papers (2024-05-13T17:56:13Z)
Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting [9.383423119196408]
We introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing multi-view diffusion models. MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations.
arXiv Detail & Related papers (2024-03-15T02:57:20Z)
Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view. Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks. Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model [30.44339780026541]
Zero123++ is an image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view. We develop various conditioning and training schemes to minimize the effort of finetuning from off-the-shelf image diffusion models.
arXiv Detail & Related papers (2023-10-23T17:18:59Z)
Towards a Neural Graphics Pipeline for Controllable Image Generation [96.11791992084551]
We present Neural Graphics Pipeline (NGP), a hybrid generative model that brings together neural and traditional image formation models. NGP decomposes the image into a set of interpretable appearance feature maps, uncovering direct control handles for controllable image generation. We demonstrate the effectiveness of our approach on controllable image generation of single-object scenes.
arXiv Detail & Related papers (2020-06-18T14:22:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.