DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation
- URL: http://arxiv.org/abs/2509.07435v1
- Date: Tue, 09 Sep 2025 06:43:15 GMT
- Title: DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation
- Authors: Ze-Xin Yin, Jiaxiong Qiu, Liu Liu, Xinjie Wang, Wei Sui, Zhizhong Su, Jian Yang, Jin Xie
- Abstract summary: Lightweight Gaussian Asset Adapter (LGAA) is a novel framework that unifies the modeling of geometry and PBR materials. Our code, pre-trained weights, and the dataset used will be publicly available via our project page.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The labor- and experience-intensive creation of 3D assets with physically based rendering (PBR) materials demands an autonomous 3D asset creation pipeline. However, most existing 3D generation methods focus on geometry modeling, either baking textures into simple vertex colors or leaving texture synthesis to post-processing with image diffusion models. To achieve end-to-end PBR-ready 3D asset generation, we present the Lightweight Gaussian Asset Adapter (LGAA), a novel framework that unifies the modeling of geometry and PBR materials by exploiting multi-view (MV) diffusion priors from a novel perspective. The LGAA features a modular design with three components. Specifically, the LGAA Wrapper reuses and adapts network layers from MV diffusion models, which encapsulate knowledge acquired from billions of images, enabling better convergence in a data-efficient manner. To incorporate multiple diffusion priors for geometry and PBR synthesis, the LGAA Switcher aligns multiple LGAA Wrapper layers encapsulating different knowledge. Then, a tamed variational autoencoder (VAE), termed the LGAA Decoder, is designed to predict 2D Gaussian Splatting (2DGS) with PBR channels. Finally, we introduce a dedicated post-processing procedure to effectively extract high-quality, relightable mesh assets from the resulting 2DGS. Extensive quantitative and qualitative experiments demonstrate the superior performance of LGAA with both text- and image-conditioned MV diffusion models. Additionally, the modular design enables flexible incorporation of multiple diffusion priors, and the knowledge-preserving scheme leads to efficient convergence when trained on merely 69k multi-view instances. Our code, pre-trained weights, and the dataset used will be publicly available via our project page: https://zx-yin.github.io/dreamlifting/.
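To make the modular design concrete, below is a minimal PyTorch sketch of the three LGAA components named in the abstract. Only the component names (LGAA Wrapper, Switcher, Decoder) and their stated roles come from the paper; the adapter shape, the softmax gating in the Switcher, and the 15-channel 2DGS/PBR parameterization are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the LGAA modular design (Wrapper / Switcher / Decoder).
# Component names and roles follow the abstract; all internals below
# (adapter shape, softmax gating, 15-channel splat layout) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LGAAWrapper(nn.Module):
    """Wraps a frozen layer from a pretrained MV diffusion model and adds a
    small trainable adapter, so the diffusion prior is reused, not retrained."""

    def __init__(self, pretrained_layer: nn.Module, dim: int):
        super().__init__()
        self.pretrained = pretrained_layer
        for p in self.pretrained.parameters():
            p.requires_grad = False  # keep the image prior intact
        self.adapter = nn.Linear(dim, dim)  # lightweight trainable part

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.pretrained(x))


class LGAASwitcher(nn.Module):
    """Aligns features from several Wrappers (e.g., geometry and PBR priors).
    A learned softmax-weighted sum stands in for the paper's alignment scheme."""

    def __init__(self, num_priors: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_priors))

    def forward(self, feats: list) -> torch.Tensor:
        w = torch.softmax(self.logits, dim=0)
        return sum(wi * f for wi, f in zip(w, feats))


class LGAADecoder(nn.Module):
    """VAE-style head predicting per-pixel 2D Gaussian splats with PBR
    channels. The 15-value layout (10 splat + 5 PBR) is a guess."""

    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Linear(dim, 15)

    def forward(self, x: torch.Tensor) -> dict:
        out = self.head(x)
        return {
            "xyz": out[..., 0:3],                       # splat center
            "rot": F.normalize(out[..., 3:7], dim=-1),  # unit quaternion
            "scale": out[..., 7:9].exp(),               # 2D scale (2DGS)
            "opacity": out[..., 9:10].sigmoid(),
            "albedo": out[..., 10:13].sigmoid(),        # PBR base color
            "roughness": out[..., 13:14].sigmoid(),
            "metallic": out[..., 14:15].sigmoid(),
        }


if __name__ == "__main__":
    dim = 64
    # nn.Linear layers stand in for frozen blocks of two MV diffusion models.
    geo_prior = LGAAWrapper(nn.Linear(dim, dim), dim)
    pbr_prior = LGAAWrapper(nn.Linear(dim, dim), dim)
    switcher = LGAASwitcher(num_priors=2)
    decoder = LGAADecoder(dim)

    tokens = torch.randn(4, 1024, dim)  # (views, tokens, dim) dummy features
    fused = switcher([geo_prior(tokens), pbr_prior(tokens)])
    splats = decoder(fused)
    print({k: tuple(v.shape) for k, v in splats.items()})
```

Freezing the wrapped layers mirrors the knowledge-preserving scheme the abstract credits for efficient convergence on only 69k multi-view instances; only the adapters, switcher, and decoder would need gradients in this sketch.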
Related papers
- Large Material Gaussian Model for Relightable 3D Generation [54.10879517395551]
We introduce a novel framework designed to generate high-quality 3D content with Physically Based Rendering (PBR) materials. Our method not only exhibits greater visual appeal compared to baseline methods but also enhances material modeling, thereby enabling practical downstream rendering applications.
arXiv Detail & Related papers (2025-09-26T09:35:12Z) - MCMat: Multiview-Consistent and Physically Accurate PBR Material Generation [30.69364954074992]
Existing methods use UNet-based diffusion models to generate multi-view physically based rendering (PBR) maps but struggle with multi-view inconsistency, while some 3D methods directly generate UV maps and face generalization issues due to limited 3D data. In the generation stage, a specially designed Transformer-based diffusion (DiT) model generates PBR materials consistent with the reference views.
arXiv Detail & Related papers (2024-12-18T18:45:35Z) - DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets. Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z) - Structured 3D Latents for Scalable and Versatile 3D Generation [28.672494137267837]
We introduce a novel 3D generation method for versatile and high-quality 3D asset creation. The cornerstone is a unified Structured LATent representation which allows decoding to different output formats. This is achieved by integrating a sparsely-populated 3D grid with dense multiview visual features extracted from a powerful vision foundation model.
arXiv Detail & Related papers (2024-12-02T13:58:38Z) - MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model [87.71060849866093]
We introduce MVGenMaster, a multi-view diffusion model enhanced with 3D priors to address versatile Novel View Synthesis (NVS) tasks. Our model features a simple yet effective pipeline that can generate up to 100 novel views conditioned on variable reference views and camera poses. We present several training and model modifications to strengthen the model with scaled-up datasets.
arXiv Detail & Related papers (2024-11-25T07:34:23Z) - MVGamba: Unify 3D Content Generation as State Space Sequence Modeling [150.80564081817786]
We introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor. With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts. Experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only $0.1\times$ the model size.
arXiv Detail & Related papers (2024-06-10T15:26:48Z) - Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z) - Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion [101.15628083270224]
We propose a novel multi-view conditioned diffusion model to synthesize high-fidelity novel view images. We then introduce a novel iterative-update strategy that adopts it to provide precise guidance for refining the coarse generated results. Experiments show Magic-Boost greatly enhances the coarse generated inputs and produces high-quality 3D assets with rich geometric and textural details.
arXiv Detail & Related papers (2024-04-09T16:20:03Z) - Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting [9.383423119196408]
We introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing multi-view diffusion models. MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations.
arXiv Detail & Related papers (2024-03-15T02:57:20Z) - Breathing New Life into 3D Assets with Generative Repainting [74.80184575267106]
Diffusion-based text-to-image models ignited immense attention from the vision community, artists, and content creators.
Recent works have proposed various pipelines powered by the entanglement of diffusion models and neural fields.
We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools.
Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, and orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools.
arXiv Detail & Related papers (2023-09-15T16:34:51Z) - 3DGen: Triplane Latent Diffusion for Textured Mesh Generation [17.178939191534994]
A triplane VAE learns latent representations of textured meshes and a conditional diffusion model generates the triplane features.
For the first time, this architecture allows conditional and unconditional generation of high-quality textured or untextured 3D meshes. It substantially outperforms previous work on image-conditioned and unconditional generation, in terms of both mesh quality and texture generation.
arXiv Detail & Related papers (2023-03-09T16:18:14Z)