Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials
- URL: http://arxiv.org/abs/2404.16829v3
- Date: Thu, 23 May 2024 19:12:51 GMT
- Title: Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials
- Authors: Ye Fang, Zeyi Sun, Tong Wu, Jiaqi Wang, Ziwei Liu, Gordon Wetzstein, Dahua Lin
- Abstract summary: GPT-4V can effectively recognize and describe materials, allowing the construction of a detailed material library.
The correctly matched materials are then used as references for generating new SVBRDF materials.
Make-it-Real offers a streamlined integration into the 3D content creation workflow.
- Score: 108.59709545364395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Physically realistic materials are pivotal in augmenting the realism of 3D assets across various applications and lighting conditions. However, existing 3D assets and generative models often lack authentic material properties, and manually assigning materials in graphics software is tedious and time-consuming. In this paper, we exploit advancements in Multimodal Large Language Models (MLLMs), particularly GPT-4V, to present a novel approach, Make-it-Real: 1) We demonstrate that GPT-4V can effectively recognize and describe materials, allowing the construction of a detailed material library. 2) Using a combination of visual cues and hierarchical text prompts, GPT-4V precisely identifies and aligns materials with the corresponding components of 3D objects. 3) The correctly matched materials are then used as references for generating new SVBRDF materials conditioned on the original albedo map, significantly enhancing visual authenticity. Make-it-Real integrates smoothly into the 3D content creation workflow, showcasing its utility as an essential tool for developers of 3D assets.
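The three stages above amount to a recognize-match-generate pipeline. As a minimal illustration only, the sketch below wires those stages together in Python; the `query_mllm` helper, the library contents, and all file names are hypothetical stand-ins, not the authors' released code or API.

```python
# Minimal sketch of a recognize-match-generate material pipeline in the
# spirit of Make-it-Real. `query_mllm`, the library contents, and all
# file names are hypothetical stand-ins, not the authors' released code.
from dataclasses import dataclass

@dataclass
class SVBRDF:
    albedo: str     # path to base-color (albedo) map
    roughness: str  # path to roughness map
    metallic: str   # path to metallic map
    normal: str     # path to normal map

# Stage 1: a material library keyed by MLLM-generated descriptions.
MATERIAL_LIBRARY = {
    "brushed aluminum": SVBRDF("al_albedo.png", "al_rough.png",
                               "al_metal.png", "al_normal.png"),
    "oak wood": SVBRDF("oak_albedo.png", "oak_rough.png",
                       "oak_metal.png", "oak_normal.png"),
}

def query_mllm(render_path: str, prompt: str) -> str:
    """Placeholder for a GPT-4V-style call returning a material name."""
    raise NotImplementedError("wire up an MLLM endpoint here")

def assign_materials(part_renders: dict[str, str]) -> dict[str, SVBRDF]:
    """Stage 2: match each rendered object part to a library material."""
    assignments = {}
    for part, render_path in part_renders.items():
        prompt = (f"Which of these materials best matches the highlighted "
                  f"part '{part}'? Options: {list(MATERIAL_LIBRARY)}")
        name = query_mllm(render_path, prompt)
        assignments[part] = MATERIAL_LIBRARY[name]
    return assignments

# Stage 3 (not shown): generate new SVBRDF maps per part, using the
# matched material as a reference conditioned on the original albedo.
```

In the actual system the matching is driven by visual cues and hierarchical text prompts rather than a flat option list; the sketch only fixes the overall data flow.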
Related papers
- CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets [43.315487682462845]
CLAY is a 3D geometry and material generator designed to transform human imagination into intricate 3D digital structures.
At its core is a large-scale generative model composed of a multi-resolution Variational Autoencoder (VAE) and a minimalistic latent Diffusion Transformer (DiT).
We demonstrate using CLAY for a range of controllable 3D asset creations, from sketchy conceptual designs to production-ready assets with intricate details.
arXiv Detail & Related papers (2024-05-30T05:57:36Z)
- MaPa: Text-driven Photorealistic Material Painting for 3D Shapes [80.66880375862628]
This paper aims to generate materials for 3D meshes from text descriptions.
Unlike existing methods that synthesize texture maps, we propose to generate segment-wise procedural material graphs.
Our framework supports high-quality rendering and provides substantial flexibility in editing.
arXiv Detail & Related papers (2024-04-26T17:54:38Z)
- MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets [63.284244910964475]
We propose a 3D asset material generation framework that infers underlying materials from 2D semantic priors.
Based on such a prior model, we devise a mechanism to parse materials in 3D space.
arXiv Detail & Related papers (2024-04-22T07:00:17Z)
- HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image [94.11473240505534]
We introduce HyperDreamer, a tool for creating 3D content from a single image.
It is hyper-realistic enough for post-generation usage, as users can view, render, and edit the resulting 3D content from a full range of views.
We demonstrate the effectiveness of HyperDreamer in modeling region-aware materials with high-resolution textures and enabling user-friendly editing.
arXiv Detail & Related papers (2023-12-07T18:58:09Z)
- MATLABER: Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR [29.96046140529936]
We propose Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR (MATLABER).
We train this auto-encoder on large-scale real-world BRDF collections and ensure the smoothness of its latent space (a rough sketch of this idea appears after this list).
Our approach demonstrates superiority over existing ones in generating realistic and coherent object materials.
arXiv Detail & Related papers (2023-08-18T03:40:38Z)
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model to extract objects of interest, and leverages a text-to-image diffusion model to lift each object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large-vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z)
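Among the related papers above, MATLABER describes a latent BRDF auto-encoder with a smoothness-regularized latent space. The following is a rough VAE-style sketch of that idea only; the 7-dimensional BRDF parameterization, the layer widths, and the KL weight are assumptions chosen for illustration, not the paper's actual settings.

```python
# Illustrative VAE over BRDF parameter vectors, loosely in the spirit of
# MATLABER's latent BRDF auto-encoder. Dimensions, architecture, and the
# KL weight are assumptions for this sketch, not the paper's settings.
import torch
import torch.nn as nn

class BRDFAutoEncoder(nn.Module):
    def __init__(self, brdf_dim: int = 7, latent_dim: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(brdf_dim, 64), nn.ReLU(),
            nn.Linear(64, 2 * latent_dim),  # mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, brdf_dim), nn.Sigmoid(),  # params in [0, 1]
        )

    def forward(self, x: torch.Tensor):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        # Reparameterization trick: sample a latent from N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def loss_fn(recon, x, mu, logvar, kl_weight: float = 1e-3):
    # Reconstruction term plus a KL term that regularizes the latent
    # space, so nearby latents decode to similar materials.
    recon_loss = nn.functional.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl_weight * kl
```

Training such a model on a real-world BRDF collection would then let optimization in the latent space stay on a manifold of plausible materials.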
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.