CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
- URL: http://arxiv.org/abs/2403.05034v1
- Date: Fri, 8 Mar 2024 04:25:29 GMT
- Title: CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
- Authors: Zhengyi Wang, Yikai Wang, Yifei Chen, Chendong Xiang, Shuo Chen,
Dajiang Yu, Chongxuan Li, Hang Su, Jun Zhu
- Abstract summary: We present a high-fidelity feed-forward single image-to-3D generative model.
We highlight the necessity of integrating geometric priors into network design.
Our model delivers a high-fidelity textured mesh from an image in just 10 seconds, without any test-time optimization.
- Score: 37.75256020559125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feed-forward 3D generative models like the Large Reconstruction Model (LRM)
have demonstrated exceptional generation speed. However, the transformer-based
methods do not leverage the geometric priors of the triplane component in their
architecture, often leading to sub-optimal quality given the limited size of 3D
data and slow training. In this work, we present the Convolutional
Reconstruction Model (CRM), a high-fidelity feed-forward single image-to-3D
generative model. Recognizing the limitations posed by sparse 3D data, we
highlight the necessity of integrating geometric priors into network design.
CRM builds on the key observation that the visualized triplane exhibits a
spatial correspondence with six orthographic images. First, it generates six
orthographic view images from a single input image, then feeds these images
into a convolutional U-Net, leveraging its strong pixel-level alignment
capabilities and significant bandwidth to create a high-resolution triplane.
CRM further employs Flexicubes as geometric representation, facilitating direct
end-to-end optimization on textured meshes. Overall, our model delivers a
high-fidelity textured mesh from an image in just 10 seconds, without any
test-time optimization.
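To make the pipeline above concrete, here is a minimal PyTorch sketch of the CRM-style flow: the six generated orthographic views are stacked channel-wise, mapped by a small convolutional network (a stand-in for the full U-Net, skip connections omitted) to a triplane, and the triplane is then queried at 3D points. Module names (TinyUNet, sample_triplane), channel counts, and resolutions are illustrative assumptions; the actual CRM also uses a multiview diffusion model to produce the six views and Flexicubes to extract the textured mesh end-to-end, both omitted here.

```python
# Minimal sketch of a CRM-style "six views -> triplane -> per-point features" flow.
# Shapes, channel counts, and module names are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    """Toy stand-in for the convolutional U-Net (no skip connections, one scale)."""
    def __init__(self, in_ch=6 * 3, plane_ch=3 * 32):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU())
        self.mid = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.up = nn.Sequential(
            nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, plane_ch, 3, padding=1),
        )

    def forward(self, views):                      # views: (B, 6, 3, H, W)
        x = views.flatten(1, 2)                    # stack the six views channel-wise: (B, 18, H, W)
        return self.up(self.mid(self.down(x)))     # (B, 3*32, H, W): three 32-channel planes

def sample_triplane(planes, pts):
    """Bilinearly sample the three axis-aligned planes at 3D points in [-1, 1]^3."""
    B = planes.shape[0]
    xy, xz, yz = planes.chunk(3, dim=1)            # each (B, 32, H, W)
    feats = []
    for plane, coords in ((xy, pts[..., [0, 1]]), (xz, pts[..., [0, 2]]), (yz, pts[..., [1, 2]])):
        grid = coords.view(B, -1, 1, 2)            # sampling grid: (B, N, 1, 2)
        feats.append(F.grid_sample(plane, grid, align_corners=True))
    return sum(feats).squeeze(-1).transpose(1, 2)  # fused per-point features: (B, N, 32)

if __name__ == "__main__":
    views = torch.randn(1, 6, 3, 64, 64)           # placeholder for the six generated views
    planes = TinyUNet()(views)
    pts = torch.rand(1, 1024, 3) * 2 - 1           # query points, e.g. a Flexicubes-style grid
    feats = sample_triplane(planes, pts)           # would feed small SDF/color heads (omitted)
    print(planes.shape, feats.shape)               # (1, 96, 64, 64), (1, 1024, 32)
```

In the full method, such per-point features would feed SDF and color heads whose outputs Flexicubes converts into a textured mesh that can be optimized end-to-end.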
Related papers
- From Flat to Spatial: Comparison of 4 methods constructing 3D, 2 and 1/2D Models from 2D Plans with neural networks [0.0]
The conversion of single images into 2 and 1/2D and 3D meshes is a promising technology that enhances design visualization and efficiency.
This paper evaluates four innovative methods: "One-2-3-45," "CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model," "Instant Mesh," and "Image-to-Mesh."
arXiv Detail & Related papers (2024-07-29T13:01:20Z)
- GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z)
- GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement [51.97726804507328]
We propose a novel approach for 3D mesh reconstruction from multi-view images.
Our method takes inspiration from large reconstruction models that use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on multi-view images.
arXiv Detail & Related papers (2024-06-09T05:19:24Z)
- LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image [64.94932577552458]
Large Reconstruction Models have made significant strides in the realm of automated 3D content generation from single or multiple input images.
Despite their success, these models often produce 3D meshes with geometric inaccuracies, stemming from the inherent challenges of deducing 3D shapes solely from image data.
We introduce a novel framework, the Large Image and Point Cloud Alignment Model (LAM3D), which utilizes 3D point cloud data to enhance the fidelity of generated 3D meshes.
arXiv Detail & Related papers (2024-05-24T15:09:12Z)
- InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models [66.83681825842135]
InstantMesh is a feed-forward framework for instant 3D mesh generation from a single image.
It features state-of-the-art generation quality and significant training scalability.
We release all the code, weights, and demo of InstantMesh with the intention that it can make substantial contributions to the community of 3D generative AI.
arXiv Detail & Related papers (2024-04-10T17:48:37Z)
- FlexiDreamer: Single Image-to-3D Generation with FlexiCubes [20.871847154995688]
FlexiDreamer is a novel framework that directly reconstructs high-quality meshes from multi-view generated images.
Our approach can generate high-fidelity 3D meshes in the single image-to-3D downstream task in approximately 1 minute.
arXiv Detail & Related papers (2024-04-01T08:20:18Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks; a minimal sketch of the SDF-to-mesh step appears after this list.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- Wonder3D: Single Image to 3D using Cross-Domain Diffusion [105.16622018766236]
Wonder3D is a novel method for efficiently generating high-fidelity textured meshes from single-view images.
To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model.
arXiv Detail & Related papers (2023-10-23T15:02:23Z)
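Several entries above, Hyper-VolTran in particular, represent the surface as a signed distance function before meshing. The following is a minimal, hypothetical sketch of that step: a toy coordinate MLP stands in for the learned SDF, and a mesh is extracted from its zero level set with marching cubes (scikit-image). The network, its sphere bias, and the grid resolution are illustrative assumptions, not the Hyper-VolTran architecture, which additionally uses geometry-encoding volumes and HyperNetworks.

```python
# Minimal SDF-to-mesh sketch: evaluate a (toy) signed distance network on a grid
# and extract the zero level set with marching cubes. Requires scikit-image.
import torch
import torch.nn as nn
from skimage import measure

class SDFNet(nn.Module):
    """Toy coordinate MLP mapping 3D points to signed distances (untrained placeholder)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pts):                          # pts: (N, 3)
        # Bias toward a radius-0.5 sphere so even random weights yield a surface.
        return self.mlp(pts) * 0.01 + (pts.norm(dim=-1, keepdim=True) - 0.5)

@torch.no_grad()
def extract_mesh(sdf, res=64):
    """Sample the SDF on a regular grid in [-1, 1]^3 and run marching cubes at level 0."""
    lin = torch.linspace(-1.0, 1.0, res)
    grid = torch.stack(torch.meshgrid(lin, lin, lin, indexing="ij"), dim=-1)  # (res, res, res, 3)
    values = sdf(grid.reshape(-1, 3)).reshape(res, res, res).numpy()
    verts, faces, _, _ = measure.marching_cubes(values, level=0.0)
    verts = verts / (res - 1) * 2.0 - 1.0            # voxel indices back to [-1, 1]^3
    return verts, faces

if __name__ == "__main__":
    verts, faces = extract_mesh(SDFNet())
    print(verts.shape, faces.shape)                  # roughly a sphere mesh
```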
This list is automatically generated from the titles and abstracts of the papers on this site.