Single Image to High-Quality 3D Object via Latent Features
- URL: http://arxiv.org/abs/2511.19512v1
- Date: Mon, 24 Nov 2025 02:31:04 GMT
- Authors: Huanning Dong, Yinuo Huang, Fan Li, Ping Kuang
- Abstract summary: We introduce LatentDreamer, a framework for generating 3D objects from single images. A pre-trained variational autoencoder maps 3D geometries to latent features. LatentDreamer generates coarse geometries, refined geometries, and realistic textures sequentially.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D assets are essential in the digital age. While automatic 3D generation, such as image-to-3D, has made significant strides in recent years, it often struggles to achieve fast, detailed, and high-fidelity generation simultaneously. In this work, we introduce LatentDreamer, a novel framework for generating 3D objects from single images. The key to our approach is a pre-trained variational autoencoder that maps 3D geometries to latent features, which greatly reduces the difficulty of 3D generation. Starting from latent features, the pipeline of LatentDreamer generates coarse geometries, refined geometries, and realistic textures sequentially. The 3D objects generated by LatentDreamer exhibit high fidelity to the input images, and the entire generation process can be completed within a short time (typically 70 seconds). Extensive experiments show that with only a small amount of training, LatentDreamer demonstrates competitive performance compared to contemporary approaches.
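The abstract describes a sequential pipeline: predict latent features from the image, decode them to a coarse geometry via the pre-trained VAE, refine the geometry, then texture it. The sketch below illustrates that control flow only; every class, function, and value is a hypothetical placeholder, not the authors' actual API or model.

```python
# Toy sketch of the LatentDreamer pipeline stages described in the
# abstract. All names and numbers here are illustrative placeholders.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mesh:
    vertices: List[float]          # toy stand-in for a 3D geometry
    texture: Optional[str] = None


class LatentVAE:
    """Stand-in for the pre-trained VAE mapping 3D geometries
    to and from a compact latent feature vector."""

    def decode(self, latent: List[float]) -> Mesh:
        # Decode a latent vector into a (toy) coarse geometry.
        return Mesh(vertices=[0.0] * int(latent[0]))


def predict_latent_from_image(image: str) -> List[float]:
    # In the paper a network predicts latent features from the input
    # image; here we simply return a fixed toy latent.
    return [8.0]


def refine(mesh: Mesh) -> Mesh:
    # Placeholder for geometry refinement (e.g. adding detail).
    return Mesh(vertices=mesh.vertices * 2)


def add_texture(mesh: Mesh, image: str) -> Mesh:
    # Placeholder for texture generation from the input image.
    mesh.texture = "projected-from-input-image"
    return mesh


def latentdreamer(image: str) -> Mesh:
    vae = LatentVAE()
    latent = predict_latent_from_image(image)   # image -> latent features
    coarse = vae.decode(latent)                 # latent -> coarse geometry
    refined = refine(coarse)                    # coarse -> refined geometry
    return add_texture(refined, image)          # refined -> textured mesh


obj = latentdreamer(image="input.png")
print(len(obj.vertices), obj.texture)
```

The point of the sketch is the staged hand-off (latent, coarse, refined, textured), which is the structure the abstract claims makes generation faster and easier than direct image-to-3D synthesis.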
Related papers
- ShapeGen: Towards High-Quality 3D Shape Synthesis [67.37531089151034]
3D shape generation has made notable progress, enabling the rapid synthesis of high-fidelity 3D assets from a single image. However, current methods still face challenges, including the lack of intricate details, overly smoothed surfaces, and fragmented thin-shell structures. We present ShapeGen, which achieves high-quality image-to-3D shape generation through 3D representation and supervision improvements, resolution scaling, and the advantages of linear transformers.
arXiv Detail & Related papers (2025-11-25T18:47:27Z)
- GraphicsDreamer: Image to 3D Generation with Physical Consistency [32.26851174969898]
We introduce GraphicsDreamer, a method for creating highly usable 3D meshes from single images. In the geometry fusion stage, we continue to enforce the PBR constraints, ensuring that the generated 3D objects possess reliable texture details. Our method incorporates topology optimization and fast UV unwrapping capabilities, allowing the 3D products to be seamlessly imported into graphics engines.
arXiv Detail & Related papers (2024-12-18T10:01:27Z) - LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image [64.94932577552458]
Large Reconstruction Models have made significant strides in the realm of automated 3D content generation from single or multiple input images.
Despite their success, these models often produce 3D meshes with geometric inaccuracies, stemming from the inherent challenges of deducing 3D shapes solely from image data.
We introduce a novel framework, the Large Image and Point Cloud Alignment Model (LAM3D), which utilizes 3D point cloud data to enhance the fidelity of generated 3D meshes.
arXiv Detail & Related papers (2024-05-24T15:09:12Z) - Compress3D: a Compressed Latent Space for 3D Generation from a Single Image [27.53099431097921]
Triplane autoencoder encodes 3D models into a compact triplane latent space to compress both the 3D geometry and texture information.
We introduce a 3D-aware cross-attention mechanism, which utilizes low-resolution latent representations to query features from a high-resolution 3D feature volume.
Our approach enables the generation of high-quality 3D assets in merely 7 seconds on a single A100 GPU.
arXiv Detail & Related papers (2024-03-20T11:51:04Z) - 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new contents with higher quality by exploiting the natural image prior of the 2D diffusion model and the global 3D information of the current scene. Our approach supports a wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z) - One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View
Generation and 3D Diffusion [32.29687304798145]
One-2-3-45++ is an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute.
Our approach aims to fully harness the extensive knowledge embedded in 2D diffusion models and priors from valuable yet limited 3D data.
arXiv Detail & Related papers (2023-11-14T03:40:25Z) - IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts [90.49024750432139]
We present IPDreamer, a novel method that captures intricate appearance features from complex $\textbf{I}$mage $\textbf{P}$rompts and aligns the synthesized 3D object with these extracted features.
Our experiments demonstrate that IPDreamer consistently generates high-quality 3D objects that align with both the textual and complex image prompts.
arXiv Detail & Related papers (2023-10-09T03:11:08Z) - GET3D: A Generative Model of High Quality 3D Textured Shapes Learned
from Images [72.15855070133425]
We introduce GET3D, a Generative model that directly generates Explicit Textured 3D meshes with complex topology, rich geometric details, and high-fidelity textures.
GET3D is able to generate high-quality 3D textured meshes, ranging from cars, chairs, animals, and motorbikes to human characters and buildings.
arXiv Detail & Related papers (2022-09-22T17:16:19Z) - Efficient Geometry-aware 3D Generative Adversarial Networks [50.68436093869381]
Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent.
In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations.
We introduce an expressive hybrid explicit-implicit network architecture that synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry.
arXiv Detail & Related papers (2021-12-15T08:01:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.