Related papers: GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation

GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation

URL: http://arxiv.org/abs/2406.15333v1
Date: Fri, 21 Jun 2024 17:49:31 GMT
Title: GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation
Authors: Chubin Zhang, Hongliang Song, Yi Wei, Yu Chen, Jiwen Lu, Yansong Tang,
Abstract summary: We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images. GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
Score: 65.33726478659304
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this work, we introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images. This limits these methods to a low-resolution representation and makes it difficult to scale up to the dense views for better quality. GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms to effectively integrate image features into 3D representations. We implement this solution through a two-stage pipeline: initially, a lightweight proposal network generates a sparse set of 3D anchor points from the posed image inputs; subsequently, a specialized reconstruction transformer refines the geometry and retrieves textural details. Extensive experimental results demonstrate that GeoLRM significantly outperforms existing models, especially for dense view inputs. We also demonstrate the practical applicability of our model with 3D generation tasks, showcasing its versatility and potential for broader adoption in real-world applications.

Related papers

Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction [30.518107360632488]
Generalizable 3D Gaussian Splatting reconstruction showcases advanced Image-to-3D content creation.<n>method provides an efficient, scalable solution for real-world 3D content generation.
arXiv Detail & Related papers (2025-07-20T11:33:13Z)
High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model [38.13429047918231]
We propose a novel hybrid Voxel-Gaussian representation, where a 3D voxel representation contains explicit 3D geometric information. Our 3D voxel representation is obtained by a fusion module that aligns RGB features and surface normal features, both of which can be estimated from 2D images.
arXiv Detail & Related papers (2025-04-02T08:58:34Z)
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging [15.36983068580743]
Hi3DGen is a novel framework for generating high-fidelity 3D geometry from images via normal bridging. Our work provides a new direction for high-fidelity 3D geometry generation from images by leveraging normal maps as an intermediate representation.
arXiv Detail & Related papers (2025-03-28T08:39:20Z)
F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Aggregative Gaussian Splatting [35.625593119642424]
This paper tackles the problem of generalizable 3D-aware generation from monocular datasets. We propose a novel feed-forward pipeline based on pixel-aligned Gaussian Splatting. We also introduce a self-supervised cycle-aggregative constraint to enforce cross-view consistency in the learned 3D representation.
arXiv Detail & Related papers (2025-01-12T04:44:44Z)
GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions [22.077366472693395]
We introduce a new generative approach for synthesizing 3D geometry and images from single-view collections. By employing volumetric rendering using neural radiance fields, they inherit a key limitation: the generated geometry is noisy and unconstrained. We propose GeoGen, a new SDF-based 3D generative model trained in an end-to-end manner.
arXiv Detail & Related papers (2024-06-06T17:00:10Z)
LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image [64.94932577552458]
Large Reconstruction Models have made significant strides in the realm of automated 3D content generation from single or multiple input images. Despite their success, these models often produce 3D meshes with geometric inaccuracies, stemming from the inherent challenges of deducing 3D shapes solely from image data. We introduce a novel framework, the Large Image and Point Cloud Alignment Model (LAM3D), which utilizes 3D point cloud data to enhance the fidelity of generated 3D meshes.
arXiv Detail & Related papers (2024-05-24T15:09:12Z)
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation [85.15374487533643]
We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information. We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models.
arXiv Detail & Related papers (2024-03-21T17:59:34Z)
GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting [81.03553265684184]
We introduce GeoGS3D, a framework for reconstructing detailed 3D objects from single-view images. We propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization. Experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z)
Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting [9.383423119196408]
We introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing multi-view diffusion models. MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations.
arXiv Detail & Related papers (2024-03-15T02:57:20Z)
High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views. Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator [68.0533826852601]
3D-aware image synthesis aims at learning a generative model that can render photo-realistic 2D images while capturing decent underlying 3D shapes. Existing methods fail to obtain moderate 3D shapes. We propose a geometry-aware discriminator to improve 3D-aware GANs.
arXiv Detail & Related papers (2022-09-30T17:59:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.