CylinderPlane: Nested Cylinder Representation for 3D-aware Image Generation
- URL: http://arxiv.org/abs/2507.15606v1
- Date: Mon, 21 Jul 2025 13:28:59 GMT
- Title: CylinderPlane: Nested Cylinder Representation for 3D-aware Image Generation
- Authors: Ru Jia, Xiaozhuang Ma, Jianji Wang, Nanning Zheng,
- Abstract summary: CylinderPlane is a novel implicit representation based on Cylindrical Coordinate System.<n>Our representation is agnostic to implicit rendering methods and can be easily integrated into any neural rendering pipeline.
- Score: 30.123631491028352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the proposal of the Tri-plane representation has advanced the development of the 3D-aware image generative models, problems rooted in its inherent structure, such as multi-face artifacts caused by sharing the same features in symmetric regions, limit its ability to generate 360$^\circ$ view images. In this paper, we propose CylinderPlane, a novel implicit representation based on Cylindrical Coordinate System, to eliminate the feature ambiguity issue and ensure multi-view consistency in 360$^\circ$. Different from the inevitable feature entanglement in Cartesian coordinate-based Tri-plane representation, the cylindrical coordinate system explicitly separates features at different angles, allowing our cylindrical representation possible to achieve high-quality, artifacts-free 360$^\circ$ image synthesis. We further introduce the nested cylinder representation that composites multiple cylinders at different scales, thereby enabling the model more adaptable to complex geometry and varying resolutions. The combination of cylinders with different resolutions can effectively capture more critical locations and multi-scale features, greatly facilitates fine detail learning and robustness to different resolutions. Moreover, our representation is agnostic to implicit rendering methods and can be easily integrated into any neural rendering pipeline. Extensive experiments on both synthetic dataset and unstructured in-the-wild images demonstrate that our proposed representation achieves superior performance over previous methods.
Related papers
- QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction [49.75084732129701]
3D occupancy prediction is crucial for robust autonomous driving systems.<n>Most existing methods employ dense voxel-based scene representations.<n>We present QuadricFormer, a superquadric-based model for efficient 3D occupancy prediction.
arXiv Detail & Related papers (2025-06-12T17:59:45Z) - LinPrim: Linear Primitives for Differentiable Volumetric Rendering [53.780682194322225]
We introduce two new scene representations based on linear primitives.<n>We present a different octaiableizer that runs efficiently on GPU.<n>We demonstrate comparable performance to state-of-the-art methods.
arXiv Detail & Related papers (2025-01-27T18:49:38Z) - F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Aggregative Gaussian Splatting [35.625593119642424]
This paper tackles the problem of generalizable 3D-aware generation from monocular datasets.<n>We propose a novel feed-forward pipeline based on pixel-aligned Gaussian Splatting.<n>We also introduce a self-supervised cycle-aggregative constraint to enforce cross-view consistency in the learned 3D representation.
arXiv Detail & Related papers (2025-01-12T04:44:44Z) - StdGEN: Semantic-Decomposed 3D Character Generation from Single Images [28.302030751098354]
StdGEN is an innovative pipeline for generating semantically high-quality 3D characters from single images.<n>It generates intricately detailed 3D characters with separated semantic components such as the body, clothes, and hair, in three minutes.<n>StdGEN offers ready-to-use semantic-decomposed 3D characters and enables flexible customization for a wide range of applications.
arXiv Detail & Related papers (2024-11-08T17:54:18Z) - VortSDF: 3D Modeling with Centroidal Voronoi Tesselation on Signed Distance Field [5.573454319150408]
We introduce a volumetric optimization framework that combines explicit SDF fields with a shallow color network, in order to estimate 3D shape properties over tetrahedral grids.
Experimental results with Chamfer statistics validate this approach with unprecedented reconstruction quality on various scenarios such as objects, open scenes or human.
arXiv Detail & Related papers (2024-07-29T09:46:39Z) - GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z) - GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting [81.03553265684184]
We introduce GeoGS3D, a framework for reconstructing detailed 3D objects from single-view images.
We propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization.
Experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z) - Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z) - DiFT: Differentiable Differential Feature Transform for Multi-View
Stereo [16.47413993267985]
We learn to transform the differential cues from a stack of images densely captured with a rotational motion into spatially discriminative and view-invariant per-pixel features at each view.
These low-level features can be directly fed to any existing multi-view stereo technique for enhanced 3D reconstruction.
arXiv Detail & Related papers (2022-03-16T07:12:46Z) - Object-Centric Multi-View Aggregation [86.94544275235454]
We present an approach for aggregating a sparse set of views of an object in order to compute a semi-implicit 3D representation in the form of a volumetric feature grid.
Key to our approach is an object-centric canonical 3D coordinate system into which views can be lifted, without explicit camera pose estimation.
We show that computing a symmetry-aware mapping from pixels to the canonical coordinate system allows us to better propagate information to unseen regions.
arXiv Detail & Related papers (2020-07-20T17:38:31Z) - GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis [43.4859484191223]
We propose a generative model for radiance fields which have recently proven successful for novel view synthesis of a single scene.
By introducing a multi-scale patch-based discriminator, we demonstrate synthesis of high-resolution images while training our model from unposed 2D images alone.
arXiv Detail & Related papers (2020-07-05T20:37:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.