Spherical Geometry Diffusion: Generating High-quality 3D Face Geometry via Sphere-anchored Representations
- URL: http://arxiv.org/abs/2601.13371v1
- Date: Mon, 19 Jan 2026 20:15:45 GMT
- Title: Spherical Geometry Diffusion: Generating High-quality 3D Face Geometry via Sphere-anchored Representations
- Authors: Junyi Zhang, Yiming Wang, Yunhong Lu, Qichao Wang, Wenzhe Qian, Xiaoyin Xu, David Gu, Min Zhang
- Abstract summary: A fundamental challenge in text-to-3D face generation is achieving high-quality geometry. We introduce the Spherical Geometry Representation, a novel face representation that anchors geometric signals to uniform spherical coordinates. We then introduce Spherical Geometry Diffusion, a conditional diffusion framework built upon the 2D map obtained by unwrapping this sphere.
- Score: 18.442834011472005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental challenge in text-to-3D face generation is achieving high-quality geometry. The core difficulty lies in the arbitrary and intricate distribution of vertices in 3D space, making it challenging for existing models to establish clean connectivity and resulting in suboptimal geometry. To address this, our core insight is to simplify the underlying geometric structure by constraining the distribution onto a simple and regular manifold, a topological sphere. Building on this, we first propose the Spherical Geometry Representation, a novel face representation that anchors geometric signals to uniform spherical coordinates. This guarantees a regular point distribution, from which the mesh connectivity can be robustly reconstructed. Critically, this canonical sphere can be seamlessly unwrapped into a 2D map, creating a perfect synergy with powerful 2D generative models. We then introduce Spherical Geometry Diffusion, a conditional diffusion framework built upon this 2D map. It enables diverse and controllable generation by jointly modeling geometry and texture, where the geometry explicitly conditions the texture synthesis process. Our method's effectiveness is demonstrated through its success in a wide range of tasks: text-to-3D generation, face reconstruction, and text-based 3D editing. Extensive experiments show that our approach substantially outperforms existing methods in geometric quality, textual fidelity, and inference efficiency.
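The abstract's core move, anchoring geometry to a sphere and unwrapping it into a 2D map, can be pictured with a short sketch. The snippet below is a minimal illustration under one plausible reading (an equirectangular grid storing radial displacement); the function name, resolution, and encoding are assumptions, not the paper's actual parameterization.

```python
import numpy as np

def unwrap_sphere_to_map(points, H=256, W=512):
    """Hypothetical sketch: anchor 3D points to directions on a unit sphere,
    then store their radial displacement on an equirectangular (H, W) grid."""
    radii = np.linalg.norm(points, axis=1)               # the geometric signal
    dirs = points / radii[:, None]                       # anchor on the unit sphere
    theta = np.arccos(np.clip(dirs[:, 2], -1.0, 1.0))    # polar angle in [0, pi]
    phi = np.arctan2(dirs[:, 1], dirs[:, 0])             # azimuth in (-pi, pi]
    v = np.clip((theta / np.pi * H).astype(int), 0, H - 1)
    u = np.clip(((phi + np.pi) / (2 * np.pi) * W).astype(int), 0, W - 1)
    geo_map = np.zeros((H, W), dtype=np.float32)
    geo_map[v, u] = radii                                # splat one radius per pixel
    return geo_map
```

Because the map is a regular pixel grid, mesh connectivity follows directly from pixel adjacency, which is the regularity the abstract credits for robust reconstruction, and the map itself is exactly the kind of image a 2D diffusion model can consume.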
Related papers
- GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation [68.02988074681427]
Previous works leveraging video models for image-to-3D scene generation tend to suffer from geometric distortions and blurry content. In this paper, we renovate the pipeline of image-to-3D scene generation by unlocking the potential of geometry models. Our GeoWorld can generate high-fidelity 3D scenes from a single image and a given camera trajectory, outperforming prior methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2025-11-28T13:55:45Z)
- Generating Surface for Text-to-3D using 2D Gaussian Splatting [7.610379621632961]
We propose a novel method named DirectGaussian, which focuses on generating the surfaces of 3D objects represented by surfels. In DirectGaussian, we utilize conditional text generation models, and the surface of a 3D object is rendered via 2D Gaussian splatting. Our framework is capable of diverse and high-fidelity 3D content creation (see the sketch after this entry).
arXiv Detail & Related papers (2025-10-08T12:54:57Z)
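For context on the surfel primitive mentioned above: a 2D Gaussian surfel is a flat Gaussian confined to a tangent plane rather than a full 3D covariance. The snippet below is a minimal sketch of evaluating one such primitive at a world point; every name and parameter here is illustrative, not DirectGaussian's actual API.

```python
import numpy as np

def surfel_weight(x, center, tangent_u, tangent_v, scale_u, scale_v):
    """Hypothetical sketch: density of a single 2D Gaussian surfel at point x.
    The surfel lives in the plane spanned by its two tangent axes."""
    d = x - center
    u = np.dot(d, tangent_u) / scale_u   # local tangent-frame coordinates,
    v = np.dot(d, tangent_v) / scale_v   # normalized by per-axis scales
    return np.exp(-0.5 * (u * u + v * v))
```

Rendering then alpha-composites such weights along each camera ray, as in standard Gaussian splatting, but with flat primitives that hug the surface.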
- Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification [59.17489431187807]
We propose a framework that enhances 3D geometric fidelity by leveraging CLIP's hierarchical spatial semantics. Our method significantly improves 3D few-shot class-incremental learning, achieving superior geometric coherence and robustness to texture bias.
arXiv Detail & Related papers (2025-09-18T13:45:08Z)
- Shape from Semantics: 3D Shape Generation from Multi-View Semantics [30.969299308083723]
Existing 3D reconstruction methods use guidance such as 2D images, 3D point clouds, shape contours, and single semantics to recover the 3D surface. We propose a novel 3D modeling task called "Shape from Semantics", which aims to create 3D models whose geometry and appearance are consistent with the given text semantics when viewed from different viewpoints.
arXiv Detail & Related papers (2025-02-01T07:51:59Z)
- Geometry-guided Feature Learning and Fusion for Indoor Scene Reconstruction [14.225228781008209]
This paper proposes a novel geometry integration mechanism for 3D scene reconstruction.
Our approach incorporates 3D geometry at three levels, i.e. feature learning, feature fusion, and network supervision.
arXiv Detail & Related papers (2024-08-28T08:02:47Z)
- GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms (see the sketch after this entry).
arXiv Detail & Related papers (2024-06-21T17:49:31Z)
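The deformable cross-attention mentioned above lets each 3D point token gather image features at a few learned offsets around its 2D projection instead of attending densely over the whole feature map. The PyTorch sketch below shows that sampling pattern; the shapes, the hypothetical offset_net module (mapping C features to K*2 offsets), and the offset scaling are assumptions rather than GeoLRM's published architecture.

```python
import torch
import torch.nn.functional as F

def deformable_cross_attention(point_feats, img_feats, uv, offset_net, K=4):
    """Hypothetical sketch of point-to-image deformable cross-attention.
    point_feats: (N, C) tokens; img_feats: (1, C, H, W); uv: (N, 2) in [-1, 1]."""
    N, C = point_feats.shape
    # Each point predicts K small sampling offsets around its projection.
    offsets = offset_net(point_feats).view(N, K, 2).tanh() * 0.1
    grid = (uv[:, None, :] + offsets).view(1, N * K, 1, 2)
    sampled = F.grid_sample(img_feats, grid, align_corners=False)  # (1, C, N*K, 1)
    sampled = sampled.reshape(C, N, K).permute(1, 2, 0)            # (N, K, C)
    # Attention weights from similarity between point tokens and their samples.
    attn = torch.softmax((sampled * point_feats[:, None, :]).sum(-1), dim=1)
    return (attn[..., None] * sampled).sum(dim=1)                  # (N, C)
```

A trivial stand-in such as offset_net = torch.nn.Linear(C, K * 2) makes the sketch runnable. Sampling only K locations per point, rather than attending to every pixel, is the kind of sparsity that keeps memory low as the token count grows toward hundreds of thousands of Gaussians.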
- GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting [81.03553265684184]
We introduce GeoGS3D, a framework for reconstructing detailed 3D objects from single-view images.
We propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization.
Experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z)
- DSG-Net: Learning Disentangled Structure and Geometry for 3D Shape Generation [98.96086261213578]
We introduce DSG-Net, a deep neural network that learns a disentangled structured and geometric mesh representation for 3D shapes.
This supports a range of novel shape generation applications with disentangled control, such as editing structure (or geometry) while keeping geometry (or structure) unchanged (see the sketch after this entry).
Our method not only supports controllable generation applications but also produces high-quality synthesized shapes, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2020-08-12T17:06:51Z)
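The disentanglement described above can be pictured as two separate latent codes feeding one decoder, so swapping one code edits structure while the other pins geometry, or vice versa. A minimal sketch, with illustrative layer sizes that are not DSG-Net's actual design:

```python
import torch
import torch.nn as nn

class DisentangledShapeDecoder(nn.Module):
    """Hypothetical sketch: separate structure and geometry codes are
    concatenated and decoded to vertex positions, so either code can be
    swapped while the other is held fixed."""
    def __init__(self, d_struct=128, d_geom=128, n_verts=1024):
        super().__init__()
        self.n_verts = n_verts
        self.decode = nn.Sequential(
            nn.Linear(d_struct + d_geom, 512), nn.ReLU(),
            nn.Linear(512, n_verts * 3),
        )

    def forward(self, z_struct, z_geom):
        out = self.decode(torch.cat([z_struct, z_geom], dim=-1))
        return out.view(-1, self.n_verts, 3)  # (batch, vertices, xyz)
```

Interpolating z_struct between two shapes while freezing z_geom is the kind of disentangled control the summary describes.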
- Deep Geometric Texture Synthesis [83.9404865744028]
We propose a novel framework for synthesizing geometric textures.
It learns texture statistics from local neighborhoods of a single reference 3D model.
Our network displaces mesh vertices in any direction, enabling synthesis of geometric textures (see the sketch after this entry).
arXiv Detail & Related papers (2020-06-30T19:36:38Z)
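The vertex-displacement idea above is simple to state in code: a network predicts an unconstrained 3D offset per vertex, and the mesh is deformed by adding it. The sketch below assumes a stand-in disp_net taking per-vertex positions and normals; neither the name nor the interface comes from the paper.

```python
import torch

def apply_geometric_texture(verts, normals, disp_net):
    """Hypothetical sketch: displace each vertex along a predicted 3D offset.
    verts, normals: (V, 3) tensors; disp_net maps (V, 6) -> (V, 3).
    The offset is deliberately unconstrained (any direction, per the abstract).
    A trivial stand-in: disp_net = torch.nn.Linear(6, 3)."""
    disp = disp_net(torch.cat([verts, normals], dim=-1))
    return verts + disp
```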
- PUGeo-Net: A Geometry-centric Network for 3D Point Cloud Upsampling [103.09504572409449]
We propose a novel deep neural network-based method, called PUGeo-Net, to generate uniform dense point clouds.
Thanks to its geometry-centric nature, PUGeo-Net works well for both CAD models with sharp features and scanned models with rich geometric details (see the sketch after this entry).
arXiv Detail & Related papers (2020-02-24T14:13:29Z)
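A geometry-centric upsampler can be pictured as generating new samples in each point's local tangent plane, so the densified cloud stays on the underlying surface. The sketch below uses random Gaussian tangent-plane samples as an illustrative stand-in for PUGeo-Net's learned 2D-to-3D mapping; the function name and sigma are assumptions.

```python
import numpy as np

def upsample_points(points, normals, r=4, sigma=0.01):
    """Hypothetical sketch: add r tangent-plane samples per input point.
    points, normals: (N, 3) arrays, normals assumed unit length."""
    new_pts = []
    for p, n in zip(points, normals):
        # Orthonormal tangent basis (t1, t2) perpendicular to the normal n.
        a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        t1 = np.cross(n, a)
        t1 /= np.linalg.norm(t1)
        t2 = np.cross(n, t1)
        uv = np.random.randn(r, 2) * sigma        # 2D samples in the tangent plane
        new_pts.append(p + uv[:, :1] * t1 + uv[:, 1:] * t2)
    return np.concatenate([points, np.vstack(new_pts)], axis=0)
```

A learned version would replace the Gaussian draws with a network that regresses the 2D-to-3D lifting per local patch, which is roughly what the paper's geometry-centric design refers to.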
This list is automatically generated from the titles and abstracts of the papers on this site.