Topology Sculptor, Shape Refiner: Discrete Diffusion Model for High-Fidelity 3D Meshes Generation
- URL: http://arxiv.org/abs/2510.21264v2
- Date: Mon, 27 Oct 2025 16:38:35 GMT
- Title: Topology Sculptor, Shape Refiner: Discrete Diffusion Model for High-Fidelity 3D Meshes Generation
- Authors: Kaiyu Song, Hanjiang Lai, Yaqing Zhang, Chuangjian Cai, Yan Pan, Kun Yue, Jian Yin
- Abstract summary: Topology Sculptor, Shape Refiner (TSSR) is a novel method for generating high-quality, artist-style 3D meshes. TSSR leverages its parallel generation capability through three key innovations. Experiments on complex datasets demonstrate that TSSR generates high-quality 3D artist-style meshes.
- Score: 14.55646181682844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce Topology Sculptor, Shape Refiner (TSSR), a novel method for generating high-quality, artist-style 3D meshes based on Discrete Diffusion Models (DDMs). Our primary motivation for TSSR is to achieve highly accurate token prediction while enabling parallel generation, a significant advantage over sequential autoregressive methods. By allowing TSSR to "see" all mesh tokens concurrently, we unlock a new level of efficiency and control. We leverage this parallel generation capability through three key innovations: 1) Decoupled Training and Hybrid Inference, which distinctly separates the DDM-based generation into a topology sculpting stage and a subsequent shape refinement stage. This strategic decoupling enables TSSR to effectively capture both intricate local topology and overarching global shape. 2) An Improved Hourglass Architecture, featuring bidirectional attention enriched by face-vertex-sequence level Rotational Positional Embeddings (RoPE), thereby capturing richer contextual information across the mesh structure. 3) A novel Connection Loss, which acts as a topological constraint to further enhance the realism and fidelity of the generated meshes. Extensive experiments on complex datasets demonstrate that TSSR generates high-quality 3D artist-style meshes, capable of achieving up to 10,000 faces at a remarkable spatial resolution of $1024^3$. The code will be released at: https://github.com/psky1111/Tencent-TSSR.
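As a hedged illustration of the parallel token prediction the abstract describes (this is not the authors' released code, and `toy_denoiser` is a hypothetical stand-in for the learned bidirectional model), discrete-diffusion-style decoding can be sketched as confidence-ordered iterative unmasking, where every masked mesh-token slot is scored jointly at each step rather than left to right:

```python
import math
import random

MASK = -1  # sentinel id for a masked mesh token

def toy_denoiser(tokens, vocab_size, rng):
    # Stand-in for the learned bidirectional model: one logit
    # vector per position, produced for all positions at once.
    return [[rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
            for _ in tokens]

def parallel_decode(seq_len, vocab_size, steps=4, seed=0):
    """Iteratively unmask every position, committing the most
    confident predictions first; each step scores all masked
    slots in parallel instead of one token at a time."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        logits = toy_denoiser(tokens, vocab_size, rng)
        scored = []
        for i in masked:
            z = max(logits[i])
            probs = [math.exp(l - z) for l in logits[i]]
            total = sum(probs)
            best = max(range(vocab_size), key=lambda v: probs[v])
            scored.append((probs[best] / total, i, best))
        # Commit roughly 1/(remaining steps) of the masked slots.
        k = max(1, math.ceil(len(masked) / (steps - step)))
        for _, i, best in sorted(scored, reverse=True)[:k]:
            tokens[i] = best
    return tokens

mesh_tokens = parallel_decode(seq_len=12, vocab_size=32)
```

With 4 steps, each pass commits about a quarter of the remaining slots, so the whole sequence is produced in a constant number of model calls regardless of its length, which is the efficiency argument the abstract makes against sequential autoregression.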
Related papers
- FACE: A Face-based Autoregressive Representation for High-Fidelity and Efficient Mesh Generation [50.71369329585773]
We introduce FACE, a novel Autoregressive Autoencoder framework that generates meshes at the face level. Our one-face-one-token strategy treats each triangle face, the fundamental building block of a mesh, as a single, unified token. FACE achieves state-of-the-art reconstruction quality on standard benchmarks.
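The one-face-one-token idea can be illustrated with a toy coordinate-quantization tokenizer (FACE itself learns face tokens with an autoregressive autoencoder; the `quantize` scheme below is only a hypothetical stand-in):

```python
def quantize(coord, bins=128):
    # Map a coordinate in [-1, 1] to a discrete bin id.
    c = min(max(coord, -1.0), 1.0)
    return min(int((c + 1.0) / 2.0 * bins), bins - 1)

def face_to_token(face, bins=128):
    """Pack one triangle (3 vertices x 3 coords) into a single
    tuple token, so the mesh sequence has one token per face
    rather than nine per-coordinate tokens."""
    return tuple(quantize(c, bins) for v in face for c in v)

mesh = [  # two triangles tiling a unit square in the z=0 plane
    [(-1, -1, 0), (1, -1, 0), (1, 1, 0)],
    [(-1, -1, 0), (1, 1, 0), (-1, 1, 0)],
]
tokens = [face_to_token(f) for f in mesh]
```

The resulting sequence length equals the face count, which is what makes face-level tokenization shorter than vertex- or coordinate-level sequences.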
arXiv Detail & Related papers (2026-03-02T06:47:15Z)
- StdGEN++: A Comprehensive System for Semantic-Decomposed 3D Character Generation [57.06461272772509]
StdGEN++ is a novel and comprehensive system for generating high-fidelity, semantically decomposed 3D characters from diverse inputs. It achieves state-of-the-art performance, significantly outperforming existing methods in geometric accuracy and semantic disentanglement. The resulting structural independence unlocks advanced downstream capabilities, including non-destructive editing, physics-compliant animation, and gaze tracking.
arXiv Detail & Related papers (2026-01-12T15:41:27Z)
- Robust Mesh Saliency GT Acquisition in VR via View Cone Sampling and Geometric Smoothing [59.12032628787018]
3D mesh saliency ground truth is essential for human-centric visual modeling in virtual reality (VR). Current VR eye-tracking pipelines rely on single-ray sampling and Euclidean smoothing, triggering texture attention and signal leakage across gaps. This paper proposes a robust framework to address these limitations.
arXiv Detail & Related papers (2026-01-06T05:20:12Z)
- PartDiffuser: Part-wise 3D Mesh Generation via Discrete Diffusion [14.879669869466072]
PartDiffuser is a novel semi-autoregressive diffusion framework for point-cloud-to-mesh generation. It is based on the DiT architecture and introduces a part-aware cross-attention mechanism. Experiments demonstrate that this method significantly outperforms state-of-the-art (SOTA) models in generating 3D meshes with rich detail.
arXiv Detail & Related papers (2025-11-24T06:11:21Z)
- WorldGrow: Generating Infinite 3D World [75.81531067447203]
We tackle the challenge of generating an infinitely extendable 3D world -- large, continuous environments with coherent geometry and realistic appearance. We propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity.
arXiv Detail & Related papers (2025-10-24T17:39:52Z)
- GauSSmart: Enhanced 3D Reconstruction through 2D Foundation Models and Geometric Filtering [50.675710727721786]
We propose GauSSmart, a hybrid method that bridges 2D foundational models and 3D Gaussian Splatting reconstruction. Our approach integrates established 2D computer vision techniques, including convex filtering and semantic feature supervision. We validate our approach across three datasets, where GauSSmart consistently outperforms existing Gaussian Splatting methods.
arXiv Detail & Related papers (2025-10-16T03:38:26Z)
- H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction [39.22287224290769]
H3R is a hybrid framework that integrates latent fusion with attention-based feature aggregation. By integrating both paradigms, our approach enhances generalization while converging 2$\times$ faster than existing methods. Our method supports variable-number and high-resolution input views while demonstrating robust cross-dataset generalization.
arXiv Detail & Related papers (2025-08-05T05:56:30Z)
- Cross-Modal Geometric Hierarchy Fusion: An Implicit-Submap Driven Framework for Resilient 3D Place Recognition [9.411542547451193]
We propose a novel framework that redefines 3D place recognition through density-agnostic geometric reasoning. Specifically, we introduce an implicit 3D representation based on elastic points, which is immune to the interference of original scene point cloud density. With the aid of these two types of information, we obtain descriptors that fuse geometric information from both bird's-eye view and 3D segment perspectives.
arXiv Detail & Related papers (2025-06-17T07:04:07Z)
- Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets [90.99212668875971]
Step1X-3D is an open framework addressing challenges such as data scarcity, algorithmic limitations, and ecosystem fragmentation. We present a two-stage 3D-native architecture combining a hybrid VAE-DiT geometry generator with a diffusion-based texture synthesis module. Benchmark results demonstrate state-of-the-art performance that exceeds existing open-source methods.
arXiv Detail & Related papers (2025-05-12T16:56:30Z)
- MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs [79.45006864728893]
MeshCraft is a framework for efficient and controllable mesh generation. It uses continuous spatial diffusion to generate discrete triangle faces. It can generate an 800-face mesh in just 3.2 seconds.
arXiv Detail & Related papers (2025-03-29T09:21:50Z)
- StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss.
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z)
- Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.
We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
arXiv Detail & Related papers (2023-03-26T12:03:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.