UniPart: Part-Level 3D Generation with Unified 3D Geom-Seg Latents
- URL: http://arxiv.org/abs/2512.09435v1
- Date: Wed, 10 Dec 2025 09:04:12 GMT
- Title: UniPart: Part-Level 3D Generation with Unified 3D Geom-Seg Latents
- Authors: Xufan He, Yushuang Wu, Xiaoyang Guo, Chongjie Ye, Jiaqing Zhou, Tianlei Hu, Xiaoguang Han, Dong Du,
- Abstract summary: Part-level 3D generation is essential for applications requiring decomposable and structured 3D synthesis. Existing methods either rely on implicit part segmentation with limited granularity control or depend on strong external segmenters trained on large annotated datasets. We introduce UniPart, a two-stage latent diffusion framework for image-guided part-level 3D generation.
- Score: 21.86068927019046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Part-level 3D generation is essential for applications requiring decomposable and structured 3D synthesis. However, existing methods either rely on implicit part segmentation with limited granularity control or depend on strong external segmenters trained on large annotated datasets. In this work, we observe that part awareness emerges naturally during whole-object geometry learning and propose Geom-Seg VecSet, a unified geometry-segmentation latent representation that jointly encodes object geometry and part-level structure. Building on this representation, we introduce UniPart, a two-stage latent diffusion framework for image-guided part-level 3D generation. The first stage performs joint geometry generation and latent part segmentation, while the second stage conditions part-level diffusion on both whole-object and part-specific latents. A dual-space generation scheme further enhances geometric fidelity by predicting part latents in both global and canonical spaces. Extensive experiments demonstrate that UniPart achieves superior segmentation controllability and part-level geometric quality compared with existing approaches.
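The two-stage pipeline described in the abstract can be sketched in rough pseudostructure. This is a minimal illustration only: the tensor shapes, function names, and random placeholders standing in for the actual diffusion samplers and the canonical-to-global transform are all assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: N latent tokens, D channels, K part slots.
N, D, K = 512, 64, 4

def stage1_geom_seg(image_feat):
    """Stage 1 (sketch): jointly produce a whole-object geometry latent
    and per-token part logits from an image condition. Random tensors
    stand in for the paper's latent diffusion sampler."""
    geom_latent = rng.normal(size=(N, D))      # Geom-Seg VecSet geometry latent
    part_logits = rng.normal(size=(N, K))      # latent part-segmentation head
    part_labels = part_logits.argmax(axis=-1)  # discrete part assignment
    return geom_latent, part_labels

def stage2_part_diffusion(geom_latent, part_labels, k):
    """Stage 2 (sketch): condition part-level generation on both the
    whole-object latent and the tokens assigned to part k."""
    part_tokens = geom_latent[part_labels == k]
    cond = np.concatenate([geom_latent.mean(0), part_tokens.mean(0)])
    # Dual-space idea: predict the part latent in a canonical frame,
    # then map it back to the global frame (identity placeholder here).
    canonical = rng.normal(size=(part_tokens.shape[0], D)) + cond[:D]
    global_space = canonical  # placeholder canonical-to-global transform
    return global_space

geom, labels = stage1_geom_seg(image_feat=None)
part0 = stage2_part_diffusion(geom, labels, k=0)
print(part0.shape[1])  # channel dimension is preserved per part
```

The point of the sketch is the data flow: stage 2 receives both a global summary of the whole-object latent and the subset of tokens segmented into the requested part, mirroring the paper's description of conditioning on whole-object and part-specific latents.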
Related papers
- Joint Geometry-Appearance Human Reconstruction in a Unified Latent Space via Bridge Diffusion [57.09673862519791]
This paper introduces JGA-LBD, a novel framework that unifies the modeling of geometry and appearance into a joint latent representation. Experiments demonstrate that JGA-LBD outperforms current state-of-the-art approaches in terms of both geometry fidelity and appearance quality.
arXiv Detail & Related papers (2026-01-01T12:48:56Z)
- SegGraph: Leveraging Graphs of SAM Segments for Few-Shot 3D Part Segmentation [20.143843470532946]
This work presents a novel framework for few-shot 3D part segmentation. Existing methods either ignore geometric structures for 3D feature learning or neglect the high-quality grouping clues from SAM. We devise a novel SAM segment graph-based propagation method, named SegGraph, to explicitly learn geometric features encoded within SAM's segmentation masks.
arXiv Detail & Related papers (2025-12-18T03:55:17Z)
- PartNeXt: A Next-Generation Dataset for Fine-Grained and Hierarchical 3D Part Understanding [65.55036443711528]
PartNeXt is a next-generation dataset with over 23,000 high-quality, textured 3D models annotated with fine-grained, hierarchical part labels across 50 categories. We benchmark PartNeXt on two tasks: (1) class-agnostic part segmentation, where state-of-the-art methods struggle with fine-grained and leaf-level parts, and (2) 3D part-centric question answering, a new benchmark for 3D-LLMs that reveals significant gaps in open-vocabulary part grounding.
arXiv Detail & Related papers (2025-10-23T03:06:08Z)
- Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes [60.92139345612904]
We present Light-SQ, a novel superquadric-based optimization framework. We propose a block-regrow-fill strategy guided by structure-aware volumetric decomposition. Experiments demonstrate that Light-SQ enables efficient, high-fidelity, and editable shape abstraction with superquadrics.
arXiv Detail & Related papers (2025-09-29T16:18:32Z)
- Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification [59.17489431187807]
We propose a framework that enhances 3D geometric fidelity by leveraging CLIP's hierarchical spatial semantics. Our method significantly improves 3D few-shot class-incremental learning, achieving superior geometric coherence and robustness to texture bias.
arXiv Detail & Related papers (2025-09-18T13:45:08Z)
- From One to More: Contextual Part Latents for 3D Generation [38.36190651170286]
CoPart is a part-aware diffusion framework that decomposes 3D objects into contextual part latents for coherent multi-part generation. We construct a novel 3D part dataset derived from articulated mesh segmentation and human-verified annotations. Experiments demonstrate CoPart's superior capabilities in part-level editing, object generation, and scene composition with unprecedented controllability.
arXiv Detail & Related papers (2025-07-11T17:33:18Z)
- HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation [50.206100327643284]
HiScene is a novel hierarchical framework that bridges the gap between 2D image generation and 3D object generation. We generate 3D content that aligns with 2D representations while maintaining compositional structure.
arXiv Detail & Related papers (2025-04-17T16:33:39Z)
- PRISM: Probabilistic Representation for Integrated Shape Modeling and Generation [79.46526296655776]
PRISM is a novel approach for 3D shape generation that integrates categorical diffusion models with Statistical Shape Models (SSM) and Gaussian Mixture Models (GMM). Our method employs compositional SSMs to capture part-level geometric variations and uses GMMs to represent part semantics in a continuous space. Our approach significantly outperforms previous methods in both quality and controllability of part-level operations.
arXiv Detail & Related papers (2025-04-06T11:48:08Z)
- Geometry-guided Feature Learning and Fusion for Indoor Scene Reconstruction [14.225228781008209]
This paper proposes a novel geometry integration mechanism for 3D scene reconstruction. Our approach incorporates 3D geometry at three levels, i.e., feature learning, feature fusion, and network supervision.
arXiv Detail & Related papers (2024-08-28T08:02:47Z)
- Self-supervised Learning of Hybrid Part-aware 3D Representations of 2D Gaussians and Superquadrics [16.446659867133977]
PartGS is a self-supervised part-aware reconstruction framework that integrates 2D Gaussians and superquadrics to parse objects and scenes into an interpretable decomposition. Our approach demonstrates superior performance compared to state-of-the-art methods across extensive experiments on the DTU, ShapeNet, and real-world datasets.
arXiv Detail & Related papers (2024-08-20T12:30:37Z)
- Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image [102.44347847154867]
We propose a novel formulation that jointly recovers the geometry of a 3D object as a set of primitives.
Our model recovers the higher-level structural decomposition of various objects in the form of a binary tree of primitives.
Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.
arXiv Detail & Related papers (2020-04-02T17:58:05Z)