From One to More: Contextual Part Latents for 3D Generation
- URL: http://arxiv.org/abs/2507.08772v2
- Date: Thu, 30 Oct 2025 04:25:25 GMT
- Title: From One to More: Contextual Part Latents for 3D Generation
- Authors: Shaocong Dong, Lihe Ding, Xiao Chen, Yaokun Li, Yuxin Wang, Yucheng Wang, Qi Wang, Jaehyeok Kim, Chenjian Gao, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu
- Abstract summary: CoPart is a part-aware diffusion framework that decomposes 3D objects into contextual part latents for coherent multi-part generation.
We construct Partverse, a novel 3D part dataset derived from Objaverse through automated mesh segmentation and human-verified annotations.
Experiments demonstrate CoPart's superior capabilities in part-level editing, articulated object generation, and scene composition with unprecedented controllability.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in 3D generation have transitioned from multi-view 2D rendering approaches to 3D-native latent diffusion frameworks that exploit geometric priors in ground truth data. Despite progress, three key limitations persist: (1) Single-latent representations fail to capture complex multi-part geometries, causing detail degradation; (2) Holistic latent coding neglects part independence and interrelationships critical for compositional design; (3) Global conditioning mechanisms lack fine-grained controllability. Inspired by human 3D design workflows, we propose CoPart - a part-aware diffusion framework that decomposes 3D objects into contextual part latents for coherent multi-part generation. This paradigm offers three advantages: i) Reduces encoding complexity through part decomposition; ii) Enables explicit part relationship modeling; iii) Supports part-level conditioning. We further develop a mutual guidance strategy to fine-tune pre-trained diffusion models for joint part latent denoising, ensuring both geometric coherence and foundation model priors. To enable large-scale training, we construct Partverse - a novel 3D part dataset derived from Objaverse through automated mesh segmentation and human-verified annotations. Extensive experiments demonstrate CoPart's superior capabilities in part-level editing, articulated object generation, and scene composition with unprecedented controllability.
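The core idea of the abstract, denoising several part latents jointly so that each part is conditioned on the context of the others, can be sketched roughly as follows. This is an illustrative toy, not CoPart's actual architecture: `cross_part_attention`, `denoise_step`, the stand-in noise predictor, and the update rule are all hypothetical names and simplifications assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_part_attention(latents):
    """Mutual guidance, sketched as single-head self-attention across parts:
    each part latent (one row) attends to all part latents, so denoising one
    part sees the context of the others."""
    q, k, v = latents, latents, latents          # (P, D): one token per part
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                           # context-mixed part latents

def denoise_step(latents, t, eps_model):
    """One joint denoising step over all part latents (toy update rule)."""
    context = cross_part_attention(latents)
    eps = eps_model(context, t)                  # shared noise predictor
    return latents - 0.1 * eps

# Toy run: 4 part latents of dimension 8, with a dummy noise predictor
# standing in for the fine-tuned diffusion model.
latents = rng.normal(size=(4, 8))
eps_model = lambda x, t: 0.5 * x
for t in reversed(range(10)):
    latents = denoise_step(latents, t, eps_model)
print(latents.shape)  # (4, 8): per-part latents are kept separate throughout
```

The point of the sketch is structural: the parts are never merged into a single holistic latent; they stay separate and only exchange information through the attention step, which is what enables part-level conditioning and editing.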
Related papers
- StdGEN++: A Comprehensive System for Semantic-Decomposed 3D Character Generation [57.06461272772509]
StdGEN++ is a novel and comprehensive system for generating high-fidelity, semantically decomposed 3D characters from diverse inputs.
It achieves state-of-the-art performance, significantly outperforming existing methods in geometric accuracy and semantic disentanglement.
The resulting structural independence unlocks advanced downstream capabilities, including non-destructive editing, physics-compliant animation, and gaze tracking.
arXiv Detail & Related papers (2026-01-12T15:41:27Z)
- UniPart: Part-Level 3D Generation with Unified 3D Geom-Seg Latents [21.86068927019046]
Part-level 3D generation is essential for applications requiring decomposable and structured 3D synthesis.
Existing methods either rely on implicit part segmentation with limited granularity control or depend on strong external segmenters trained on large annotated datasets.
We introduce UniPart, a two-stage latent diffusion framework for image-guided part-level 3D generation.
arXiv Detail & Related papers (2025-12-10T09:04:12Z)
- ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation [28.308731720451053]
We propose ReconViaGen to integrate reconstruction priors into the generative framework.
Experiments demonstrate that ReconViaGen can reconstruct complete and accurate 3D models consistent with the input views in both global structure and local details.
arXiv Detail & Related papers (2025-10-27T13:15:06Z)
- IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction [82.53307702809606]
Humans naturally perceive the geometric structure and semantic content of a 3D world as intertwined dimensions.
We propose the Instance-Grounded Geometry Transformer (IGGT) to unify the knowledge for both spatial reconstruction and instance-level contextual understanding.
arXiv Detail & Related papers (2025-10-26T14:57:44Z)
- HierOctFusion: Multi-scale Octree-based 3D Shape Generation via Part-Whole-Hierarchy Message Passing [9.953394373473621]
3D content generation remains a fundamental yet challenging task due to the inherent structural complexity of 3D data.
We propose HierOctFusion, a part-aware multi-scale octree diffusion model that enhances hierarchical feature interaction for generating fine-grained and sparse object structures.
Experiments demonstrate that HierOctFusion achieves superior shape quality and efficiency compared to prior methods.
arXiv Detail & Related papers (2025-08-14T23:12:18Z)
- OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion [31.767548415448957]
We introduce OmniPart, a novel framework for part-aware 3D object generation.
Our approach supports user-defined part granularity and precise localization, and enables diverse downstream applications.
arXiv Detail & Related papers (2025-07-08T16:46:15Z)
- PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers [29.52313100024294]
We introduce PartCrafter, the first structured 3D generative model that jointly synthesizes multiple semantically meaningful and geometrically distinct 3D meshes from a single RGB image.
PartCrafter simultaneously denoises multiple 3D parts, enabling end-to-end part-aware generation of both individual objects and complex multi-object scenes.
Experiments show that PartCrafter outperforms existing approaches in generating decomposable 3D meshes.
arXiv Detail & Related papers (2025-06-05T20:30:28Z)
- DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data [67.99373622902827]
DIPO is a framework for controllable generation of articulated 3D objects from a pair of images.
We propose a dual-image diffusion model that captures relationships between the image pair to generate part layouts and joint parameters.
We propose PM-X, a large-scale dataset of complex articulated 3D objects, accompanied by rendered images, URDF annotations, and textual descriptions.
arXiv Detail & Related papers (2025-05-26T18:55:14Z)
- PRISM: Probabilistic Representation for Integrated Shape Modeling and Generation [79.46526296655776]
PRISM is a novel approach for 3D shape generation that integrates categorical diffusion models with Statistical Shape Models (SSM) and Gaussian Mixture Models (GMM).
Our method employs compositional SSMs to capture part-level geometric variations and uses GMM to represent part semantics in a continuous space.
Our approach significantly outperforms previous methods in both quality and controllability of part-level operations.
arXiv Detail & Related papers (2025-04-06T11:48:08Z)
- DecompDreamer: Advancing Structured 3D Asset Generation with Multi-Object Decomposition and Gaussian Splatting [24.719972380079405]
DecompDreamer is a training routine designed to generate high-quality 3D compositions.
It decomposes scenes into structured components and their relationships.
It effectively generates intricate 3D compositions with superior object disentanglement.
arXiv Detail & Related papers (2025-03-15T03:37:25Z)
- DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a diffusion-guided model for generalizable human 3D reconstruction and view synthesis from sparse, minimally overlapping images.
Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Part123: Part-aware 3D Reconstruction from a Single-view Image [54.589723979757515]
Part123 is a novel framework for part-aware 3D reconstruction from a single-view image.
We introduce contrastive learning into a neural rendering framework to learn a part-aware feature space.
A clustering-based algorithm is also developed to automatically derive 3D part segmentation results from the reconstructed models.
arXiv Detail & Related papers (2024-05-27T07:10:21Z)
- Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image [102.44347847154867]
We propose a novel formulation that allows us to jointly recover the geometry of a 3D object as a set of primitives.
Our model recovers the higher level structural decomposition of various objects in the form of a binary tree of primitives.
Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.
arXiv Detail & Related papers (2020-04-02T17:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.