From One to More: Contextual Part Latents for 3D Generation
- URL: http://arxiv.org/abs/2507.08772v1
- Date: Fri, 11 Jul 2025 17:33:18 GMT
- Title: From One to More: Contextual Part Latents for 3D Generation
- Authors: Shaocong Dong, Lihe Ding, Xiao Chen, Yaokun Li, Yuxin Wang, Yucheng Wang, Qi Wang, Jaehyeok Kim, Chenjian Gao, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu,
- Abstract summary: CoPart is a part-aware diffusion framework that decomposes 3D objects into contextual part latents for coherent multi-part generation.<n>We construct a novel 3D part dataset derived from articulated mesh segmentation and human-verified annotations.<n>Experiments demonstrate CoPart's superior capabilities in part-level editing, object generation, and scene composition with unprecedented controllability.
- Score: 33.43336981984443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in 3D generation have transitioned from multi-view 2D rendering approaches to 3D-native latent diffusion frameworks that exploit geometric priors in ground truth data. Despite progress, three key limitations persist: (1) Single-latent representations fail to capture complex multi-part geometries, causing detail degradation; (2) Holistic latent coding neglects part independence and interrelationships critical for compositional design; (3) Global conditioning mechanisms lack fine-grained controllability. Inspired by human 3D design workflows, we propose CoPart - a part-aware diffusion framework that decomposes 3D objects into contextual part latents for coherent multi-part generation. This paradigm offers three advantages: i) Reduces encoding complexity through part decomposition; ii) Enables explicit part relationship modeling; iii) Supports part-level conditioning. We further develop a mutual guidance strategy to fine-tune pre-trained diffusion models for joint part latent denoising, ensuring both geometric coherence and foundation model priors. To enable large-scale training, we construct Partverse - a novel 3D part dataset derived from Objaverse through automated mesh segmentation and human-verified annotations. Extensive experiments demonstrate CoPart's superior capabilities in part-level editing, articulated object generation, and scene composition with unprecedented controllability.
Related papers
- OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion [31.767548415448957]
We introduce OmniPart, a novel framework for part-aware 3D object generation.<n>Our approach supports user-defined part granularity, precise localization, and enables diverse downstream applications.
arXiv Detail & Related papers (2025-07-08T16:46:15Z) - PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers [29.52313100024294]
We introduce PartCrafter, the first structured 3D generative model that jointly synthesizes multiple semantically meaningful and geometrically distinct 3D meshes from a single RGB image.<n>PartCrafter simultaneously denoises multiple 3D parts, enabling end-to-end part-aware generation of both individual objects and complex multi-object scenes.<n> Experiments show that PartCrafter outperforms existing approaches in generating decomposable 3D meshes.
arXiv Detail & Related papers (2025-06-05T20:30:28Z) - DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data [67.99373622902827]
DIPO is a framework for controllable generation of articulated 3D objects from a pair of images.<n>We propose a dual-image diffusion model that captures relationships between the image pair to generate part layouts and joint parameters.<n>We propose PM-X, a large-scale dataset of complex articulated 3D objects, accompanied by rendered images, URDF annotations, and textual descriptions.
arXiv Detail & Related papers (2025-05-26T18:55:14Z) - PRISM: Probabilistic Representation for Integrated Shape Modeling and Generation [79.46526296655776]
PRISM is a novel approach for 3D shape generation that integrates categorical diffusion models with Statistical Shape Models (SSM) and Gaussian Mixture Models (GMM)<n>Our method employs compositional SSMs to capture part-level geometric variations and uses GMM to represent part semantics in a continuous space.<n>Our approach significantly outperforms previous methods in both quality and controllability of part-level operations.
arXiv Detail & Related papers (2025-04-06T11:48:08Z) - DecompDreamer: Advancing Structured 3D Asset Generation with Multi-Object Decomposition and Gaussian Splatting [24.719972380079405]
DecompDreamer is a training routine designed to generate high-quality 3D compositions.<n>It decomposes scenes into structured components and their relationships.<n>It effectively generates intricate 3D compositions with superior object disentanglement.
arXiv Detail & Related papers (2025-03-15T03:37:25Z) - DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images.<n>Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - Part123: Part-aware 3D Reconstruction from a Single-view Image [54.589723979757515]
Part123 is a novel framework for part-aware 3D reconstruction from a single-view image.
We introduce contrastive learning into a neural rendering framework to learn a part-aware feature space.
A clustering-based algorithm is also developed to automatically derive 3D part segmentation results from the reconstructed models.
arXiv Detail & Related papers (2024-05-27T07:10:21Z) - Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from
a Single RGB Image [102.44347847154867]
We propose a novel formulation that allows to jointly recover the geometry of a 3D object as a set of primitives.
Our model recovers the higher level structural decomposition of various objects in the form of a binary tree of primitives.
Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.
arXiv Detail & Related papers (2020-04-02T17:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.