ArtGen: Conditional Generative Modeling of Articulated Objects in Arbitrary Part-Level States
- URL: http://arxiv.org/abs/2512.12395v1
- Date: Sat, 13 Dec 2025 17:00:03 GMT
- Title: ArtGen: Conditional Generative Modeling of Articulated Objects in Arbitrary Part-Level States
- Authors: Haowen Wang, Xiaoping Yuan, Fugang Zhang, Rui Jian, Yuanwei Zhu, Xiuquan Qiao, Yakun Huang
- Abstract summary: ArtGen is a conditional diffusion-based framework capable of generating articulated 3D objects with accurate geometry and coherent kinematics. Specifically, ArtGen employs cross-state Monte Carlo sampling to explicitly enforce global kinematic consistency. A compositional 3D-VAE latent prior enhanced with local-global attention effectively captures fine-grained geometry and global part-level relationships.
- Score: 9.721009445297716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating articulated assets is crucial for robotics, digital twins, and embodied intelligence. Existing generative models often rely on single-view inputs representing closed states, resulting in ambiguous or unrealistic kinematic structures due to the entanglement between geometric shape and joint dynamics. To address these challenges, we introduce ArtGen, a conditional diffusion-based framework capable of generating articulated 3D objects with accurate geometry and coherent kinematics from single-view images or text descriptions at arbitrary part-level states. Specifically, ArtGen employs cross-state Monte Carlo sampling to explicitly enforce global kinematic consistency, reducing structural-motion entanglement. Additionally, we integrate a Chain-of-Thought reasoning module to infer robust structural priors, such as part semantics, joint types, and connectivity, guiding a sparse-expert Diffusion Transformer to specialize in diverse kinematic interactions. Furthermore, a compositional 3D-VAE latent prior enhanced with local-global attention effectively captures fine-grained geometry and global part-level relationships. Extensive experiments on the PartNet-Mobility benchmark demonstrate that ArtGen significantly outperforms state-of-the-art methods.
Related papers
- ArtLLM: Generating Articulated Assets via 3D LLM [19.814132638278547]
ArtLLM is a novel framework for generating high-quality articulated assets directly from complete 3D meshes. At its core is a 3D multimodal large language model trained on a large-scale articulation dataset. Experiments show that ArtLLM significantly outperforms state-of-the-art methods in both part layout accuracy and joint prediction.
arXiv Detail & Related papers (2026-03-01T15:07:46Z) - GeoDiT: A Diffusion-based Vision-Language Model for Geospatial Understanding [14.436063587920005]
We introduce GeoDiT, the first diffusion-based vision-language model tailored for the geospatial domain. It achieves significant gains in image captioning, visual grounding, and multi-object detection. Our work validates that aligning the generative process with the data's intrinsic structure is key to unlocking superior performance in complex geospatial analysis.
arXiv Detail & Related papers (2025-12-02T07:59:46Z) - UniArt: Unified 3D Representation for Generating 3D Articulated Objects with Open-Set Articulation [14.687459506970301]
UniArt is a diffusion-based framework that synthesizes fully articulated 3D objects from a single image in an end-to-end manner. We introduce a reversible joint-to-voxel embedding, which spatially aligns articulation features with volumetric geometry. Experiments on the PartNet-Mobility benchmark demonstrate that UniArt achieves state-of-the-art mesh quality and articulation accuracy.
arXiv Detail & Related papers (2025-11-26T20:09:11Z) - Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects [59.51185639557874]
We introduce Kinematify, an automated framework that synthesizes articulated objects directly from arbitrary RGB images or textual descriptions. Our method addresses two core challenges: (i) inferring kinematic topologies for high-DoF objects and (ii) estimating joint parameters from static geometry.
arXiv Detail & Related papers (2025-11-03T07:21:42Z) - ArtiLatent: Realistic Articulated 3D Object Generation via Structured Latents [31.495577251319315]
ArtiLatent is a generative framework that synthesizes human-made 3D objects with fine-grained geometry, accurate articulation, and realistic appearance.
arXiv Detail & Related papers (2025-10-24T13:08:15Z) - Hierarchical Neural Semantic Representation for 3D Semantic Correspondence [72.8101601086805]
First, we design the hierarchical neural semantic representation (HNSR), which consists of a global semantic feature to capture high-level structure and multi-resolution local geometric features. Second, we design a progressive global-to-local matching strategy, which establishes coarse semantic correspondence using the global semantic feature. Third, our framework is training-free and broadly compatible with various pre-trained 3D generative backbones, demonstrating strong generalization across diverse shape categories.
arXiv Detail & Related papers (2025-09-22T07:23:07Z) - GaussianArt: Unified Modeling of Geometry and Motion for Articulated Objects [4.717906057951389]
We introduce a unified representation that jointly models geometry and motion using articulated 3D Gaussians. This formulation improves robustness in motion decomposition and supports articulated objects with up to 20 parts. We show that our method consistently achieves superior accuracy in part-level geometry reconstruction and motion estimation across a broad range of object types.
arXiv Detail & Related papers (2025-08-20T17:59:08Z) - HumanGenesis: Agent-Based Geometric and Generative Modeling for Synthetic Human Dynamics [60.737929335600015]
We present HumanGenesis, a framework that integrates geometric and generative modeling through four collaborative agents. HumanGenesis achieves state-of-the-art performance on tasks including text-guided synthesis, video reenactment, and novel-pose generalization.
arXiv Detail & Related papers (2025-08-13T14:50:19Z) - Self-Supervised Multi-Part Articulated Objects Modeling via Deformable Gaussian Splatting and Progressive Primitive Segmentation [23.18517560629462]
We introduce DeGSS, a unified framework that encodes articulated objects as deformable 3D Gaussian fields, embedding geometry, appearance, and motion in one compact representation. To evaluate generalization and realism, we enlarge the synthetic PartNet-Mobility benchmark and release RS-Art, a real-to-sim dataset that pairs RGB captures with accurately reverse-engineered 3D models.
arXiv Detail & Related papers (2025-06-11T12:32:16Z) - DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data [67.99373622902827]
DIPO is a framework for controllable generation of articulated 3D objects from a pair of images. We propose a dual-image diffusion model that captures relationships between the image pair to generate part layouts and joint parameters. We also propose PM-X, a large-scale dataset of complex articulated 3D objects, accompanied by rendered images, URDF annotations, and textual descriptions.
arXiv Detail & Related papers (2025-05-26T18:55:14Z) - ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting [66.29782808719301]
Building articulated objects is a key challenge in computer vision. Existing methods often fail to effectively integrate information across different object states. We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
arXiv Detail & Related papers (2025-02-26T10:25:32Z) - REACTO: Reconstructing Articulated Objects from a Single Video [64.89760223391573]
We propose a novel deformation model that enhances the rigidity of each part while maintaining flexible deformation of the joints.
Our method outperforms previous works in producing higher-fidelity 3D reconstructions of general articulated objects.
arXiv Detail & Related papers (2024-04-17T08:01:55Z)