ArtGen: Conditional Generative Modeling of Articulated Objects in Arbitrary Part-Level States
- URL: http://arxiv.org/abs/2512.12395v1
- Date: Sat, 13 Dec 2025 17:00:03 GMT
- Title: ArtGen: Conditional Generative Modeling of Articulated Objects in Arbitrary Part-Level States
- Authors: Haowen Wang, Xiaoping Yuan, Fugang Zhang, Rui Jian, Yuanwei Zhu, Xiuquan Qiao, Yakun Huang
- Abstract summary: ArtGen is a conditional diffusion-based framework capable of generating articulated 3D objects with accurate geometry and coherent kinematics. Specifically, ArtGen employs cross-state Monte Carlo sampling to explicitly enforce global kinematic consistency. A compositional 3D-VAE latent prior enhanced with local-global attention effectively captures fine-grained geometry and global part-level relationships.
- Score: 9.721009445297716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating articulated assets is crucial for robotics, digital twins, and embodied intelligence. Existing generative models often rely on single-view inputs representing closed states, resulting in ambiguous or unrealistic kinematic structures due to the entanglement between geometric shape and joint dynamics. To address these challenges, we introduce ArtGen, a conditional diffusion-based framework capable of generating articulated 3D objects with accurate geometry and coherent kinematics from single-view images or text descriptions at arbitrary part-level states. Specifically, ArtGen employs cross-state Monte Carlo sampling to explicitly enforce global kinematic consistency, reducing structural-motion entanglement. Additionally, we integrate a Chain-of-Thought reasoning module to infer robust structural priors, such as part semantics, joint types, and connectivity, guiding a sparse-expert Diffusion Transformer to specialize in diverse kinematic interactions. Furthermore, a compositional 3D-VAE latent prior enhanced with local-global attention effectively captures fine-grained geometry and global part-level relationships. Extensive experiments on the PartNet-Mobility benchmark demonstrate that ArtGen significantly outperforms state-of-the-art methods.
Related papers
- ArtLLM: Generating Articulated Assets via 3D LLM [19.814132638278547]
ArtLLM is a novel framework for generating high-quality articulated assets directly from complete 3D meshes. At its core is a 3D multimodal large language model trained on a large-scale articulation dataset. Experiments show that ArtLLM significantly outperforms state-of-the-art methods in both part layout accuracy and joint prediction.
arXiv Detail & Related papers (2026-03-01T15:07:46Z) - GeoDiT: A Diffusion-based Vision-Language Model for Geospatial Understanding [14.436063587920005]
We introduce GeoDiT, the first diffusion-based vision-language model tailored for the geospatial domain. It achieves significant gains in image captioning, visual grounding, and multi-object detection. Our work validates that aligning the generative process with the data's intrinsic structure is key to unlocking superior performance in complex geospatial analysis.
arXiv Detail & Related papers (2025-12-02T07:59:46Z) - UniArt: Unified 3D Representation for Generating 3D Articulated Objects with Open-Set Articulation [14.687459506970301]
UniArt is a diffusion-based framework that synthesizes fully articulated 3D objects from a single image in an end-to-end manner. We introduce a reversible joint-to-voxel embedding, which spatially aligns articulation features with volumetric geometry. Experiments on the PartNet-Mobility benchmark demonstrate that UniArt achieves state-of-the-art mesh quality and articulation accuracy.
arXiv Detail & Related papers (2025-11-26T20:09:11Z) - Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects [59.51185639557874]
We introduce Kinematify, an automated framework that synthesizes articulated objects directly from arbitrary RGB images or textual descriptions. Our method addresses two core challenges: (i) inferring kinematic topologies for high-DoF objects and (ii) estimating joint parameters from static geometry.
arXiv Detail & Related papers (2025-11-03T07:21:42Z) - ArtiLatent: Realistic Articulated 3D Object Generation via Structured Latents [31.495577251319315]
ArtiLatent is a generative framework that synthesizes human-made 3D objects with fine-grained geometry, accurate articulation, and realistic appearance.
arXiv Detail & Related papers (2025-10-24T13:08:15Z) - Hierarchical Neural Semantic Representation for 3D Semantic Correspondence [72.8101601086805]
First, we design the hierarchical neural semantic representation (HNSR), which consists of a global semantic feature to capture high-level structure and multi-resolution local geometric features. Second, we design a progressive global-to-local matching strategy, which establishes coarse semantic correspondence using the global semantic feature. Third, our framework is training-free and broadly compatible with various pre-trained 3D generative backbones, demonstrating strong generalization across diverse shape categories.
arXiv Detail & Related papers (2025-09-22T07:23:07Z) - GaussianArt: Unified Modeling of Geometry and Motion for Articulated Objects [4.717906057951389]
We introduce a unified representation that jointly models geometry and motion using articulated 3D Gaussians. This formulation improves robustness in motion decomposition and supports articulated objects with up to 20 parts. We show that our method consistently achieves superior accuracy in part-level geometry reconstruction and motion estimation across a broad range of object types.
arXiv Detail & Related papers (2025-08-20T17:59:08Z) - HumanGenesis: Agent-Based Geometric and Generative Modeling for Synthetic Human Dynamics [60.737929335600015]
We present HumanGenesis, a framework that integrates geometric and generative modeling through four collaborative agents. HumanGenesis achieves state-of-the-art performance on tasks including text-guided synthesis, video reenactment, and novel-pose generalization.
arXiv Detail & Related papers (2025-08-13T14:50:19Z) - Self-Supervised Multi-Part Articulated Objects Modeling via Deformable Gaussian Splatting and Progressive Primitive Segmentation [23.18517560629462]
We introduce DeGSS, a unified framework that encodes articulated objects as deformable 3D Gaussian fields, embedding geometry, appearance, and motion in one compact representation. To evaluate generalization and realism, we enlarge the synthetic PartNet-Mobility benchmark and release RS-Art, a real-to-sim dataset that pairs RGB captures with accurately reverse-engineered 3D models.
arXiv Detail & Related papers (2025-06-11T12:32:16Z) - DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data [67.99373622902827]
DIPO is a framework for controllable generation of articulated 3D objects from a pair of images. We propose a dual-image diffusion model that captures relationships between the image pair to generate part layouts and joint parameters. We also propose PM-X, a large-scale dataset of complex articulated 3D objects, accompanied by rendered images, URDF annotations, and textual descriptions.
arXiv Detail & Related papers (2025-05-26T18:55:14Z) - ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting [66.29782808719301]
Building articulated objects is a key challenge in computer vision. Existing methods often fail to effectively integrate information across different object states. We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
arXiv Detail & Related papers (2025-02-26T10:25:32Z) - REACTO: Reconstructing Articulated Objects from a Single Video [64.89760223391573]
We propose a novel deformation model that enhances the rigidity of each part while maintaining flexible deformation of the joints.
Our method outperforms previous works in producing higher-fidelity 3D reconstructions of general articulated objects.
arXiv Detail & Related papers (2024-04-17T08:01:55Z)