Related papers: ArtLLM: Generating Articulated Assets via 3D LLM

URL: http://arxiv.org/abs/2603.01142v1
Date: Sun, 01 Mar 2026 15:07:46 GMT
Title: ArtLLM: Generating Articulated Assets via 3D LLM
Authors: Penghao Wang, Siyuan Xie, Hongyu Yan, Xianghui Yang, Jingwei Huang, Chunchao Guo, Jiayuan Gu
Abstract summary: ArtLLM is a novel framework for generating high-quality articulated assets directly from complete 3D meshes. At its core is a 3D multimodal large language model trained on a large-scale articulation dataset. Experiments show that ArtLLM significantly outperforms state-of-the-art methods in both part layout accuracy and joint prediction.
Score: 19.814132638278547
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Creating interactive digital environments for gaming, robotics, and simulation relies on articulated 3D objects whose functionality emerges from their part geometry and kinematic structure. However, existing approaches remain fundamentally limited: optimization-based reconstruction methods require slow, per-object joint fitting and typically handle only simple, single-joint objects, while retrieval-based methods assemble parts from a fixed library, leading to repetitive geometry and poor generalization. To address these challenges, we introduce ArtLLM, a novel framework for generating high-quality articulated assets directly from complete 3D meshes. At its core is a 3D multimodal large language model trained on a large-scale articulation dataset curated from both existing articulation datasets and procedurally generated objects. Unlike prior work, ArtLLM autoregressively predicts a variable number of parts and joints, inferring their kinematic structure in a unified manner from the object's point cloud. This articulation-aware layout then conditions a 3D generative model to synthesize high-fidelity part geometries. Experiments on the PartNet-Mobility dataset show that ArtLLM significantly outperforms state-of-the-art methods in both part layout accuracy and joint prediction, while generalizing robustly to real-world objects. Finally, we demonstrate its utility in constructing digital twins, highlighting its potential for scalable robot learning.

Related papers

PAct: Part-Decomposed Single-View Articulated Object Generation [45.04652409374895]
Articulated objects are central to interactive 3D applications, including embodied AI, robotics, and VR/AR. We introduce a part-centric generative framework for articulated object creation that synthesizes part geometry, composition, and articulation under explicit part-aware conditioning. Our representation models an object as a set of movable parts, each encoded by latent tokens augmented with part identity and articulation cues.
arXiv Detail & Related papers (2026-02-16T17:45:44Z)
Particulate: Feed-Forward 3D Object Articulation [89.78788418174946]
Particulate is a feed-forward approach that, given a single static 3D mesh of an everyday object, directly infers all attributes of the underlying articulated structure. We train the network end-to-end on a diverse collection of articulated 3D assets from public datasets. During inference, Particulate lifts the network's feed-forward prediction to the input mesh, yielding a fully articulated 3D model in seconds.
arXiv Detail & Related papers (2025-12-12T18:59:51Z)
UniArt: Unified 3D Representation for Generating 3D Articulated Objects with Open-Set Articulation [14.687459506970301]
UniArt is a diffusion-based framework that synthesizes fully articulated 3D objects from a single image in an end-to-end manner. We introduce a reversible joint-to-voxel embedding, which spatially aligns articulation features with volumetric geometry. Experiments on the PartNet-Mobility benchmark demonstrate that UniArt achieves state-of-the-art mesh quality and articulation accuracy.
arXiv Detail & Related papers (2025-11-26T20:09:11Z)
LARM: A Large Articulated-Object Reconstruction Model [29.66486888001511]
LARM is a unified feedforward framework that reconstructs 3D articulated objects from sparse-view images. LARM generates auxiliary outputs such as depth maps and part masks to facilitate explicit 3D mesh extraction and joint estimation. Our pipeline eliminates the need for dense supervision and supports high-fidelity reconstruction across diverse object categories.
arXiv Detail & Related papers (2025-11-14T18:55:27Z)
Object-X: Learning to Reconstruct Multi-Modal 3D Object Representations [114.57192386025373]
Object-X is a versatile multi-modal 3D representation framework. It can encode rich object embeddings and decode them back into geometric and visual reconstructions. It supports a range of downstream tasks, including scene alignment, single-image 3D object reconstruction, and localization.
arXiv Detail & Related papers (2025-06-05T09:14:42Z)
IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction. We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images. We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z)
ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting [66.29782808719301]
Building articulated objects is a key challenge in computer vision. Existing methods often fail to effectively integrate information across different object states. We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
arXiv Detail & Related papers (2025-02-26T10:25:32Z)
Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling [48.78204955169967]
Articulate AnyMesh is an automated framework that can convert a rigid 3D mesh into its articulated counterpart in an open-vocabulary manner. Our experiments show that Articulate AnyMesh can generate large-scale, high-quality 3D articulated objects, including tools, toys, mechanical devices, and vehicles.
arXiv Detail & Related papers (2025-02-04T18:59:55Z)
Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image [102.44347847154867]
We propose a novel formulation that allows us to jointly recover the geometry of a 3D object as a set of primitives. Our model recovers the higher-level structural decomposition of various objects in the form of a binary tree of primitives. Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.
arXiv Detail & Related papers (2020-04-02T17:58:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.