ARMO: Autoregressive Rigging for Multi-Category Objects
- URL: http://arxiv.org/abs/2503.20663v1
- Date: Wed, 26 Mar 2025 15:56:48 GMT
- Title: ARMO: Autoregressive Rigging for Multi-Category Objects
- Authors: Mingze Sun, Shiwei Mao, Keyi Chen, Yurun Chen, Shunlin Lu, Jingbo Wang, Junting Dong, Ruqi Huang,
- Abstract summary: We introduce OmniRig, the first large-scale rigging dataset, comprising 79,499 meshes with detailed skeleton and skinning information. Unlike traditional benchmarks that rely on predefined standard poses, our dataset embraces diverse shape categories, styles, and poses. We propose ARMO, a novel rigging framework that utilizes an autoregressive model to predict both joint positions and connectivity relationships in a unified manner.
- Score: 8.030479370619458
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in large-scale generative models have significantly improved the quality and diversity of 3D shape generation. However, most existing methods focus primarily on generating static 3D models, overlooking the potentially dynamic nature of certain shapes, such as humanoids, animals, and insects. To address this gap, we focus on rigging, a fundamental task in animation that establishes skeletal structures and skinning for 3D models. In this paper, we introduce OmniRig, the first large-scale rigging dataset, comprising 79,499 meshes with detailed skeleton and skinning information. Unlike traditional benchmarks that rely on predefined standard poses (e.g., A-pose, T-pose), our dataset embraces diverse shape categories, styles, and poses. Leveraging this rich dataset, we propose ARMO, a novel rigging framework that utilizes an autoregressive model to predict both joint positions and connectivity relationships in a unified manner. By treating the skeletal structure as a complete graph and discretizing it into tokens, we encode the joints with an auto-encoder to obtain a latent embedding and use an autoregressive model to predict the tokens. A mesh-conditioned latent diffusion model is used to predict the latent embedding for conditional skeleton generation. Our method addresses the limitations of regression-based approaches, which often suffer from error accumulation and suboptimal connectivity estimation. Through extensive experiments on the OmniRig dataset, our approach achieves state-of-the-art performance in skeleton prediction, demonstrating improved generalization across diverse object categories. The code and dataset will be made public for academic use upon acceptance.
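To make the token-based formulation in the abstract concrete, below is a minimal sketch of how joint coordinates could be discretized into tokens and predicted autoregressively from a mesh-conditioned latent. It is an illustrative assumption, not ARMO's released code: the vocabulary size, model widths, quantization scheme, and the `quantize_joints` / `SkeletonAR` names are hypothetical, and a random tensor stands in for the mesh latent that the paper obtains from its latent diffusion model.

```python
# Illustrative sketch only: names, sizes, and the quantization scheme are
# assumptions, not ARMO's actual implementation.
import torch
import torch.nn as nn

VOCAB = 256     # assumed number of coordinate bins per axis
D_MODEL = 128   # assumed embedding width


def quantize_joints(joints: torch.Tensor, num_bins: int = VOCAB) -> torch.Tensor:
    """Map normalized joint coordinates in [-1, 1] to integer tokens, one per axis."""
    bins = ((joints.clamp(-1, 1) + 1) / 2 * (num_bins - 1)).round().long()
    return bins.flatten(-2)  # (B, J, 3) -> (B, J*3) token sequence


class SkeletonAR(nn.Module):
    """Toy causal transformer over skeleton tokens, conditioned on a mesh latent."""

    def __init__(self, vocab: int = VOCAB, d_model: int = D_MODEL,
                 n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.token_emb = nn.Embedding(vocab, d_model)
        self.pos_emb = nn.Embedding(1024, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens: torch.Tensor, mesh_latent: torch.Tensor) -> torch.Tensor:
        # Prepend the mesh-conditioned latent as a one-token conditioning prefix.
        B, T = tokens.shape
        pos = torch.arange(T, device=tokens.device)
        x = self.token_emb(tokens) + self.pos_emb(pos)
        x = torch.cat([mesh_latent.unsqueeze(1), x], dim=1)            # (B, 1+T, D)
        causal = nn.Transformer.generate_square_subsequent_mask(T + 1).to(x.device)
        h = self.backbone(x, mask=causal)
        return self.head(h[:, :-1])  # logits predicting each next skeleton token


# Usage sketch: teacher-forced next-token loss on quantized joint coordinates.
joints = torch.rand(2, 20, 3) * 2 - 1      # 2 meshes, 20 joints each, in [-1, 1]
tokens = quantize_joints(joints)           # (2, 60)
mesh_latent = torch.randn(2, D_MODEL)      # stand-in for a mesh/shape encoder output
model = SkeletonAR()
logits = model(tokens, mesh_latent)        # (2, 60, VOCAB)
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tokens.reshape(-1))
```

This sketch covers only the joint-position side; the connectivity relationships described in the abstract would require additional tokens in the same sequence.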
Related papers
- MagicArticulate: Make Your 3D Models Articulation-Ready [109.35703811628045]
We present MagicArticulate, an effective framework that automatically transforms static 3D models into articulation-ready assets. Our key contributions are threefold. First, we introduce Articulation-XL, a large-scale benchmark containing over 33k 3D models with high-quality articulation annotations, carefully curated from Objaverse-XL. Extensive experiments demonstrate that MagicArticulate significantly outperforms existing methods across diverse object categories.
arXiv Detail & Related papers (2025-02-17T18:53:27Z)
- RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets [47.81216915952291]
We present RigAnything, a novel autoregressive transformer-based model. It makes 3D assets rig-ready by probabilistically generating joints, skeleton topologies, and assigning skinning weights in a template-free manner. RigAnything demonstrates state-of-the-art performance across diverse object types, including humanoids, quadrupeds, marine creatures, insects, and many more.
arXiv Detail & Related papers (2025-02-13T18:59:13Z)
- HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset [6.978870586488504]
We present HumanRig, the first large-scale dataset specifically designed for 3D humanoid character rigging. We introduce an innovative, data-driven automatic rigging framework, which overcomes the limitations of GNN-based methods. This work not only remedies the dataset deficiency in rigging research but also propels the animation industry towards more efficient and automated character rigging pipelines.
arXiv Detail & Related papers (2024-12-03T09:33:00Z)
- Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to 3D domains and pursue stronger 3D shape generation by improving auto-regressive models in capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z)
- Domain Generalization via Ensemble Stacking for Face Presentation Attack Detection [4.61143637299349]
Face Presentation Attack Detection (PAD) plays a pivotal role in securing face recognition systems against spoofing attacks.
This work proposes a comprehensive solution that combines synthetic data generation and deep ensemble learning.
Experimental results on four datasets demonstrate low half total error rates (HTERs) on three of the benchmark datasets.
arXiv Detail & Related papers (2023-01-05T16:44:36Z)
- Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats [80.12253291709673]
We propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks.
Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model.
arXiv Detail & Related papers (2022-12-29T22:22:49Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets, but they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.