ArtPro: Self-Supervised Articulated Object Reconstruction with Adaptive Integration of Mobility Proposals
- URL: http://arxiv.org/abs/2602.22666v1
- Date: Thu, 26 Feb 2026 06:35:23 GMT
- Title: ArtPro: Self-Supervised Articulated Object Reconstruction with Adaptive Integration of Mobility Proposals
- Authors: Xuelu Li, Zhaonan Wang, Xiaogang Wang, Lei Wu, Manyi Li, Changhe Tu
- Abstract summary: ArtPro is a novel self-supervised framework that introduces adaptive integration of mobility proposals. We show that ArtPro achieves robust reconstruction of complex multi-part objects, significantly outperforming existing methods in accuracy and stability.
- Score: 18.50624014637526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing articulated objects into high-fidelity digital twins is crucial for applications such as robotic manipulation and interactive simulation. Recent self-supervised methods using differentiable rendering frameworks like 3D Gaussian Splatting remain highly sensitive to the initial part segmentation. Their reliance on heuristic clustering or pre-trained models often causes optimization to converge to local minima, especially for complex multi-part objects. To address these limitations, we propose ArtPro, a novel self-supervised framework that introduces adaptive integration of mobility proposals. Our approach begins with an over-segmentation initialization guided by geometry features and motion priors, generating part proposals with plausible motion hypotheses. During optimization, we dynamically merge these proposals by analyzing motion consistency among spatial neighbors, while a collision-aware motion pruning mechanism prevents erroneous kinematic estimation. Extensive experiments on both synthetic and real-world objects demonstrate that ArtPro achieves robust reconstruction of complex multi-part objects, significantly outperforming existing methods in accuracy and stability.
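The abstract's adaptive-merging step (dynamically merging part proposals whose spatial neighbors exhibit consistent motion) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the 6-D motion vectors, the cosine-similarity consistency measure, the merge threshold, and the function names are all assumptions.

```python
import numpy as np

def motion_consistency(params_a, params_b):
    """Cosine similarity between two proposals' motion-parameter vectors
    (hypothetical 6-D twist vectors); 1.0 means identical motion."""
    a, b = np.asarray(params_a, float), np.asarray(params_b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def merge_proposals(proposals, neighbors, threshold=0.95):
    """Greedy union-find merge of spatially adjacent part proposals whose
    estimated motions agree above `threshold`.

    proposals: dict mapping proposal id -> motion-parameter vector.
    neighbors: list of (id, id) spatial-adjacency pairs.
    Returns a list of merged groups (lists of proposal ids).
    """
    parent = {pid: pid for pid in proposals}

    def find(x):  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge only spatially neighboring proposals with consistent motion.
    for a, b in neighbors:
        if motion_consistency(proposals[a], proposals[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect proposals into groups by their union-find root.
    groups = {}
    for pid in proposals:
        groups.setdefault(find(pid), []).append(pid)
    return list(groups.values())
```

In this sketch, two over-segmented proposals sharing nearly identical motion vectors collapse into one part, while a neighbor moving differently stays separate; a collision-aware pruning pass (not shown) would additionally veto merges or motion hypotheses that produce interpenetration.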
Related papers
- GeoMotion: Rethinking Motion Segmentation via Latent 4D Geometry [61.24189040578178]
We propose a fully learning-based approach that directly infers moving objects from latent feature representations via attention mechanisms. Our key insight is to bypass explicit correspondence estimation and instead let the model learn to implicitly disentangle object and camera motion. Our approach achieves state-of-the-art motion segmentation performance with high efficiency.
arXiv Detail & Related papers (2026-02-25T11:36:33Z)
- Simulation-Ready Cluttered Scene Estimation via Physics-aware Joint Shape and Pose Optimization [27.083888910311984]
Estimating simulation-ready scenes from real-world observations is crucial for downstream planning and policy learning tasks. Existing methods struggle in cluttered environments. We propose a unified optimization-based formulation for real-to-sim scene estimation.
arXiv Detail & Related papers (2026-02-23T18:58:24Z)
- AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation [45.753757870577196]
We introduce AGILE, a robust framework that shifts the paradigm from reconstruction to agentic generation for interaction learning. We show that AGILE outperforms baselines in global geometric accuracy while demonstrating exceptional robustness on challenging sequences where prior art frequently collapses.
arXiv Detail & Related papers (2026-02-04T15:42:58Z)
- Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects [59.51185639557874]
We introduce Kinematify, an automated framework that synthesizes articulated objects directly from arbitrary RGB images or textual descriptions. Our method addresses two core challenges: (i) inferring kinematic topologies for high-DoF objects and (ii) estimating joint parameters from static geometry.
arXiv Detail & Related papers (2025-11-03T07:21:42Z)
- Follow My Hold: Hand-Object Interaction Reconstruction through Geometric Guidance [61.41904916189093]
We propose a novel diffusion-based framework for reconstructing 3D geometry of hand-held objects from monocular RGB images. We use hand-object interaction as geometric guidance to ensure plausible hand-object interactions.
arXiv Detail & Related papers (2025-08-25T17:11:53Z)
- GaussianArt: Unified Modeling of Geometry and Motion for Articulated Objects [4.717906057951389]
We introduce a unified representation that jointly models geometry and motion using articulated 3D Gaussians. This formulation improves robustness in motion decomposition and supports articulated objects with up to 20 parts. We show that our method consistently achieves superior accuracy in part-level geometry reconstruction and motion estimation across a broad range of object types.
arXiv Detail & Related papers (2025-08-20T17:59:08Z)
- Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control [72.00655365269]
We present RoboMaster, a novel framework that models inter-object dynamics through a collaborative trajectory formulation. Unlike prior methods that decompose objects, our core idea is to decompose the interaction process into three sub-stages: pre-interaction, interaction, and post-interaction. Our method outperforms existing approaches, establishing new state-of-the-art performance in trajectory-controlled video generation for robotic manipulation.
arXiv Detail & Related papers (2025-06-02T17:57:06Z)
- Efficient Design of Compliant Mechanisms Using Multi-Objective Optimization [50.24983453990065]
We address the synthesis of a compliant cross-hinge mechanism capable of large angular strokes. We formulate a multi-objective optimization problem based on kinetostatic performance measures.
arXiv Detail & Related papers (2025-04-23T06:29:10Z)
- ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech. The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture. To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z)
- ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting [66.29782808719301]
Building articulated objects is a key challenge in computer vision. Existing methods often fail to effectively integrate information across different object states. We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
arXiv Detail & Related papers (2025-02-26T10:25:32Z)
- RobustFusion: Robust Volumetric Performance Reconstruction under Human-object Interactions from Monocular RGBD Stream [27.600873320989276]
High-quality 4D reconstruction of human performance with complex interactions with various objects is essential in real-world scenarios. Recent advances still fail to provide reliable performance reconstruction. We propose RobustFusion, a robust volumetric performance reconstruction system for human-object interaction scenarios.
arXiv Detail & Related papers (2021-04-30T08:41:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantees about the quality of the information presented and is not responsible for any consequences of its use.