Related papers: KineDiff3D: Kinematic-Aware Diffusion for Category-Level Articulated Object Shape Reconstruction and Generation

KineDiff3D: Kinematic-Aware Diffusion for Category-Level Articulated Object Shape Reconstruction and Generation

URL: http://arxiv.org/abs/2510.17137v1
Date: Mon, 20 Oct 2025 04:15:40 GMT
Title: KineDiff3D: Kinematic-Aware Diffusion for Category-Level Articulated Object Shape Reconstruction and Generation
Authors: WenBo Xu, Liu Liu, Li Zhang, Ran Zhang, Hao Wu, Dan Guo, Meng Wang,
Abstract summary: Articulated objects, such as laptops and drawers, exhibit significant challenges for 3D reconstruction and pose estimation.<n>We propose KineDiff3D: Kinematic-Aware Diffusion for Category-Level Articulated Object Shape Reconstruction and Generation.
Score: 28.822034731724013
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Articulated objects, such as laptops and drawers, exhibit significant challenges for 3D reconstruction and pose estimation due to their multi-part geometries and variable joint configurations, which introduce structural diversity across different states. To address these challenges, we propose KineDiff3D: Kinematic-Aware Diffusion for Category-Level Articulated Object Shape Reconstruction and Generation, a unified framework for reconstructing diverse articulated instances and pose estimation from single view input. Specifically, we first encode complete geometry (SDFs), joint angles, and part segmentation into a structured latent space via a novel Kinematic-Aware VAE (KA-VAE). In addition, we employ two conditional diffusion models: one for regressing global pose (SE(3)) and joint parameters, and another for generating the kinematic-aware latent code from partial observations. Finally, we produce an iterative optimization module that bidirectionally refines reconstruction accuracy and kinematic parameters via Chamfer-distance minimization while preserving articulation constraints. Experimental results on synthetic, semi-synthetic, and real-world datasets demonstrate the effectiveness of our approach in accurately reconstructing articulated objects and estimating their kinematic properties.

Related papers

ART: Articulated Reconstruction Transformer [22.27508161142687]
We introduce ART, a category-agnostic, feed-forward model that reconstructs complete 3D articulated objects from only sparse, multi-state RGB images.<n>Our newly designed transformer architecture maps sparse image inputs to a set of learnable part slots, from which ART jointly decodes unified representations for individual parts.
arXiv Detail & Related papers (2025-12-16T18:35:23Z)
LARM: A Large Articulated-Object Reconstruction Model [29.66486888001511]
LARM is a unified feedforward framework that reconstructs 3D articulated objects from sparse-view images.<n>LARM generates auxiliary outputs such as depth maps and part masks to facilitate explicit 3D mesh extraction and joint estimation.<n>Our pipeline eliminates the need for dense supervision and supports high-fidelity reconstruction across diverse object categories.
arXiv Detail & Related papers (2025-11-14T18:55:27Z)
Self-Supervised Multi-Part Articulated Objects Modeling via Deformable Gaussian Splatting and Progressive Primitive Segmentation [23.18517560629462]
We introduce DeGSS, a unified framework that encodes articulated objects as deformable 3D Gaussian fields, embedding geometry, appearance, and motion in one compact representation.<n>To evaluate generalization and realism, we enlarge the synthetic PartNet-Mobility benchmark and release RS-Art, a real-to-sim dataset that pairs RGB captures with accurately reverse-engineered 3D models.
arXiv Detail & Related papers (2025-06-11T12:32:16Z)
GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity [49.31257173003408]
We present a novel method for 6-DoF object tracking and high-quality 3D reconstruction from monocular RGBD video.<n>Our approach demonstrates strong capabilities in recovering high-fidelity object meshes, setting a new standard for single-sensor 3D reconstruction in open-world environments.
arXiv Detail & Related papers (2025-05-17T08:46:29Z)
Detection Based Part-level Articulated Object Reconstruction from Single RGBD Image [52.11275397911693]
We propose an end-to-end trainable, cross-category method for reconstructing multiple man-made articulated objects from a single RGBD image.<n>We depart from previous works that rely on learning instance-level latent space, focusing on man-made articulated objects with predefined part counts.<n>Our method successfully reconstructs variously structured multiple instances that previous works cannot handle, and outperforms prior works in shape reconstruction and kinematics estimation.
arXiv Detail & Related papers (2025-04-04T05:08:04Z)
ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting [66.29782808719301]
Building articulated objects is a key challenge in computer vision.<n>Existing methods often fail to effectively integrate information across different object states.<n>We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
arXiv Detail & Related papers (2025-02-26T10:25:32Z)
REACTO: Reconstructing Articulated Objects from a Single Video [64.89760223391573]
We propose a novel deformation model that enhances the rigidity of each part while maintaining flexible deformation of the joints. Our method outperforms previous works in producing higher-fidelity 3D reconstructions of general articulated objects.
arXiv Detail & Related papers (2024-04-17T08:01:55Z)
Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild [22.82439286651921]
We propose a unified regression model that integrates segmentation and reconstruction, specifically designed for 3D shape reconstruction.<n>We also introduce a scalable data synthesis pipeline that simulates a wide range of variations in objects, occluders, and backgrounds.<n>Our training on our synthetic data enables the proposed model to achieve state-of-the-art zero-shot results on real-world images.
arXiv Detail & Related papers (2024-03-21T16:40:10Z)
DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field [29.42222066097076]
Estimating 6D poses and reconstructing 3D shapes of objects in open-world scenes from RGB-depth image pairs is challenging. We propose the DTF-Net, a novel framework for pose estimation and shape reconstruction based on implicit neural fields of object categories.
arXiv Detail & Related papers (2023-08-04T10:35:40Z)
Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances. We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction. Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision. Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.