ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
- URL: http://arxiv.org/abs/2502.19459v2
- Date: Wed, 19 Mar 2025 08:43:16 GMT
- Title: ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
- Authors: Yu Liu, Baoxiong Jia, Ruijie Lu, Junfeng Ni, Song-Chun Zhu, Siyuan Huang,
- Abstract summary: Building articulated objects is a key challenge in computer vision.<n>Existing methods often fail to effectively integrate information across different object states.<n>We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
- Score: 66.29782808719301
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Building articulated objects is a key challenge in computer vision. Existing methods often fail to effectively integrate information across different object states, limiting the accuracy of part-mesh reconstruction and part dynamics modeling, particularly for complex multi-part articulated objects. We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation to address these issues. Our method incorporates canonical Gaussians with coarse-to-fine initialization and updates for aligning articulated part information across different object states, and employs a skinning-inspired part dynamics modeling module to improve both part-mesh reconstruction and articulation learning. Extensive experiments on both synthetic and real-world datasets, including a new benchmark for complex multi-part objects, demonstrate that ArtGS achieves state-of-the-art performance in joint parameter estimation and part mesh reconstruction. Our approach significantly improves reconstruction quality and efficiency, especially for multi-part articulated objects. Additionally, we provide comprehensive analyses of our design choices, validating the effectiveness of each component to highlight potential areas for future improvement. Our work is made publicly available at: https://articulate-gs.github.io.
Related papers
- IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction.
We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images.
We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z) - Detection Based Part-level Articulated Object Reconstruction from Single RGBD Image [52.11275397911693]
We propose an end-to-end trainable, cross-category method for reconstructing multiple man-made articulated objects from a single RGBD image.
We depart from previous works that rely on learning instance-level latent space, focusing on man-made articulated objects with predefined part counts.
Our method successfully reconstructs variously structured multiple instances that previous works cannot handle, and outperforms prior works in shape reconstruction and kinematics estimation.
arXiv Detail & Related papers (2025-04-04T05:08:04Z) - Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics [31.819336585007104]
We propose to leverage superquadrics as an alternative 3D object representation to bounding boxes.<n>We demonstrate their effectiveness on both template-free object reconstruction and action recognition tasks.<n>We also study the compositionality of actions by considering a more challenging task where the training combinations of verbs and nouns do not overlap with the testing split.
arXiv Detail & Related papers (2025-01-13T07:26:05Z) - SADG: Segment Any Dynamic Gaussian Without Object Trackers [39.77468734311312]
SADG, Segment Any Dynamic Gaussian Without Object Trackers, is a novel approach that combines dynamic Gaussian Splatting representation and semantic information without reliance on object IDs.<n>We learn semantically-aware features by leveraging masks generated from the Segment Anything Model (SAM) and utilizing our novel contrastive learning objective based on hard pixel mining.<n>We evaluate SADG on proposed benchmarks and demonstrate the superior performance of our approach in segmenting objects within dynamic scenes.
arXiv Detail & Related papers (2024-11-28T17:47:48Z) - Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation [52.36691633451968]
ViTaM-D is a visual-tactile framework for dynamic hand-object interaction reconstruction.
DF-Field is a distributed force-aware contact representation model.
Our results highlight the superior performance of ViTaM-D in both rigid and deformable object reconstruction.
arXiv Detail & Related papers (2024-11-14T16:29:45Z) - REACTO: Reconstructing Articulated Objects from a Single Video [64.89760223391573]
We propose a novel deformation model that enhances the rigidity of each part while maintaining flexible deformation of the joints.
Our method outperforms previous works in producing higher-fidelity 3D reconstructions of general articulated objects.
arXiv Detail & Related papers (2024-04-17T08:01:55Z) - SM$^3$: Self-Supervised Multi-task Modeling with Multi-view 2D Images
for Articulated Objects [24.737865259695006]
We propose a self-supervised interaction perception method, referred to as SM$3$, to model articulated objects.
By constructing 3D geometries and textures from the captured 2D images, SM$3$ achieves integrated optimization of movable part and joint parameters.
Evaluations demonstrate that SM$3$ surpasses existing benchmarks across various categories and objects, while its adaptability in real-world scenarios has been thoroughly validated.
arXiv Detail & Related papers (2024-01-17T11:15:09Z) - Object Scene Representation Transformer [56.40544849442227]
We introduce Object Scene Representation Transformer (OSRT), a 3D-centric model in which individual object representations naturally emerge through novel view synthesis.
OSRT scales to significantly more complex scenes with larger diversity of objects and backgrounds than existing methods.
It is multiple orders of magnitude faster at compositional rendering thanks to its light field parametrization and the novel Slot Mixer decoder.
arXiv Detail & Related papers (2022-06-14T15:40:47Z) - Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z) - Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z) - Joint Hand-object 3D Reconstruction from a Single Image with
Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of the reconstruction accuracy of objects.
arXiv Detail & Related papers (2020-06-28T09:50:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.