Related papers: VideoArtGS: Building Digital Twins of Articulated Objects from Monocular Video


URL: http://arxiv.org/abs/2509.17647v1
Date: Mon, 22 Sep 2025 11:52:02 GMT
Title: VideoArtGS: Building Digital Twins of Articulated Objects from Monocular Video
Authors: Yu Liu, Baoxiong Jia, Ruijie Lu, Chuyue Gan, Huayu Chen, Junfeng Ni, Song-Chun Zhu, Siyuan Huang
Abstract summary: Building digital twins of articulated objects from monocular video presents an essential challenge in computer vision. We introduce VideoArtGS, a novel approach that reconstructs high-fidelity digital twins of articulated objects from monocular video. VideoArtGS demonstrates state-of-the-art performance in articulation and mesh reconstruction, reducing the reconstruction error by about two orders of magnitude compared to existing methods.
Score: 60.63575135514847
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Building digital twins of articulated objects from monocular video presents an essential challenge in computer vision, which requires simultaneous reconstruction of object geometry, part segmentation, and articulation parameters from limited viewpoint inputs. Monocular video offers an attractive input format due to its simplicity and scalability; however, it's challenging to disentangle the object geometry and part dynamics with visual supervision alone, as the joint movement of the camera and parts leads to ill-posed estimation. While motion priors from pre-trained tracking models can alleviate the issue, how to effectively integrate them for articulation learning remains largely unexplored. To address this problem, we introduce VideoArtGS, a novel approach that reconstructs high-fidelity digital twins of articulated objects from monocular video. We propose a motion prior guidance pipeline that analyzes 3D tracks, filters noise, and provides reliable initialization of articulation parameters. We also design a hybrid center-grid part assignment module for articulation-based deformation fields that captures accurate part motion. VideoArtGS demonstrates state-of-the-art performance in articulation and mesh reconstruction, reducing the reconstruction error by about two orders of magnitude compared to existing methods. VideoArtGS enables practical digital twin creation from monocular video, establishing a new benchmark for video-based articulated object reconstruction. Our work is made publicly available at: https://videoartgs.github.io.
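To make "analyzes 3D tracks ... and provides reliable initialization of articulation parameters" more concrete, here is a minimal sketch of one generic way a revolute-joint axis could be initialized from 3D point tracks. It is NOT the VideoArtGS pipeline: it assumes a single rigid part rotating about a fixed axis, so every tracked point traces a circular arc whose plane normal is the axis direction and whose circumcenter lies on the axis. The helper names (`circle_from_three`, `estimate_revolute_axis`) and the displacement threshold `min_disp` are hypothetical, and the "noise filter" is only a crude stand-in that drops nearly static tracks.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def circle_from_three(a: Vec, b: Vec, c: Vec) -> Optional[Tuple[Vec, Vec]]:
    """Circumcenter and unit plane normal of three 3D points, or None if collinear."""
    u, v = sub(b, a), sub(c, a)
    w = cross(u, v)
    ww = dot(w, w)
    if ww < 1e-12:
        return None  # (near-)collinear points define no unique circle
    # Closed-form circumcenter: a + ((|u|^2 v - |v|^2 u) x w) / (2 |w|^2)
    num = cross(sub(scale(v, dot(u, u)), scale(u, dot(v, v))), w)
    center = add(a, scale(num, 1.0 / (2.0 * ww)))
    return center, scale(w, 1.0 / math.sqrt(ww))


def estimate_revolute_axis(tracks: Sequence[Sequence[Vec]],
                           min_disp: float = 1e-3) -> Tuple[Vec, Vec]:
    """Hypothetical helper: (axis direction, pivot point) from per-point 3D tracks."""
    normals: List[Vec] = []
    centers: List[Vec] = []
    for track in tracks:
        if len(track) < 3:
            continue
        a, b, c = track[0], track[len(track) // 2], track[-1]
        # Crude noise filter: skip tracks that barely move (static or jittering points).
        if max(norm(sub(b, a)), norm(sub(c, a))) < min_disp:
            continue
        circle = circle_from_three(a, b, c)
        if circle is None:
            continue
        center, normal = circle
        centers.append(center)
        normals.append(normal)
    if not normals:
        raise ValueError("no usable tracks")
    # Per-track normals may point in opposite directions; align them to the first one.
    acc: Vec = (0.0, 0.0, 0.0)
    for n in normals:
        acc = add(acc, n if dot(n, normals[0]) >= 0 else scale(n, -1.0))
    axis = scale(acc, 1.0 / norm(acc))
    # Each circumcenter lies on the axis line, so their mean does too (noise-free case).
    pivot: Vec = (0.0, 0.0, 0.0)
    for ctr in centers:
        pivot = add(pivot, ctr)
    return axis, scale(pivot, 1.0 / len(centers))
```

For example, points rotating about the vertical line through (1, 2, 0) yield an axis of (0, 0, 1) and a pivot with x = 1 and y = 2, while a track that never moves is simply ignored. A real pipeline would need a robust aggregate (e.g. a median or RANSAC) and a separate test for prismatic joints; the plain mean here only keeps the sketch short.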

Related papers

ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors [51.06020148149403]
We introduce ArtHOI, the first zero-shot framework for articulated human-object interaction synthesis via 4D reconstruction from video priors. ArtHOI bridges video-based generation and geometry-aware reconstruction, producing interactions that are both semantically aligned and physically grounded.
(arXiv, 2026-03-04T17:58:04Z)
sim2art: Accurate Articulated Object Modeling from a Single Video using Synthetic Training Data Only [20.99905717289565]
We present the first data-driven approach that jointly predicts part segmentation and joint parameters from monocular video captured with a freely moving camera. Our method demonstrates strong generalization to real-world objects, offering a scalable and practical solution for articulated object understanding. Our approach operates directly on casually recorded video, making it suitable for real-time applications in dynamic environments.
(arXiv, 2025-12-08T16:38:30Z)
Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints [20.702086497025494]
In this paper, we leverage two extra sources of information to reduce the ambiguity of vision signals. First, generative models learn priors of the shapes of commonly seen objects, allowing us to make reasonable guesses of the unseen part of geometry. Second, contact information, which can be obtained from videos and physical interactions, provides sparse constraints on the boundary of the geometry.
(arXiv, 2025-12-04T18:45:14Z)
SAFT: Shape and Appearance of Fabrics from Template via Differentiable Physical Simulations from Monocular Video [6.408363851409316]
In this paper, we propose a novel approach that combines the domains of 3D geometry reconstruction and appearance estimation for physically based rendering. We present a system that is able to perform both tasks for fabrics, utilizing only a single monocular RGB video sequence as input. In comparison with the most recent methods in the field, we have reduced the error in the 3D reconstruction by a factor of 2.64 while requiring a moderate runtime of 30 min per scene.
(arXiv, 2025-09-10T17:59:57Z)
Vid-CamEdit: Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry [41.904066758259624]
We introduce Vid-CamEdit, a novel framework for video camera trajectory editing. Our approach consists of two steps: estimating temporally consistent geometry, and generative rendering guided by this geometry.
(arXiv, 2025-06-16T17:02:47Z)
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction [22.420752010237052]
We introduce ReVision, a plug-and-play framework that explicitly integrates parameterized 3D physical knowledge into a conditional video generation model. We validate the effectiveness of our approach on Stable Video Diffusion, where ReVision significantly improves motion fidelity and coherence. Our results suggest that, by incorporating 3D physical knowledge, even a relatively small video diffusion model can generate complex motions and interactions with greater realism and controllability.
(arXiv, 2025-04-30T17:59:56Z)
REACTO: Reconstructing Articulated Objects from a Single Video [64.89760223391573]
We propose a novel deformation model that enhances the rigidity of each part while maintaining flexible deformation of the joints. Our method outperforms previous works in producing higher-fidelity 3D reconstructions of general articulated objects.
(arXiv, 2024-04-17T08:01:55Z)
Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips [38.02945794078731]
We tackle the task of reconstructing hand-object interactions from short video clips. Our approach casts 3D inference as a per-video optimization and recovers a neural 3D representation of the object shape. We empirically evaluate our approach on egocentric videos, and observe significant improvements over prior single-view and multi-view methods.
(arXiv, 2023-09-11T17:58:30Z)
State of the Art in Dense Monocular Non-Rigid 3D Reconstruction [100.9586977875698]
3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics. This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views.
(arXiv, 2022-10-27T17:59:53Z)
NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos [82.74918564737591]
We present a method for learning 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video input. Experiments show that our method achieves superior mesh and video reconstruction of dynamic scenes compared to competing Neural Field approaches.
(arXiv, 2022-10-22T04:57:55Z)
NeuralDiff: Segmenting 3D objects that move in egocentric videos [92.95176458079047]
We study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground. This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion. In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them.
(arXiv, 2021-10-19T12:51:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.