Fugu-MT Paper Translation (Summary): Articulat3D: Reconstructing Articulated Digital Twins From Monocular Videos with Geometric and Motion Constraints

Paper summary: Articulat3D: Reconstructing Articulated Digital Twins From Monocular Videos with Geometric and Motion Constraints

arxiv url: http://arxiv.org/abs/2603.11606v1
Date: Thu, 12 Mar 2026 06:59:44 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:25.937449
Title: Articulat3D: Reconstructing Articulated Digital Twins From Monocular Videos with Geometric and Motion Constraints
Title (reference translation): Articulat3D: Reconstructing Articulated Digital Twins from Monocular Video via Geometric and Motion Constraints
Authors: Lijun Guo, Haoyu Zhao, Xingyue Zhao, Rong Fu, Linghao Zhuang, Siteng Huang, Zhongyu Li, Hua Zou,
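Abstract summary: Articulat3D is a new framework that builds digital twins from casually captured monocular videos. It first proposes Motion Prior-Driven Initialization, which uses 3D point tracks to exploit the low-dimensional structure of articulated motion. It then introduces Geometric and Motion Constraints Refinement, which enforces physically plausible articulation. Experiments show that Articulat3D achieves state-of-the-art performance on synthetic benchmarks and on real-world casually captured monocular videos.
Reference score (site-computed attention score): 21.83046776294786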
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Building high-fidelity digital twins of articulated objects from visual data remains a central challenge. Existing approaches depend on multi-view captures of the object in discrete, static states, which severely constrains their real-world scalability. In this paper, we introduce Articulat3D, a novel framework that constructs such digital twins from casually captured monocular videos by jointly enforcing explicit 3D geometric and motion constraints. We first propose Motion Prior-Driven Initialization, which leverages 3D point tracks to exploit the low-dimensional structure of articulated motion. By modeling scene dynamics with a compact set of motion bases, we facilitate soft decomposition of the scene into multiple rigidly-moving groups. Building on this initialization, we introduce Geometric and Motion Constraints Refinement, which enforces physically plausible articulation through learnable kinematic primitives parameterized by a joint axis, a pivot point, and per-frame motion scalars, yielding reconstructions that are both geometrically accurate and temporally coherent. Extensive experiments demonstrate that Articulat3D achieves state-of-the-art performance on synthetic benchmarks and real-world casually captured monocular videos, significantly advancing the feasibility of digital twin creation under uncontrolled real-world conditions. Our project page is at https://maxwell-zhao.github.io/Articulat3D.
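Abstract (reference translation): Building high-fidelity digital twins of articulated objects from visual data remains a central challenge. Existing approaches rely on multi-view captures of the object in discrete, static states, which severely limits their real-world scalability. This paper introduces Articulat3D, a new framework that builds such digital twins from monocular videos. We first propose Motion Prior-Driven Initialization, which uses 3D point tracks to exploit the low-dimensional structure of articulated motion. Modeling scene dynamics with a compact set of motion bases makes it easy to softly decompose the scene into multiple rigidly moving groups. Building on this initialization, we introduce Geometric and Motion Constraints Refinement, which enforces physically plausible articulation through learnable kinematic primitives parameterized by a joint axis, a pivot point, and per-frame motion scalars, yielding reconstructions that are both geometrically accurate and temporally coherent. Extensive experiments show that Articulat3D achieves state-of-the-art performance on synthetic benchmarks and on real-world casually captured monocular videos, substantially improving the feasibility of digital twin creation under uncontrolled real-world conditions. Our project page is at https://maxwell-zhao.github.io/Articulat3D.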
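The abstract does not give the exact formulation of the motion bases. As a rough sketch, assuming each basis is a per-frame rigid transform (rotation plus translation) and that a track's soft membership comes from a softmax over how well each basis explains that track, a soft decomposition of 3D point tracks into rigidly moving groups could look like the following. The function names, the `temperature` parameter, and the residual-based weighting are illustrative assumptions, not the authors' method; the paper may instead learn the weights by optimization.

```python
import numpy as np

def rot_z(theta: float) -> np.ndarray:
    """Rotation matrix about the z-axis (helper for the toy example)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def apply_bases(canonical, rotations, translations):
    """Move canonical points with every (hypothetical) motion basis at every frame.

    canonical:    (N, 3) point positions in the canonical (first) frame
    rotations:    (K, T, 3, 3) per-basis, per-frame rotations
    translations: (K, T, 3) per-basis, per-frame translations
    returns:      (K, T, N, 3) predicted positions
    """
    moved = np.einsum("ktij,nj->ktni", rotations, canonical)  # R @ x for all k, t, n
    return moved + translations[:, :, None, :]

def soft_assign(tracks, rotations, translations, temperature=0.01):
    """Soft membership (N, K) of each track in each basis; rows sum to 1.

    tracks: (N, T, 3) observed 3D point tracks; frame 0 is treated as canonical.
    """
    canonical = tracks[:, 0, :]
    pred = apply_bases(canonical, rotations, translations)   # (K, T, N, 3)
    obs = np.transpose(tracks, (1, 0, 2))[None]               # (1, T, N, 3)
    # Mean squared residual of each track under each basis: (K, N)
    residual = np.mean(np.sum((pred - obs) ** 2, axis=-1), axis=1)
    logits = -residual.T / temperature                         # (N, K)
    logits -= logits.max(axis=1, keepdims=True)                # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy scene: basis 0 is static, basis 1 rotates about the z-axis.
T = 5
angles = 0.3 * np.arange(T)
rotations = np.stack([
    np.stack([np.eye(3)] * T),
    np.stack([rot_z(a) for a in angles]),
])
translations = np.zeros((2, T, 3))

static_pts = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.5]])
moving_pts = np.array([[1.0, 1.0, 0.0], [2.0, 0.0, 0.3]])
static_tracks = np.repeat(static_pts[:, None, :], T, axis=1)
moving_tracks = np.stack([moving_pts @ rot_z(a).T for a in angles], axis=1)
tracks = np.concatenate([static_tracks, moving_tracks], axis=0)  # (4, T, 3)

weights = soft_assign(tracks, rotations, translations)
labels = weights.argmax(axis=1)  # hard group label per track
```

Taking the argmax of the soft weights gives a hard grouping; keeping the weights soft lets a later refinement stage adjust the split between parts.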

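A learnable kinematic primitive parameterized by a joint axis, a pivot point, and per-frame motion scalars can be sketched as below. The sketch assumes the scalar is a rotation angle about the line through the pivot along the axis (a revolute joint). As an extra assumption not stated in the abstract, it also allows a displacement along the axis (a prismatic joint). The class `KinematicPrimitive` and its arguments are hypothetical. Because each frame contributes a single scalar, every point on the part shares one rigid motion, which is one way such a primitive keeps the articulation physically plausible.

```python
import numpy as np

def skew(v: np.ndarray) -> np.ndarray:
    """Cross-product matrix: skew(v) @ x == np.cross(v, x)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def axis_angle_to_matrix(axis: np.ndarray, angle: float) -> np.ndarray:
    """Rodrigues' formula: rotation of `angle` radians about a unit `axis`."""
    k = skew(axis)
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)

class KinematicPrimitive:
    """Hypothetical single-joint primitive: axis + pivot + one scalar per frame."""

    def __init__(self, axis, pivot, scalars, joint_type="revolute"):
        axis = np.asarray(axis, dtype=float)
        self.axis = axis / np.linalg.norm(axis)           # keep the axis unit-length
        self.pivot = np.asarray(pivot, dtype=float)
        self.scalars = np.asarray(scalars, dtype=float)   # angle (rad) or distance per frame
        self.joint_type = joint_type

    def transform(self, points, frame: int) -> np.ndarray:
        """Move (N, 3) canonical points of the moving part to `frame`."""
        points = np.asarray(points, dtype=float)
        s = self.scalars[frame]
        if self.joint_type == "revolute":
            rot = axis_angle_to_matrix(self.axis, s)
            # Rotate about the line through `pivot` along `axis`
            return (points - self.pivot) @ rot.T + self.pivot
        if self.joint_type == "prismatic":
            # Slide along the axis; the pivot plays no role here
            return points + s * self.axis
        raise ValueError(f"unknown joint type: {self.joint_type}")

# Usage: a door hinged on the vertical line through (1, 0, 0), opening 90 degrees.
door = KinematicPrimitive(axis=[0, 0, 1], pivot=[1, 0, 0], scalars=[0.0, np.pi / 2])
door_pts = np.array([[2.0, 0.0, 0.0], [1.0, 0.0, 5.0]])  # second point lies on the hinge
opened = door.transform(door_pts, frame=1)
```

In an optimization setting, the axis, pivot, and scalars would be the learnable parameters; the point on the hinge line stays fixed at every frame, as a real hinge requires.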

Related papers