Fugu-MT Paper Translation (Abstract): TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation

Paper overview: TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation

arxiv url: http://arxiv.org/abs/2606.12153v1
Date: Wed, 10 Jun 2026 14:41:19 GMT
Status: Translation complete
Last updated in system: 2026-06-11 16:42:38.510439
Title: TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation
Title (reference translation): TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video Animation
Authors: Cheng-Feng Pu, Jia-Peng Zhang, Meng-Hao Guo, Yan-Pei Cao, Shi-Min Hu
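Abstract summary: TopoCap is the first unified framework that can extract motion from monocular video and retarget it onto characters with arbitrary skeletal topologies. The key insight is that while skeletal structures are discrete, the underlying physics of motion occupies a continuous, low-dimensional manifold. This insight is realized through a two-stage generative pipeline.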
Reference score (independently computed attention score): 44.79819257609757
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The explosion of generative 3D assets has created a massive demand for animation, yet current motion capture methods remain brittle, restricted to species-specific templates (e.g., SMPL) or requiring labor-intensive manual rigging. We introduce TopoCap, the first unified framework capable of extracting motion from monocular video and retargeting it onto characters with arbitrary, unseen skeletal topologies, i.e., from bipeds to hexapods and inanimate objects, without test-time optimization. Our key insight is that while skeletal structures are combinatorial and discrete, the underlying physics of motion occupies a continuous, low-dimensional manifold. We materialize this insight via a two-stage generative pipeline. First, we learn a Universal Motion Manifold using a Graph CVAE that compresses heterogeneous kinematic chains into a shared, fixed-length latent code. By explicitly conditioning the decoder on a structural embedding of the target rig, we disentangle motion dynamics from skeletal topology. Second, we treat video-to-animation as a conditional flow matching problem, predicting these topology-agnostic codes from visual features. To learn this generalized prior, we introduce Mobjaverse, a massive-scale dataset curated from Objaverse-XL. Comprising over 5,000 unique skeletal topologies and 2 million frames, it exceeds the structural diversity of existing datasets by two orders of magnitude. Extensive experiments demonstrate that TopoCap outperforms specialist models on human and quadruped benchmarks while enabling zero-shot retargeting for the long tail of 3D creatures. The dataset is publicly available at https://huggingface.co/datasets/duckduckplz/Mobjaverse.
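Abstract (reference translation): The explosion of generative 3D assets has created enormous demand for animation, yet current motion capture methods are brittle and restricted to species-specific templates (e.g., SMPL). TopoCap is the first unified framework that can extract motion from monocular video and retarget it onto characters with arbitrary skeletal topologies. Our key insight is that while skeletal structures are combinatorial and discrete, the underlying physics of motion occupies a continuous, low-dimensional manifold. We realize this insight through a two-stage generative pipeline. First, we learn a Universal Motion Manifold using a Graph CVAE that compresses heterogeneous kinematic chains into a shared, fixed-length latent code. By explicitly conditioning the decoder on a structural embedding of the target rig, we disentangle motion dynamics from skeletal topology. Second, we treat video-to-animation as a conditional flow matching problem, predicting these topology-agnostic codes from visual features. To learn this generalized prior, we introduce Mobjaverse, a large-scale dataset curated from Objaverse-XL. Comprising over 5,000 unique skeletal topologies and 2 million frames, it exceeds the structural diversity of existing datasets by two orders of magnitude. Extensive experiments show that TopoCap outperforms specialist models on human and quadruped benchmarks while enabling zero-shot retargeting for the long tail of 3D creatures. The dataset is publicly available at https://huggingface.co/datasets/duckduckplz/Mobjaverse.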
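
Illustrative note: the abstract frames the second stage as conditional flow matching over fixed-length latent codes. The sketch below shows what a generic conditional flow matching objective and sampler can look like on a straight-line probability path. It is not the authors' implementation. Every name here (`cfm_training_pair`, `cfm_loss`, `euler_sample`, `oracle_velocity`) is hypothetical, the 4-dimensional latent is an arbitrary choice, and the "oracle" velocity function stands in for a trained network by taking the target code itself as its condition, whereas the paper conditions on visual features.

```python
import random

def cfm_training_pair(x0, x1, t):
    """Return (x_t, target velocity) on the straight-line path from x0 to x1."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return x_t, v_target

def cfm_loss(predict_velocity, x0, x1, t, cond):
    """Mean squared error between predicted and target velocity for one sample."""
    x_t, v_target = cfm_training_pair(x0, x1, t)
    v_pred = predict_velocity(x_t, t, cond)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)

def euler_sample(predict_velocity, x0, cond, steps=50):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t stays strictly below 1, so the oracle never divides by zero
        v = predict_velocity(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def oracle_velocity(x_t, t, cond):
    """Toy stand-in for a trained network (assumption): cond is the target code."""
    return [(c - x) / (1.0 - t) for c, x in zip(cond, x_t)]

random.seed(0)
latent_dim = 4                                 # arbitrary size for illustration
target_code = [0.5, -1.0, 2.0, 0.0]            # hypothetical latent motion code
noise = [random.gauss(0.0, 1.0) for _ in range(latent_dim)]

loss = cfm_loss(oracle_velocity, noise, target_code, t=0.3, cond=target_code)
sample = euler_sample(oracle_velocity, noise, cond=target_code)
```

With the oracle in place of a learned model, the loss is zero up to floating-point error and the sampler carries the noise vector onto the target code. A real system would replace `oracle_velocity` with a network trained by minimizing `cfm_loss` over many (noise, code, condition) triples.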
