Fugu-MT Paper Translation (Summary): ViPS: Video-informed Pose Spaces for Auto-Rigged Meshes

Paper summary: ViPS: Video-informed Pose Spaces for Auto-Rigged Meshes

arxiv url: http://arxiv.org/abs/2604.17623v2
Date: Wed, 22 Apr 2026 21:08:06 GMT
Status: Translation complete
Last updated in system: 2026-04-24 14:40:06.008471
Title: ViPS: Video-informed Pose Spaces for Auto-Rigged Meshes
Authors: Honglin Chen, Karran Pandey, Rundi Wu, Matheus Gadelha, Yannick Hold-Geoffroy, Ayush Tewari, Niloy J. Mitra, Changxi Zheng, Paul Guerrero
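Abstract summary: Video-informed Pose Spaces (ViPS) is a feed-forward framework that discovers the latent distribution of valid articulations for auto-rigged meshes. ViPS transfers generative video priors into a universal distribution over a given rig parameterization. Evaluations show that ViPS matches the performance of state-of-the-art methods trained on synthetic artist-created 4D data in both plausibility and diversity.
Reference score (independently computed attention metric): 55.32681167870698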
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Kinematic rigs provide a structured interface for articulating 3D meshes, but they lack an inherent representation of the plausible manifold of joint configurations for a given asset. Without such a pose space, stochastic sampling or manual manipulation of raw rig parameters often leads to semantic or geometric violations, such as anatomical hyperextension and non-physical self-intersections. We propose Video-informed Pose Spaces (ViPS), a feed-forward framework that discovers the latent distribution of valid articulations for auto-rigged meshes by distilling motion priors from a pretrained video diffusion model. Unlike existing methods that rely on scarce artist-authored 4D datasets, ViPS transfers generative video priors into a universal distribution over a given rig parameterization. Differentiable geometric validators applied to the skinned mesh enforce asset-specific validity without requiring manual regularizers. Our model learns a smooth, compact, and controllable pose space that supports diverse sampling, manifold projection for inverse kinematics, and temporally coherent trajectories for keyframing. Furthermore, the distilled 3D pose samples serve as precise semantic proxies for guiding video diffusion, effectively closing the loop between generative 2D priors and structured 3D kinematic control. Our evaluations show that ViPS, trained solely on video priors, matches the performance of state-of-the-art methods trained on synthetic artist-created 4D data in both plausibility and diversity. Most importantly, as a universal model, ViPS demonstrates robust zero-shot generalization to out-of-distribution species and unseen skeletal topologies.
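Illustration: the abstract's claim that stochastic sampling of raw rig parameters often produces invalid poses can be seen in a toy setting. The sketch below is not the ViPS method; it assumes a hypothetical two-joint limb with made-up joint limits, samples each raw angle uniformly, and measures how often at least one limit is violated. All names and numbers (`JOINT_LIMITS`, `RAW_RANGE`, `violation_rate`) are illustrative assumptions.

```python
import random

# Hypothetical joint limits in radians for a toy two-joint limb (not from the paper).
JOINT_LIMITS = {"shoulder": (-1.5, 1.5), "elbow": (0.0, 2.5)}
# Assumed unconstrained range of a raw rig parameter.
RAW_RANGE = (-3.14159, 3.14159)


def is_valid(pose):
    """Return True when every joint angle lies inside its assumed limit."""
    return all(lo <= pose[name] <= hi for name, (lo, hi) in JOINT_LIMITS.items())


def sample_raw(rng):
    """Sample each raw rig parameter uniformly, ignoring plausibility."""
    return {name: rng.uniform(*RAW_RANGE) for name in JOINT_LIMITS}


def violation_rate(n, seed=0):
    """Fraction of n raw samples that break at least one joint limit."""
    rng = random.Random(seed)
    invalid = sum(not is_valid(sample_raw(rng)) for _ in range(n))
    return invalid / n


if __name__ == "__main__":
    print(f"violation rate: {violation_rate(10_000):.2f}")
```

Under these assumed limits, most raw samples are rejected even with only two joints, and the valid fraction shrinks further as joints are added. A learned pose space, as the abstract describes, is meant to sample plausible configurations directly rather than rely on rejection.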

Related papers