Fugu-MT Paper Translation (Abstract): Kimodo: Scaling Controllable Human Motion Generation

Paper overview: Kimodo: Scaling Controllable Human Motion Generation

arxiv url: http://arxiv.org/abs/2603.15546v1
Date: Mon, 16 Mar 2026 17:09:30 GMT
Status: Translation complete
Last updated in system: 2026-03-17 18:28:58.621512
Title: Kimodo: Scaling Controllable Human Motion Generation
Authors: Davis Rempe, Mathis Petrovich, Ye Yuan, Haotian Zhang, Xue Bin Peng, Yifeng Jiang, Tingwu Wang, Umar Iqbal, David Minor, Michael de Ruyter, Jiefeng Li, Chen Tessler, Edy Lim, Eugene Jeong, Sam Wu, Ehsan Hassani, Michael Huang, Jin-Bey Yu, Chaeyeon Chung, Lina Song, Olivier Dionne, Jan Kautz, Simon Yuen, Sanja Fidler
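Abstract summary: Kimodo is a controllable motion diffusion model trained on 700 hours of optical motion capture data. It generates high-quality motions and is controlled through text and a comprehensive suite of kinematic constraints.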
Reference score (independently computed attention score): 77.66868439601062
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: High-quality human motion data is becoming increasingly important for applications in robotics, simulation, and entertainment. Recent generative models offer a potential data source, enabling human motion synthesis through intuitive inputs like text prompts or kinematic constraints on poses. However, the small scale of public mocap datasets has limited the motion quality, control accuracy, and generalization of these models. In this work, we introduce Kimodo, an expressive and controllable kinematic motion diffusion model trained on 700 hours of optical motion capture data. Our model generates high-quality motions while being easily controlled through text and a comprehensive suite of kinematic constraints including full-body keyframes, sparse joint positions/rotations, 2D waypoints, and dense 2D paths. This is enabled through a carefully designed motion representation and two-stage denoiser architecture that decomposes root and body prediction to minimize motion artifacts while allowing for flexible constraint conditioning. Experiments on the large-scale mocap dataset justify key design decisions and analyze how the scaling of dataset size and model size affect performance.
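The abstract lists four kinds of kinematic constraints (full-body keyframes, sparse joint positions, 2D waypoints, and dense 2D paths) but does not say how they are encoded. As a rough illustration only, the sketch below shows one common way such constraints could be stored: a sparse set of targets converted into a binary mask plus a value array that a denoiser might take as conditioning. This is not Kimodo's actual representation. The class `ConstraintSet`, its methods, the four-joint skeleton, and the y-up axis convention are all hypothetical, and joint rotations are left out for brevity.

```python
from dataclasses import dataclass, field

NUM_JOINTS = 4  # hypothetical tiny skeleton; joint 0 is treated as the root


@dataclass
class ConstraintSet:
    """Sparse kinematic constraints stored as {(frame, joint): (x, y, z)}.

    A component set to None is left unconstrained, so a 2D root waypoint
    can pin x and z while leaving the height (y, assumed to be up) free.
    """
    num_frames: int
    targets: dict = field(default_factory=dict)

    def add_joint_position(self, frame, joint, xyz):
        # Sparse joint constraint: one joint at one frame.
        self.targets[(frame, joint)] = tuple(xyz)

    def add_keyframe(self, frame, pose):
        # Full-body keyframe: constrain every joint at one frame.
        for joint, xyz in enumerate(pose):
            self.add_joint_position(frame, joint, xyz)

    def add_waypoint_2d(self, frame, x, z):
        # 2D waypoint: constrain only the root's ground-plane position.
        # This replaces any earlier root constraint at the same frame.
        self.targets[(frame, 0)] = (x, None, z)

    def add_path_2d(self, points):
        # Dense 2D path: one waypoint per frame, starting at frame 0.
        for frame, (x, z) in enumerate(points):
            self.add_waypoint_2d(frame, x, z)

    def to_mask_and_values(self):
        """Return (mask, values), each indexed as [frame][joint][axis]."""
        shape = range(self.num_frames)
        mask = [[[0.0] * 3 for _ in range(NUM_JOINTS)] for _ in shape]
        values = [[[0.0] * 3 for _ in range(NUM_JOINTS)] for _ in shape]
        for (frame, joint), xyz in self.targets.items():
            for axis, v in enumerate(xyz):
                if v is not None:
                    mask[frame][joint][axis] = 1.0
                    values[frame][joint][axis] = float(v)
        return mask, values


# Usage: a root waypoint at frame 0 and one sparse joint target at frame 2.
constraints = ConstraintSet(num_frames=3)
constraints.add_waypoint_2d(frame=0, x=0.0, z=0.0)
constraints.add_joint_position(frame=2, joint=3, xyz=(0.5, 1.2, 0.1))
mask, values = constraints.to_mask_and_values()
```

Keeping the mask separate from the values lets a model tell "constrained to zero" apart from "unconstrained", which is why this sketch returns both arrays rather than a single one.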
