Fugu-MT Paper Translation (Overview): Contact-aware Human Motion Generation from Textual Descriptions

Paper overview: Contact-aware Human Motion Generation from Textual Descriptions

arxiv url: http://arxiv.org/abs/2403.15709v1
Date: Sat, 23 Mar 2024 04:08:39 GMT
Status: Translation complete
Last updated in system: 2024-03-26 21:32:08.081437
Title: Contact-aware Human Motion Generation from Textual Descriptions
Title (reference translation): Contact-aware human motion generation from text descriptions
Authors: Sihan Ma, Qiong Cao, Jing Zhang, Dacheng Tao
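Abstract summary: We create a novel dataset named RICH-CAT, representing "Contact-Aware Texts." Building on it, we propose a novel approach named CATMO for text-driven interactive human motion synthesis. Our experiments demonstrate the superior performance of the proposed approach compared to existing text-to-motion methods.
Reference score (attention score computed by this site): 57.871692507044344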
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper addresses the problem of generating 3D interactive human motion from text. Given a textual description depicting the actions of different body parts in contact with objects, we synthesize sequences of 3D body poses that are visually natural and physically plausible. Yet, this task poses a significant challenge due to the inadequate consideration of interactions by physical contacts in both motion and textual descriptions, leading to unnatural and implausible sequences. To tackle this challenge, we create a novel dataset named RICH-CAT, representing "Contact-Aware Texts" constructed from the RICH dataset. RICH-CAT comprises high-quality motion, accurate human-object contact labels, and detailed textual descriptions, encompassing over 8,500 motion-text pairs across 26 indoor/outdoor actions. Leveraging RICH-CAT, we propose a novel approach named CATMO for text-driven interactive human motion synthesis that explicitly integrates human body contacts as evidence. We employ two VQ-VAE models to encode motion and body contact sequences into distinct yet complementary latent spaces and an intertwined GPT for generating human motions and contacts in a mutually conditioned manner. Additionally, we introduce a pre-trained text encoder to learn textual embeddings that better discriminate among various contact types, allowing for more precise control over synthesized motions and contacts. Our experiments demonstrate the superior performance of our approach compared to existing text-to-motion methods, producing stable, contact-aware motion sequences. Code and data will be available for research purposes.
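The abstract states that CATMO uses two VQ-VAE models to encode motion and body-contact sequences into separate latent spaces. The step that makes such latents usable by a GPT-style model is vector quantization: each continuous per-frame encoder output is replaced by the index of its nearest codebook vector. The following is a minimal plain-Python sketch of that lookup only. The codebooks, their sizes, and the toy latent vectors are hypothetical illustrations, not values from the paper, and the encoder and decoder networks are omitted.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Replace each per-frame latent with the index of its nearest codebook entry.

    This is the standard VQ-VAE nearest-neighbour lookup; the returned integers
    are the discrete tokens an autoregressive model can later predict.
    """
    if not codebook:
        raise ValueError("codebook must not be empty")
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Look token indices back up in the codebook (what a VQ-VAE decoder consumes)."""
    return [codebook[i] for i in indices]


if __name__ == "__main__":
    # Hypothetical toy codebooks: one for motion latents, a separate one for contact latents.
    motion_codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
    contact_codebook = [(0.0,), (1.0,)]

    motion_tokens = quantize([(0.1, -0.1), (0.9, 1.2), (1.8, 0.1)], motion_codebook)
    contact_tokens = quantize([(0.2,), (0.7,), (0.95,)], contact_codebook)
    print(motion_tokens)   # [0, 1, 2]
    print(contact_tokens)  # [0, 1, 1]
```

Keeping two independent codebooks is one way to realize the "distinct yet complementary latent spaces" the abstract mentions: motion and contact tokens get their own vocabularies. Training details such as the commitment loss and codebook updates are deliberately left out of this sketch.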
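Abstract (reference translation): This paper addresses the problem of generating 3D interactive human motion from text. Given a textual description of the actions of different body parts in contact with objects, we synthesize sequences of 3D body poses that are visually natural and physically plausible. However, this task is a significant challenge because interactions through physical contact are insufficiently considered in both motion and textual descriptions, which leads to unnatural and implausible sequences. To address this challenge, we create a new dataset called RICH-CAT, representing "Contact-Aware Texts" and constructed from the RICH dataset. RICH-CAT includes high-quality motion, accurate human-object contact labels, and detailed textual descriptions, with over 8,500 motion-text pairs spanning 26 indoor/outdoor actions. Leveraging RICH-CAT, we propose a new approach called CATMO for text-driven interactive human motion synthesis that explicitly integrates human body contacts as evidence. We use two VQ-VAE models to encode motion and body-contact sequences into complementary latent spaces, and we generate human motions and contacts in a mutually conditioned manner. In addition, we introduce a text encoder that learns text embeddings which discriminate among various contact types, enabling more precise control over the synthesized motions and contacts. Our experiments demonstrate the superior performance of our method compared to existing text-to-motion methods, producing stable, contact-aware motion sequences. Code and data will be available for research purposes.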
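The abstract also describes an "intertwined GPT" that generates motion and contact tokens in a mutually conditioned manner, but it does not spell out the exact token ordering. The sketch below assumes one plausible scheme: at every step a motion token is predicted from the text plus both token histories, and a contact token is then predicted with the new motion token already visible. `scorer`, `toy_scorer`, the text token ids, and greedy decoding are hypothetical stand-ins for a trained transformer, not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer: (stream, text_tokens, motion_so_far, contact_so_far) -> scores
# over that stream's vocabulary. A trained transformer would play this role.
Scorer = Callable[[str, Sequence[int], Sequence[int], Sequence[int]], Sequence[float]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def generate_intertwined(
    text_tokens: Sequence[int], scorer: Scorer, num_steps: int
) -> Tuple[List[int], List[int]]:
    """Greedily decode motion and contact tokens, each conditioned on the other."""
    motion: List[int] = []
    contact: List[int] = []
    for _ in range(num_steps):
        # Motion token sees the text and every earlier motion and contact token.
        motion.append(argmax(scorer("motion", text_tokens, motion, contact)))
        # Contact token additionally sees the motion token just generated.
        contact.append(argmax(scorer("contact", text_tokens, motion, contact)))
    return motion, contact


def toy_scorer(
    stream: str, text_tokens: Sequence[int], motion: Sequence[int], contact: Sequence[int]
) -> List[float]:
    """Hand-written rule standing in for a model; it ignores text_tokens."""
    if stream == "motion":
        # Next motion token depends on both histories (3-token toy vocabulary).
        nxt = (len(motion) + sum(contact)) % 3
        return [1.0 if k == nxt else 0.0 for k in range(3)]
    # Toy contact vocabulary {0: no contact, 1: contact}; contact when the latest motion token is non-zero.
    return [0.0, 1.0] if motion[-1] != 0 else [1.0, 0.0]


if __name__ == "__main__":
    print(generate_intertwined([7, 42], toy_scorer, num_steps=4))
    # ([0, 1, 0, 1], [0, 1, 0, 1])
```

Greedy `argmax` only keeps the example deterministic; a real generator would more likely sample (for example with top-k or a temperature), and the actual conditioning order in CATMO may differ from the one assumed here.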


Related papers