Fugu-MT paper translation (summary): SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation

arxiv url: http://arxiv.org/abs/2304.10417v3
Date: Tue, 26 Mar 2024 13:16:02 GMT
Status: translation complete
Last updated in system: 2024-03-28 01:39:58.482849
Title: SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation
Title (reference translation): SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation
Authors: Nikos Athanasiou, Mathis Petrovich, Michael J. Black, Gül Varol
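Abstract summary: Our goal is to synthesize 3D human motions given text input describing simultaneous actions. We refer to generating such simultaneous movements as "spatial composition".
Reference score (attention score computed by this site): 58.25766404147109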
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Our goal is to synthesize 3D human motions given textual inputs describing simultaneous actions, for example 'waving hand' while 'walking' at the same time. We refer to generating such simultaneous movements as performing 'spatial compositions'. In contrast to temporal compositions that seek to transition from one action to another, spatial compositing requires understanding which body parts are involved in which action, to be able to move them simultaneously. Motivated by the observation that the correspondence between actions and body parts is encoded in powerful language models, we extract this knowledge by prompting GPT-3 with text such as "what are the body parts involved in the action <action name>?", while also providing the parts list and few-shot examples. Given this action-part mapping, we combine body parts from two motions together and establish the first automated method to spatially compose two actions. However, training data with compositional actions is always limited by the combinatorics. Hence, we further create synthetic data with this approach, and use it to train a new state-of-the-art text-to-motion generation model, called SINC ("SImultaneous actioN Compositions for 3D human motions"). In our experiments, we find that training with such GPT-guided synthetic data improves spatial composition generation over baselines. Our code is publicly available at https://sinc.is.tue.mpg.de/.
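Abstract (reference translation): Our goal is to synthesize 3D human motions given text input describing simultaneous actions. We refer to such simultaneous movements as "spatial compositions". In contrast to temporal compositions, which seek to transition from one action to another, spatial composition requires understanding which body parts are involved in which action so that they can be moved simultaneously. Motivated by the observation that the correspondence between actions and body parts is encoded in powerful language models, we prompt GPT-3 with text such as "what are the body parts involved in the action <action name>?", while also providing the parts list and few-shot examples. Given this action-part mapping, we combine body parts from two motions and establish the first automated method to spatially compose two actions. However, training data with compositional actions is always limited by combinatorics. We therefore use this approach to create synthetic data and train a state-of-the-art text-to-motion generation model called SINC ("SImultaneous actioN Compositions for 3D human motions"). In our experiments, training with GPT-guided synthetic data improved spatial composition generation over the baselines. Our code is publicly available at https://sinc.is.tue.mpg.de/.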
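
The body-part composition described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the SINC implementation: the part names, the joint indices, the toy motion format (a list of frames, each a list of per-joint values), and the choice to trim both clips to the shorter one. The few-shot examples mentioned in the abstract are omitted, and the language model's answer is hard-coded rather than queried.

```python
# Hypothetical body-part list and joint indices; the actual SINC part list
# and skeleton layout are not given in the abstract.
BODY_PARTS = {
    "left arm": [0, 1],
    "right arm": [2, 3],
    "legs": [4, 5],
    "torso": [6],
}


def build_prompt(action, parts=BODY_PARTS):
    """Format the question quoted in the abstract, preceded by the parts list."""
    part_list = ", ".join(parts)
    return (
        f"Body parts: {part_list}\n"
        f"What are the body parts involved in the action {action}?"
    )


def compose(motion_a, motion_b, parts_b, parts=BODY_PARTS):
    """Take the joints of `parts_b` from motion_b and all others from motion_a.

    Each motion is a list of frames; each frame is a list of per-joint values.
    Trimming to the shorter clip is an assumption of this sketch.
    """
    n_frames = min(len(motion_a), len(motion_b))
    joints_b = {j for p in parts_b for j in parts[p]}
    return [
        [motion_b[t][j] if j in joints_b else motion_a[t][j]
         for j in range(len(motion_a[t]))]
        for t in range(n_frames)
    ]


# Toy data: 2 frames of "walk" and 3 frames of "wave", 7 joints each.
walk = [[f"walk{t}_{j}" for j in range(7)] for t in range(2)]
wave = [[f"wave{t}_{j}" for j in range(7)] for t in range(3)]

# Assumed model answer for 'waving hand': the right arm only.
combined = compose(walk, wave, parts_b=["right arm"])
```

In this toy case `combined` keeps the walking values for every joint except the two right-arm joints, which come from the waving clip.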

Related papers

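Jointly Understand Your Command and Intention: Reciprocal Co-Evolution between Scene-Aware 3D Human Motion Synthesis and Analysis [80.50342609047091]
Scene-aware text-to-human synthesis generates diverse indoor motion samples from the same text description. We propose a cascaded generation strategy that decomposes text-driven, scene-specific human motion generation into three stages. We jointly improve realistic human motion synthesis and robust human motion analysis in 3D scenes.
Paper reference translation (metadata) (2025-03-01T06:56:58Z)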
Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models [9.739611757541535]
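Our approach decomposes complex actions into simpler movements, specifically those observed during training. These simple movements are then combined into a single realistic animation using the properties of diffusion models. We evaluate our method by dividing two human motion datasets into basic and complex actions, and compare its performance with the state of the art.
Paper reference translation (metadata) (2024-09-18T12:32:39Z)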
Contact-aware Human Motion Generation from Textual Descriptions [57.871692507044344]
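This paper addresses the problem of generating 3D interactive human motion from text. We create a new dataset named RICH-CAT, which stands for "Contact-Aware Texts". We then propose a novel approach named CATMO for text-driven interactive human motion synthesis.
Paper reference translation (metadata) (2024-03-23T04:08:39Z)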
Motion Generation from Fine-grained Textual Descriptions [29.033358642532722]
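We build FineHumanML3D, a large-scale language-motion dataset specializing in fine-grained textual descriptions. We design a new text2motion model, FineMotionDiffuse, which makes full use of fine-grained textual information. FineMotionDiffuse trained on FineHumanML3D improves FID by a margin of 0.38.
Paper reference translation (metadata) (2024-03-20T11:38:30Z)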
GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency [57.9920824261925]
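Hands are dexterous and versatile manipulators that are central to how humans interact with objects and their environment. Modeling realistic hand-object interactions is important for applications in computer graphics, computer vision, and mixed reality. GRIP is a learning-based method that takes the 3D motion of the body and the object as input and synthesizes realistic motion for both hands before, during, and after object interaction.
Paper reference translation (metadata) (2023-08-22T17:59:51Z)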
IMoS: Intent-Driven Full-Body Motion Synthesis for Human-Object Interactions [69.95820880360345]
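We propose the first framework to synthesize the full-body motion of virtual human characters with 3D objects. The system takes as input text describing the objects and the associated intentions of the virtual characters. Results show that in over 80% of scenarios, the synthesized full-body motions appear more realistic to participants.
Paper reference translation (metadata) (2022-12-14T23:59:24Z)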
TEACH: Temporal Action Composition for 3D Humans [50.97135662063117]
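Given a series of natural language descriptions, we generate 3D human motions that correspond semantically to the text. In particular, our goal is to enable the synthesis of a series of actions, which we refer to as temporal action composition.
Paper reference translation (metadata) (2022-09-09T00:33:40Z)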
Synthesis of Compositional Animations from Textual Descriptions [54.85920052559239]
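"How unstructured and complex can we make a sentence and still generate plausible movements from it?" "Can we animate 3D characters from a movie script, or move robots simply by telling them what we want them to do?"
Paper reference translation (metadata) (2021-03-26T18:23:29Z)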

The related papers list is generated automatically from the titles and abstracts of papers on this site.
