Fugu-MT Paper Translation (Abstract): Learning Reactive Human Motion Generation from Paired Interaction Data Using Transformer-Based Models

Paper overview: Learning Reactive Human Motion Generation from Paired Interaction Data Using Transformer-Based Models

arxiv url: http://arxiv.org/abs/2604.22164v1
Date: Fri, 24 Apr 2026 02:27:13 GMT
Status: Translation complete
System update date: 2026-04-27 15:36:26.312274
Title: Learning Reactive Human Motion Generation from Paired Interaction Data Using Transformer-Based Models
Title (reference translation): Learning Reactive Human Motion Generation from Paired Interaction Data Using Transformer-Based Models
Authors: Masato Soga, Ryuki Takebayashi
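Abstract summary: This study addresses the problem of generating one person's motion based on another person's motion in interaction scenarios. Three models are implemented and compared: a simple Transformer, iTransformer, and Crossformer. The experimental results suggest that the simple Transformer can generate plausible interaction-aware motions without suffering from posture collapse.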
Reference score (independently computed attention level): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in deep learning have enabled the generation of videos from textual descriptions as well as the prediction of future sequences from input videos. Similarly, in human motion modeling, motions can be generated from text or predicted from a single person's motion sequence. However, these approaches primarily focus on single-agent motion generation. In contrast, this study addresses the problem of generating the motion of one person based on the motion of another in interaction scenarios, where the two motions are mutually dependent. We construct a dataset of paired action-reaction motion sequences extracted from boxing match videos and investigate the effectiveness of Transformer-based models for this task. Specifically, we implement and compare three models: a simple Transformer, iTransformer, and Crossformer. In addition, we introduce a person ID embedding to explicitly distinguish between individuals, enabling the model to maintain structural consistency and better capture interaction dynamics. Experimental results show that the simple Transformer can generate plausible interaction-aware motions without suffering from posture collapse, while iTransformer and Crossformer accumulate errors over time, leading to unstable motion generation. Furthermore, the proposed person ID embedding contributes to preventing structural collapse and improving motion consistency. These results highlight the importance of explicitly modeling individual identity in interaction-aware motion generation.
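Abstract (reference translation): Recent advances in deep learning have made it possible to generate videos from text descriptions and to predict future sequences from input videos. Similarly, in human motion modeling, motion can be generated from text or predicted from one person's motion sequence. However, these approaches focus mainly on single-agent motion generation. In contrast, this study addresses the problem of generating one person's motion from another person's motion in interaction scenarios where the two motions depend on each other. We construct a dataset of paired action-reaction motion sequences extracted from boxing match videos and examine the effectiveness of Transformer-based models for this task. Specifically, we implement and compare three models: a simple Transformer, iTransformer, and Crossformer. We also introduce a person ID embedding to explicitly distinguish individuals, so that the model maintains structural consistency and better captures interaction dynamics. Experimental results show that the simple Transformer generates plausible interaction-aware motions without suffering from posture collapse, whereas iTransformer and Crossformer accumulate errors over time, leading to unstable motion generation. Furthermore, the proposed person ID embedding helps prevent structural collapse and improves motion consistency. These results highlight the importance of explicitly modeling individual identity in interaction-aware motion generation.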
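
The person ID embedding mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the embedding is combined with the pose features, so everything here is an assumption made for illustration: the function names (`make_embedding_table`, `build_pair_sequence`, `add_person_id_embedding`), the per-frame interleaving of the two people, the choice of adding (rather than concatenating) the embedding, and the fixed random vectors standing in for parameters that a real model would learn.

```python
import random


def make_embedding_table(num_ids, dim, seed=0):
    """Hypothetical stand-in for a learned embedding table: one vector per person ID.

    A trained model would learn these values; here they are fixed random numbers.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_ids)]


def build_pair_sequence(actor_frames, reactor_frames):
    """Interleave two people's pose features frame by frame (an assumed layout).

    Returns the token list and a parallel list of person IDs
    (0 = acting person, 1 = reacting person).
    """
    tokens, person_ids = [], []
    for actor_pose, reactor_pose in zip(actor_frames, reactor_frames):
        tokens.extend([actor_pose, reactor_pose])
        person_ids.extend([0, 1])
    return tokens, person_ids


def add_person_id_embedding(tokens, person_ids, table):
    """Add each token's person ID vector to its pose features, element-wise."""
    if len(tokens) != len(person_ids):
        raise ValueError("tokens and person_ids must have the same length")
    tagged = []
    for features, pid in zip(tokens, person_ids):
        embedding = table[pid]
        if len(features) != len(embedding):
            raise ValueError("feature and embedding dimensions must match")
        tagged.append([f + e for f, e in zip(features, embedding)])
    return tagged


if __name__ == "__main__":
    dim = 4
    table = make_embedding_table(num_ids=2, dim=dim)
    # Three frames of identical all-zero pose features for both people.
    actor = [[0.0] * dim for _ in range(3)]
    reactor = [[0.0] * dim for _ in range(3)]
    tokens, ids = build_pair_sequence(actor, reactor)
    tagged = add_person_id_embedding(tokens, ids, table)
    print(ids)
    print(tagged[0] != tagged[1])
```

In this sketch, two tokens with identical pose features become distinguishable once their person ID vectors are added, which is the intuition behind tagging each token with the identity of the person it belongs to before it enters a sequence model.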

