Fugu-MT Paper Translation (Abstract): Text2Interact: High-Fidelity and Diverse Text-to-Two-Person Interaction Generation

Paper abstract: Text2Interact: High-Fidelity and Diverse Text-to-Two-Person Interaction Generation

arxiv url: http://arxiv.org/abs/2510.06504v1
Date: Tue, 07 Oct 2025 22:41:23 GMT
Status: Translation complete
Last updated in system: 2025-10-09 16:41:20.221817
Title: Text2Interact: High-Fidelity and Diverse Text-to-Two-Person Interaction Generation
Title (reference translation): Text2Interact: Text-to-Two-Person Interaction Generation
Authors: Qingxuan Wu, Zhiyang Dou, Chuan Guo, Yiming Huang, Qiao Feng, Bing Zhou, Jian Wang, Lingjie Liu
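Abstract summary: This work proposes the Text2Interact framework for generating realistic, text-aligned human-human interactions. It presents InterCompose, a synthesis-by-composition pipeline that aligns interaction descriptions with strong single-person motion priors, and InterActor, a text-to-interaction model with word-level conditioning that preserves token-level cues.
Reference score (independently computed attention score): 39.67266918328847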
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modeling human-human interactions from text remains challenging because it requires not only realistic individual dynamics but also precise, text-consistent spatiotemporal coupling between agents. Currently, progress is hindered by 1) limited two-person training data, inadequate to capture the diverse intricacies of two-person interactions; and 2) insufficiently fine-grained text-to-interaction modeling, where language conditioning collapses rich, structured prompts into a single sentence embedding. To address these limitations, we propose our Text2Interact framework, designed to generate realistic, text-aligned human-human interactions through a scalable high-fidelity interaction data synthesizer and an effective spatiotemporal coordination pipeline. First, we present InterCompose, a scalable synthesis-by-composition pipeline that aligns LLM-generated interaction descriptions with strong single-person motion priors. Given a prompt and a motion for an agent, InterCompose retrieves candidate single-person motions, trains a conditional reaction generator for another agent, and uses a neural motion evaluator to filter weak or misaligned samples, expanding interaction coverage without extra capture. Second, we propose InterActor, a text-to-interaction model with word-level conditioning that preserves token-level cues (initiation, response, contact ordering) and an adaptive interaction loss that emphasizes contextually relevant inter-person joint pairs, improving coupling and physical plausibility for fine-grained interaction modeling. Extensive experiments show consistent gains in motion diversity, fidelity, and generalization, including out-of-distribution scenarios and user studies. We will release code and models to facilitate reproducibility.
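Abstract (reference translation): Modeling human-human interactions from text remains difficult because it requires not only realistic individual dynamics but also precise, text-consistent spatiotemporal coupling between agents. Progress is currently held back by 1) limited two-person training data, which is insufficient to capture the diverse intricacies of two-person interactions, and 2) insufficiently fine-grained text-to-interaction modeling, in which language conditioning collapses rich, structured prompts into a single sentence embedding. To address these limitations, we propose the Text2Interact framework, which generates realistic, text-aligned human-human interactions using a scalable high-fidelity interaction data synthesizer and an effective spatiotemporal coordination pipeline. First, we propose InterCompose, a scalable synthesis-by-composition pipeline that aligns LLM-generated interaction descriptions with strong single-person motion priors. Given a prompt and a motion for one agent, InterCompose retrieves candidate single-person motions, trains a conditional reaction generator for the other agent, and uses a neural motion evaluator to filter out weak or misaligned samples, expanding interaction coverage without additional capture. Second, we propose InterActor, a text-to-interaction model with word-level conditioning that preserves token-level cues (initiation, response, contact ordering), together with an adaptive interaction loss that emphasizes contextually relevant inter-person joint pairs, improving coupling and physical plausibility in fine-grained interaction modeling. Extensive experiments show consistent gains in motion diversity, fidelity, and generalization, including in out-of-distribution scenarios and user studies. Code and models will be released to facilitate reproducibility.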
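
Illustrative sketch (not from the paper): the abstract describes an adaptive interaction loss that emphasizes contextually relevant inter-person joint pairs, but gives no formula. The Python below shows one possible reading, assuming that "relevant" pairs are those that are close together in the reference motion and that the weights come from a softmax over negative reference distances. The function names, the `temperature` parameter, and the weighting rule are all assumptions made for illustration.

```python
import math

def pairwise_distances(joints_a, joints_b):
    """Euclidean distance between every joint of person A and every joint of person B."""
    return [[math.dist(pa, pb) for pb in joints_b] for pa in joints_a]

def adaptive_interaction_loss(pred_a, pred_b, gt_a, gt_b, temperature=0.5):
    """Weighted squared error on inter-person joint-pair distances.

    Weights are a softmax over negative reference distances, so joint pairs
    that are close in the reference motion (likely contacts) dominate.
    This weighting rule is an illustrative assumption, not the paper's formula.
    """
    flat_pred = [d for row in pairwise_distances(pred_a, pred_b) for d in row]
    flat_gt = [d for row in pairwise_distances(gt_a, gt_b) for d in row]
    # Shift by the smallest distance so the exponentials stay well scaled.
    d_min = min(flat_gt)
    scores = [math.exp(-(d - d_min) / temperature) for d in flat_gt]
    total = sum(scores)
    weights = [s / total for s in scores]
    return sum(w * (p - g) ** 2 for w, p, g in zip(weights, flat_pred, flat_gt))

# Toy single-frame example with two joints per person (hypothetical coordinates).
gt_a = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
gt_b = [(0.1, 0.0, 0.0), (2.0, 1.0, 0.0)]   # first joint nearly touches person A

# Same 0.5 m displacement applied to the near-contact joint vs. the distant joint.
pred_b_near_error = [(0.6, 0.0, 0.0), (2.0, 1.0, 0.0)]
pred_b_far_error = [(0.1, 0.0, 0.0), (2.5, 1.0, 0.0)]

loss_perfect = adaptive_interaction_loss(gt_a, gt_b, gt_a, gt_b)
loss_near = adaptive_interaction_loss(gt_a, pred_b_near_error, gt_a, gt_b)
loss_far = adaptive_interaction_loss(gt_a, pred_b_far_error, gt_a, gt_b)
```

Under this assumed weighting, an error on the near-contact joint is penalized more heavily than an equal displacement of a distant joint, which is the kind of emphasis the abstract attributes to the loss.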


Related papers