Fugu-MT Paper Translation (Summary): InterAct: A Large-Scale Dataset of Dynamic, Expressive and Interactive Activities between Two People in Daily Scenarios

Paper summary: InterAct: A Large-Scale Dataset of Dynamic, Expressive and Interactive Activities between Two People in Daily Scenarios

arxiv url: http://arxiv.org/abs/2509.05747v1
Date: Sat, 06 Sep 2025 15:36:47 GMT
Status: Translation complete
System update date: 2025-09-09 14:07:03.686841
Title: InterAct: A Large-Scale Dataset of Dynamic, Expressive and Interactive Activities between Two People in Daily Scenarios
Title (reference translation): InterAct: A large-scale dataset of dynamic, expressive, and interactive activities between two people in daily scenarios
Authors: Leo Ho, Yinghao Huang, Dafei Qin, Mingyi Shi, Wangpok Tse, Wei Liu, Junichi Yamagishi, Taku Komura
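Abstract summary: We propose to model two people's activities simultaneously, targeting objective-driven, dynamic, and semantically consistent interactions. We capture a new multi-modal dataset called InterAct, composed of 241 motion sequences. InterAct contains diverse and complex individual motions, along with interesting and relatively long-term interaction patterns barely seen before.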
Reference score (independently computed attention score): 40.42003202491803
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We address the problem of accurate capture of interactive behaviors between two people in daily scenarios. Most previous works either only consider one person or solely focus on conversational gestures of two people, assuming the body orientation and/or position of each actor are constant or barely change over each interaction. In contrast, we propose to simultaneously model two people's activities, and target objective-driven, dynamic, and semantically consistent interactions which often span a longer duration and cover a larger space. To this end, we capture a new multi-modal dataset dubbed InterAct, which is composed of 241 motion sequences where two people perform a realistic and coherent scenario for one minute or longer over a complete interaction. For each sequence, two actors are assigned different roles and emotion labels, and collaborate to finish one task or conduct a common interaction activity. The audio, body motions, and facial expressions of both persons are captured. InterAct contains diverse and complex motions of individuals and interesting and relatively long-term interaction patterns barely seen before. We also demonstrate a simple yet effective diffusion-based method that estimates interactive facial expressions and body motions of two people from speech inputs. Our method regresses the body motions in a hierarchical manner, and we also propose a novel fine-tuning mechanism to improve the lip accuracy of facial expressions. To facilitate further research, the data and code are made available at https://hku-cg.github.io/interact/ .
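Abstract (reference translation): We address the problem of accurately capturing interactive behaviors between two people in daily scenarios. Most previous works either consider only one person or focus solely on the conversational gestures of two people, assuming that each actor's body orientation and/or position stays constant or barely changes during each interaction. In contrast, we propose to model two people's activities simultaneously, targeting objective-driven, dynamic, and semantically consistent interactions that often span a longer duration and cover a larger space. To this end, we captured a new multi-modal dataset called InterAct, composed of 241 motion sequences in which two people perform a realistic and coherent scenario for one minute or longer over a complete interaction. For each sequence, the two actors are assigned different roles and emotion labels and collaborate to finish one task or carry out a common interaction activity. The audio, body motions, and facial expressions of both people are captured. InterAct contains diverse and complex individual motions, along with interesting and relatively long-term interaction patterns barely seen before. We also demonstrate a simple yet effective diffusion-based method that estimates the interactive facial expressions and body motions of two people from speech inputs. The method regresses body motions in a hierarchical manner, and we also propose a novel fine-tuning mechanism to improve the lip accuracy of the facial expressions. To facilitate further research, the data and code are available at https://hku-cg.github.io/interact/ .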

Related papers