Fugu-MT Paper Translation (Abstract): EgoInteract: Synthetic Egocentric Videos Generation for Interaction Understanding and Anticipation

Paper overview: EgoInteract: Synthetic Egocentric Videos Generation for Interaction Understanding and Anticipation

arxiv url: http://arxiv.org/abs/2605.18214v2
Date: Fri, 22 May 2026 16:21:46 GMT
Status: Translation complete
Last updated in system: 2026-05-25 14:44:53.684903
Title: EgoInteract: Synthetic Egocentric Videos Generation for Interaction Understanding and Anticipation
Title (reference translation): EgoInteract: Synthetic Egocentric Video Generation for Interaction Understanding and Anticipation
Authors: Rosario Leonardi, Francesco Ragusa, Daniele Materia, Alessandro Passanisi, James Fort, Jakob Engel, Giovanni Maria Farinella
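Abstract summary: We propose a controllable simulator for egocentric video generation, designed to model fine-grained egocentric interactions and their temporal dynamics. We generate a synthetic egocentric video dataset with dense spatial and temporal annotations for temporal action segmentation, next-active object detection, interaction anticipation, and hand-object interaction detection. Results show consistent improvements over strong baselines across tasks and datasets, demonstrating the effectiveness and transferability of the simulation-based approach.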
Reference score (independently calculated attention score): 45.01838097419948
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Collecting large-scale egocentric video datasets with dense spatial and temporal annotations is costly, slow, and often constrained by environmental biases, privacy constraints, and limited coverage of interaction patterns. While synthetic data has shown strong potential in several vision domains, its use for egocentric perception remains relatively underexplored, especially for tasks requiring temporally coherent human-object interactions. In this work, we introduce EgoInteract, a controllable simulator for egocentric video generation designed to model fine-grained egocentric interactions and their temporal dynamics. The simulator enables precise control over camera, human body and hand motion, object manipulation, and scene composition across diverse environments. Building on this framework, we generate a synthetic egocentric video dataset with dense spatial and temporal annotations for temporal action segmentation, next-active object detection, interaction anticipation, and hand-object interaction detection. We evaluate models trained with simulated data on multiple real-world egocentric benchmarks spanning diverse environments, object categories, and interaction patterns. Results show consistent improvements over strong baselines across tasks and datasets, demonstrating the effectiveness and transferability of our simulation-based approach.
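Abstract (reference translation): Collecting large-scale egocentric video datasets with dense spatial and temporal annotations is costly and slow, and is often constrained by environmental biases, privacy constraints, and limited coverage of interaction patterns. Synthetic data has shown strong potential in several vision domains, but its use for egocentric perception remains relatively underexplored, especially for tasks that require temporally coherent human-object interactions. In this work, we introduce EgoInteract, a controllable simulator for egocentric video generation that models fine-grained egocentric interactions and their temporal dynamics. The simulator enables precise control over camera, human body and hand motion, object manipulation, and scene composition across diverse environments. Building on this framework, we generate a synthetic egocentric video dataset with dense spatial and temporal annotations for temporal action segmentation, next-active object detection, interaction anticipation, and hand-object interaction detection. We evaluate models trained with simulated data on multiple real-world egocentric benchmarks spanning diverse environments, object categories, and interaction patterns. Results show consistent improvements over strong baselines across tasks and datasets, demonstrating the effectiveness and transferability of the simulation-based approach.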

Related papers