Fugu-MT Paper Translation (Abstract): EgoIntrospect: An Egocentric Dataset and Benchmark for User-Centric Internal State Reasoning

Paper overview: EgoIntrospect: An Egocentric Dataset and Benchmark for User-Centric Internal State Reasoning

arxiv url: http://arxiv.org/abs/2605.17262v1
Date: Sun, 17 May 2026 05:05:29 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:47.814818
Title: EgoIntrospect: An Egocentric Dataset and Benchmark for User-Centric Internal State Reasoning
Title (reference translation): EgoIntrospect: An Egocentric Dataset and Benchmark for User-Centric Internal State Reasoning
Authors: Zeyu Wang, Chang Liu, Eduardus Tjitrahardja, Yuntao Wang, Borislav Pavlov, Fangfei Gou, Jose Manuel Davila, Dai Shi, Ran Xu, Yue Pan, Jiayi Tan, Shuting Chang, Qi Wang, Jinzhao Li, Jiacheng Hua, Yifei Huang, Jingwei Sun, Yu Zhang, Liuxin Zhang, Guocai Yao, Jia Jia, Yin Li, Qianying Wang, Yuanchun Shi, Miao Liu
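Abstract summary: EgoIntrospect is the first egocentric dataset captured in user-driven scenarios with self-annotations. It consists of 180 hours of recordings from 60 subjects, averaging 3 hours per subject. We formalize a suite of tasks centered on users' internal states, including affective experience, interactive intent, and cognitive memory.
Reference score (independently computed attention level): 47.853306116245484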
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Despite extensive efforts on egocentric video datasets and benchmarks, understanding users' internal states, which is crucial for enabling seamless AI assistant experiences, remains largely overlooked. In this work, we introduce EgoIntrospect, the first egocentric dataset captured in user-driven scenarios with self-annotations that explicitly reveal users' interactive intentions with AI assistants. EgoIntrospect was collected using a cross-device setup, providing synchronized video, audio, gaze, motion, and physiological signals. It consists of 180 hours of recordings from 60 subjects, with an average recording duration of 3 hours per subject. Leveraging EgoIntrospect, we formalize a suite of tasks centered on user internal states, including affective experience, interactive intent, and cognitive memory. We further process the annotations to construct benchmarks that evaluate the ability of modern multimodal large language models to reason about users' internal states from egocentric observations. Experiments on our benchmark suggest that existing multimodal large language models struggle to effectively leverage multimodal signals to infer users' subjective internal states. The dataset and annotations will be made publicly available to advance research in egocentric vision and wearable AI assistants. Project page: https://ego-introspect.github.io/
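Abstract (reference translation): Despite extensive efforts on egocentric video datasets and benchmarks, understanding users' internal states, which is essential for seamless AI assistant experiences, has been largely overlooked. EgoIntrospect is the first egocentric dataset captured in user-driven scenarios, with self-annotations that explicitly reveal users' interactive intentions with AI assistants. EgoIntrospect was collected using a cross-device setup and provides synchronized video, audio, gaze, motion, and physiological signals. It comprises 180 hours of recordings from 60 subjects, averaging 3 hours per subject. Leveraging EgoIntrospect, we formalize a suite of tasks centered on users' internal states, including affective experience, interactive intent, and cognitive memory. We further process the annotations to build benchmarks that evaluate the ability of modern multimodal large language models to reason about users' internal states from egocentric observations. Experiments on our benchmark suggest that existing multimodal large language models struggle to make effective use of multimodal signals to infer users' subjective internal states. The dataset and annotations will be released publicly to advance research in egocentric vision and wearable AI assistants. Project page: https://ego-introspect.github.io/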


Related papers