Fugu-MT Paper Translation (Abstract): Keep It in Mind: User Centric Continual Spatial Intelligence Reasoning in Egocentric Video Streams

Paper summary: Keep It in Mind: User Centric Continual Spatial Intelligence Reasoning in Egocentric Video Streams

arxiv url: http://arxiv.org/abs/2606.15200v1
Date: Sat, 13 Jun 2026 08:50:49 GMT
Status: Translation complete
Last updated in system: 2026-06-16 16:21:33.050714
Title: Keep It in Mind: User Centric Continual Spatial Intelligence Reasoning in Egocentric Video Streams
Title (reference translation): Keep It in Mind: User-Centric Continual Spatial Intelligence Reasoning in Egocentric Video Streams
Authors: Yun Wang, Junbin Xiao, Han Lyu, Yifan Wang, Jing Zuo, Zhanjie Zhang, Hong Huang, Dapeng Wu, Angela Yao
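Abstract summary: UCS-Bench is a dataset of more than 170 hours of egocentric visual observations and more than 8.1K timestamped questions. The authors propose DirectMe, a framework that incrementally builds and maintains a structured spatial memory from streaming egocentric observations.
Reference score (independently computed attention score): 58.77207336324662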
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce UCS-Bench, a dataset spanning 170+ hours of egocentric visual observations with 8.1K+ timestamped questions for diagnosing User-Centric Continual Spatial intelligence in egocentric video streams. UCS-Bench targets a new problem that emphasizes dynamic spatial reasoning, long-term memory, and their alignment with users' real-time locations. We propose DirectMe, a framework that incrementally constructs and maintains a structured spatial memory from streaming egocentric observations. DirectMe enables robust tracking and recall of object locations, all relative to the user's movement over time. By tightly coupling visual perception with memory updates and spatial reasoning, our approach supports long-horizon queries that require recalling interactions, resolving viewpoint-induced ambiguities, and adapting to dynamic scenes. Our experiments show that DirectMe significantly improves the spatial reasoning of leading multimodal LLMs; it also surpasses many spatially aware and long-form streaming video models. We hope our benchmark and solution will advance spatial intelligence research for egocentric AI assistants. Data and code are available at https://github.com/cocowy1/UCS-Bench.
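Abstract (reference translation): UCS-Bench is a dataset of more than 170 hours of egocentric visual observations with more than 8.1K timestamped questions, designed to diagnose user-centric continual spatial intelligence in egocentric video streams. UCS-Bench targets a new problem that emphasizes dynamic spatial reasoning, long-term memory, and their alignment with the user's real-time location. We propose DirectMe, a framework that incrementally builds and maintains a structured spatial memory from streaming egocentric observations. DirectMe enables robust tracking and recall of object locations, all relative to the user's movement over time. By tightly coupling visual perception with memory updates and spatial reasoning, it supports long-horizon queries that require recalling interactions, resolving viewpoint-induced ambiguities, and adapting to dynamic scenes. Experiments show that DirectMe significantly improves the spatial reasoning of leading multimodal LLMs and also surpasses many spatially aware and long-form streaming video models. We hope the benchmark and solution will advance spatial intelligence research for egocentric AI assistants. Data and code are available at https://github.com/cocowy1/UCS-Bench.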

Related papers