Fugu-MT paper translation (abstract): EGOSTREAM: A Diagnostic Benchmark for Streaming Episodic Memory in Egocentric Vision

Paper abstract: EGOSTREAM: A Diagnostic Benchmark for Streaming Episodic Memory in Egocentric Vision

arxiv url: http://arxiv.org/abs/2605.31557v2
Date: Mon, 01 Jun 2026 11:50:06 GMT
Status: translation complete
Last updated in system: 2026-06-02 18:24:16.934191
Title: EGOSTREAM: A Diagnostic Benchmark for Streaming Episodic Memory in Egocentric Vision
Authors: Rosario Forte, Giuseppe Lando, Antonino Furnari
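Abstract summary: Continuous episodic memory is a core capability for autonomous agents. Egostream is a diagnostic benchmark for episodic memory evaluation in egocentric vision.
Reference score (independently computed attention score): 9.701124246177661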
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Continuous episodic memory is a core capability for autonomous agents operating in dynamic, real-world environments, yet current streaming video benchmarks provide limited tools for diagnosing what models remember and for how long. We introduce Egostream, a diagnostic benchmark for streaming episodic memory evaluation in egocentric vision. Egostream organizes 2,250 curated questions along seven cognitive dimensions: detail, spatial, temporal, event, social, causal, and prospective memory. We introduce the Answer Validity Window (AVW), which specifies the temporal span over which an answer remains valid as the observed scene evolves. This allows us to expand the questions into 8,528 recall-conditioned evaluations, enabling controlled testing from instant to ultra-long-term recall while separating genuine model forgetting from natural world-state changes. We rigorously establish baseline performance through a unified streaming MLLM framework that compares several state-of-the-art memory-management mechanisms, covering sliding windows, attention sinks, KV-cache pruning, merging, and offloading. Experiments within a unified Qwen3-VL backbone reveal that comparable aggregate accuracies mask starkly different memory profiles. For instance, token pruning preserves fine-grained details and temporal structure significantly better than token merging, while quantized offloading rescues ultra-long-term recall. Ultimately, all mechanisms operate well below real-time (more than 1 s per frame), and top-performing methods plateau at about 45% accuracy, exposing critical gaps in current architectures. Egostream provides the diagnostic testbed needed to close these gaps. Project website, news and updates at: https://saroo25.github.io/Egostream/
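As a rough illustration of how an Answer Validity Window could turn one question into several recall-conditioned evaluations, the Python sketch below keeps only the query times that still fall inside the window. It is not the authors' implementation: the `Question` fields, the `RECALL_DELAYS` values, and the inclusion rule are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    qid: str
    evidence_time: float  # assumed: seconds into the stream when the evidence is seen
    valid_until: float    # assumed: end of the Answer Validity Window, in seconds


# Hypothetical recall delays in seconds; the benchmark's real horizons are not
# stated in the abstract.
RECALL_DELAYS = {"instant": 0.0, "short": 30.0, "long": 300.0, "ultra_long": 1800.0}


def expand(question, delays=RECALL_DELAYS):
    """Return (qid, horizon, query_time) for every delay whose query time
    still lies inside the question's validity window."""
    evaluations = []
    for horizon, delay in delays.items():
        query_time = question.evidence_time + delay
        # Past the window, a wrong answer could reflect a changed scene rather
        # than forgetting, so that horizon is skipped.
        if query_time <= question.valid_until:
            evaluations.append((question.qid, horizon, query_time))
    return evaluations
```

Under these assumptions, a question whose evidence appears at 100 s and whose answer stays valid until 450 s would be asked at the instant, short, and long horizons but not at the ultra-long one.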

Related papers