Fugu-MT paper translation (abstract): Don't Pause! Every prediction matters in a streaming video

Paper abstract: Don't Pause! Every prediction matters in a streaming video

arxiv url: http://arxiv.org/abs/2604.24317v1
Date: Mon, 27 Apr 2026 11:07:03 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.915146
Title: Don't Pause! Every prediction matters in a streaming video
Title (reference translation): Don't Pause! Every prediction matters in a streaming video
Authors: Dibyadip Chatterjee, Zhanzhong Pang, Fadime Sener, Yale Song, Angela Yao
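Abstract summary: We propose SPOT-Bench, which features multi-turn proactive queries that evaluate general streaming perception and assistive capabilities. SPOT-Bench comes with Timeliness-F1. The benchmark reveals: (i) offline models detect events reliably but spam predictions unprompted; (ii) post-training for silence reduces spamming but induces unresponsiveness; (iii) half of the streaming video expects no response.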
Reference score (independently computed attention score): 55.509551643600794
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Streaming video models should respond the moment an event unfolds, not after the moment has passed. Yet existing online VideoQA benchmarks remain largely retrospective. They pause the video at fixed timestamps, pose questions about current or past events, and score models only at those moments. This protocol leaves streaming predictions untested. To close this gap, we introduce SPOT-Bench, featuring multi-turn proactive queries that evaluate general streaming perception and assistive capabilities required by an always-on, real-time assistant. SPOT-Bench comes with Timeliness-F1, a consolidated metric that measures streaming predictions by their temporal precision and balanced coverage across the entire video. Our benchmark reveals: (i) offline models detect events reliably but spam predictions unprompted; (ii) post-training for silence reduces spamming but induces unresponsiveness; (iii) half of the streaming video expects no response, which we term dead-time - compute spent here does not affect response latency. These findings motivate AsynKV, a training-free streaming adaptation of offline models, that retains their event perception while improving their streaming behavior. AsynKV features a long-short term memory, utilized efficiently by scaling compute during dead-time. It serves as a strong baseline on SPOT-Bench, outperforming existing streaming models, and achieves state-of-the-art on retrospective benchmarks.
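Abstract (reference translation): Streaming video models should respond at the moment an event unfolds, not after that moment has passed. However, existing online VideoQA benchmarks remain largely retrospective: they pause the video at fixed timestamps, ask questions about current or past events, and score models only at those moments. This protocol leaves streaming predictions untested. To close this gap, we introduce SPOT-Bench, which features multi-turn proactive queries that evaluate the general streaming perception and assistive capabilities required by an always-on, real-time assistant. SPOT-Bench comes with Timeliness-F1, a consolidated metric that measures streaming predictions by their temporal precision and balanced coverage across the entire video. Our benchmark reveals: (i) offline models detect events reliably but spam predictions unprompted; (ii) post-training for silence reduces spamming but induces unresponsiveness; (iii) half of the streaming video expects no response, which we term dead-time, and compute spent there does not affect response latency. These findings motivate AsynKV, a training-free streaming adaptation of offline models that retains their event perception while improving their streaming behavior. AsynKV features a long-short term memory, which it uses efficiently by scaling compute during dead-time. It serves as a strong baseline on SPOT-Bench, outperforming existing streaming models, and achieves state-of-the-art results on retrospective benchmarks.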


Related papers