Fugu-MT Paper Translation (Abstract): Towards Streaming Perception

Paper overview: Towards Streaming Perception

arxiv url: http://arxiv.org/abs/2005.10420v2
Date: Tue, 25 Aug 2020 01:16:43 GMT
Status: Translation complete
Last updated in system: 2022-11-30 23:39:54.037877
Title: Towards Streaming Perception
Title (reference translation): Towards Streaming Perception
Authors: Mengtian Li, Yu-Xiong Wang, Deva Ramanan
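Abstract summary: This paper presents an approach that coherently integrates latency and accuracy into a single metric for real-time online perception. The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant. The paper focuses on the illustrative tasks of object detection and instance segmentation in urban video streams, and contributes a novel dataset with high-quality, temporally dense annotations.
Reference score (independently computed attention score): 70.68520310095155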
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Embodied perception refers to the ability of an autonomous agent to perceive its environment so that it can (re)act. The responsiveness of the agent is largely governed by latency of its processing pipeline. While past work has studied the algorithmic trade-off between latency and accuracy, there has not been a clear metric to compare different methods along the Pareto optimal latency-accuracy curve. We point out a discrepancy between standard offline evaluation and real-time applications: by the time an algorithm finishes processing a particular frame, the surrounding world has changed. To these ends, we present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception, which we refer to as "streaming accuracy". The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant, forcing the stack to consider the amount of streaming data that should be ignored while computation is occurring. More broadly, building upon this metric, we introduce a meta-benchmark that systematically converts any single-frame task into a streaming perception task. We focus on the illustrative tasks of object detection and instance segmentation in urban video streams, and contribute a novel dataset with high-quality and temporally-dense annotations. Our proposed solutions and their empirical analysis demonstrate a number of surprising conclusions: (1) there exists an optimal "sweet spot" that maximizes streaming accuracy along the Pareto optimal latency-accuracy curve, (2) asynchronous tracking and future forecasting naturally emerge as internal representations that enable streaming perception, and (3) dynamic scheduling can be used to overcome temporal aliasing, yielding the paradoxical result that latency is sometimes minimized by sitting idle and "doing nothing".
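Abstract (reference translation): Embodied perception is the ability of an autonomous agent to perceive its environment so that it can act on it. The agent's responsiveness is governed largely by the latency of its processing pipeline. Past work has studied the algorithmic trade-off between latency and accuracy, but there has been no clear metric for comparing methods along the Pareto optimal latency-accuracy curve. Standard offline evaluation also diverges from real-time use: by the time an algorithm finishes processing a particular frame, the surrounding world has changed. This paper therefore presents an approach that coherently integrates latency and accuracy into a single metric for real-time online perception, called "streaming accuracy". The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant, which forces the stack to account for the amount of streaming data that should be ignored while computation is in progress. More broadly, building on this metric, the paper introduces a meta-benchmark that systematically converts any single-frame task into a streaming perception task. It focuses on the illustrative tasks of object detection and instance segmentation in urban video streams, and contributes a novel dataset with high-quality, temporally dense annotations. The proposed solutions and their empirical analysis lead to several surprising conclusions: (1) there is an optimal "sweet spot" that maximizes streaming accuracy along the Pareto optimal latency-accuracy curve, (2) asynchronous tracking and future forecasting naturally emerge as internal representations that enable streaming perception, and (3) dynamic scheduling can be used to overcome temporal aliasing, yielding the paradoxical result that latency is sometimes minimized by sitting idle and "doing nothing".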
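
The idea of evaluating the perception stack "at every time instant" can be illustrated with a small sketch. It assumes that the output scored at a given query time is the most recent result the stack had finished by that time; the abstract does not spell out this pairing rule, so treat it as an assumption. The function name `pair_outputs_with_queries` and the timings are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def pair_outputs_with_queries(finish_times, query_times):
    """Pair each query time with the latest output finished at or before it.

    finish_times: ascending list of times at which the stack finished an output.
    query_times:  times at which the stack is evaluated (e.g. annotated frames).
    Returns one index into finish_times per query, or None when no output
    was available yet at that query time.

    Assumption (not stated in the abstract): the output that counts at time t
    is the most recent one completed by t.
    """
    pairs = []
    for t in query_times:
        k = bisect_right(finish_times, t) - 1  # last finish time <= t
        pairs.append(k if k >= 0 else None)
    return pairs


# Hypothetical timings in seconds: each output takes about 80 ms to produce,
# while the stack is evaluated every 100 ms.
finish_times = [0.08, 0.16, 0.24]
query_times = [0.0, 0.1, 0.2, 0.3]
pairs = pair_outputs_with_queries(finish_times, query_times)
print(pairs)
```

Under this pairing, a slower stack is scored against a world that has moved further on since its input frame, which is how latency and accuracy end up in a single number.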

Related papers