Towards Streaming Perception
- URL: http://arxiv.org/abs/2005.10420v2
- Date: Tue, 25 Aug 2020 01:16:43 GMT
- Title: Towards Streaming Perception
- Authors: Mengtian Li, Yu-Xiong Wang, Deva Ramanan
- Abstract summary: We present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception.
The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant.
We focus on the illustrative tasks of object detection and instance segmentation in urban video streams, and contribute a novel dataset with high-quality and temporally-dense annotations.
- Score: 70.68520310095155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied perception refers to the ability of an autonomous agent to perceive
its environment so that it can (re)act. The responsiveness of the agent is
largely governed by latency of its processing pipeline. While past work has
studied the algorithmic trade-off between latency and accuracy, there has not
been a clear metric to compare different methods along the Pareto optimal
latency-accuracy curve. We point out a discrepancy between standard offline
evaluation and real-time applications: by the time an algorithm finishes
processing a particular frame, the surrounding world has changed. To these
ends, we present an approach that coherently integrates latency and accuracy
into a single metric for real-time online perception, which we refer to as
"streaming accuracy". The key insight behind this metric is to jointly evaluate
the output of the entire perception stack at every time instant, forcing the
stack to consider the amount of streaming data that should be ignored while
computation is occurring. More broadly, building upon this metric, we introduce
a meta-benchmark that systematically converts any single-frame task into a
streaming perception task. We focus on the illustrative tasks of object
detection and instance segmentation in urban video streams, and contribute a
novel dataset with high-quality and temporally-dense annotations. Our proposed
solutions and their empirical analysis demonstrate a number of surprising
conclusions: (1) there exists an optimal "sweet spot" that maximizes streaming
accuracy along the Pareto optimal latency-accuracy curve, (2) asynchronous
tracking and future forecasting naturally emerge as internal representations
that enable streaming perception, and (3) dynamic scheduling can be used to
overcome temporal aliasing, yielding the paradoxical result that latency is
sometimes minimized by sitting idle and "doing nothing".
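To make the metric concrete, below is a minimal sketch of the streaming evaluation protocol, written in Python under assumed numbers (a 25 FPS stream and a 100 ms detector) and with illustrative names such as streaming_pairs and latency_ms that are not from the authors' code. At every evaluation instant the benchmark pairs the ground truth of the frame currently visible with the most recent prediction the stack has actually finished; frames that arrive while the model is busy are necessarily skipped.

```python
from bisect import bisect_right

def streaming_pairs(frame_times, finish_times, query_times):
    """For each query instant, return (index of the frame visible at that
    instant, index of the last prediction already finished, or None)."""
    pairs = []
    for t in query_times:
        gt_idx = bisect_right(frame_times, t) - 1   # frame currently visible
        k = bisect_right(finish_times, t) - 1       # last completed output
        pairs.append((gt_idx, k if k >= 0 else None))
    return pairs

# Toy setup (illustrative numbers, not from the paper): a 25 FPS stream
# (40 ms between frames) and a detector that takes 100 ms per frame.
# The detector starts a new frame only when it is idle, so frames arriving
# mid-computation are skipped -- the "streaming data that should be ignored
# while computation is occurring".
frame_times = [i * 40 for i in range(10)]   # arrival times in ms
latency_ms = 100

finish_times, processed, free_at = [], [], 0
for i, t in enumerate(frame_times):
    if t >= free_at:                        # detector is idle: take this frame
        free_at = t + latency_ms
        finish_times.append(free_at)
        processed.append(i)

for gt_idx, pred_idx in streaming_pairs(frame_times, finish_times, frame_times):
    src = processed[pred_idx] if pred_idx is not None else "nothing yet"
    print(f"evaluate at frame {gt_idx}: latest available output is from frame {src}")
```

Offline evaluation would always pair frame i with the prediction computed from frame i; under the streaming protocol the pairing lags by however many frames fit inside the model's latency. That lag is what produces the "sweet spot" of conclusion (1) and what makes forecasting (2) and latency-aware scheduling (3) pay off.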
Related papers
- Real-time Stereo-based 3D Object Detection for Streaming Perception [12.52037626475608]
We introduce StreamDSGN, the first real-time stereo-based 3D object detection framework designed for streaming perception.
StreamDSGN directly predicts the 3D properties of objects in the next moment by leveraging historical information.
Compared with the strong baseline, StreamDSGN significantly improves the streaming average precision by up to 4.33%.
arXiv Detail & Related papers (2024-10-16T09:23:02Z)
- Streaming Motion Forecasting for Autonomous Driving [71.7468645504988]
We introduce a benchmark that queries future trajectories on streaming data, which we refer to as "streaming forecasting".
Our benchmark inherently captures the disappearance and re-appearance of agents, which is a safety-critical problem yet overlooked by snapshot-based benchmarks.
We propose a plug-and-play meta-algorithm called "Predictive Streamer" that can adapt any snapshot-based forecaster into a streaming forecaster.
arXiv Detail & Related papers (2023-10-02T17:13:16Z)
- Context-Aware Streaming Perception in Dynamic Environments [25.029862642968457]
Real-time vision applications like autonomous driving operate in streaming settings, where ground truth changes between inference start and finish.
We propose to maximize streaming accuracy for every environment context.
Our method improves tracking performance (S-MOTA) by 7.4% over the conventional static approach.
arXiv Detail & Related papers (2022-08-16T00:33:04Z)
- StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the models with the capacity to predict the future, significantly improving the results for streaming perception.
We consider driving scenes with multiple velocities and propose Velocity-awared streaming AP (VsAP) to jointly evaluate the accuracy.
Our simple method achieves the state-of-the-art performance on Argoverse-HD dataset and improves the sAP and VsAP by 4.7% and 8.2% respectively.
arXiv Detail & Related papers (2022-07-21T12:03:02Z)
- Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception has been proposed to combine latency and accuracy into a single metric for online video perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on Argoverse-HD dataset and improves the AP by 4.9% compared to the strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z)
- Selective Network Linearization for Efficient Private Inference [49.937470642033155]
We propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy.
The results demonstrate up to 4.25% more accuracy (iso-ReLU count at 50K) or 2.2x less latency (iso-accuracy at 70%) than the current state of the art.
arXiv Detail & Related papers (2022-02-04T19:00:24Z)
- AuxAdapt: Stable and Efficient Test-Time Adaptation for Temporally Consistent Video Semantic Segmentation [81.87943324048756]
In video segmentation, generating temporally consistent results across frames is as important as achieving frame-wise accuracy.
Existing methods rely on optical flow regularization or fine-tuning with test data to attain temporal consistency.
This paper presents an efficient, intuitive, and unsupervised online adaptation method, AuxAdapt, for improving the temporal consistency of most neural network models.
arXiv Detail & Related papers (2021-10-24T07:07:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.