StreamYOLO: Real-time Object Detection for Streaming Perception
- URL: http://arxiv.org/abs/2207.10433v1
- Date: Thu, 21 Jul 2022 12:03:02 GMT
- Title: StreamYOLO: Real-time Object Detection for Streaming Perception
- Authors: Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun
- Abstract summary: We endow the models with the capacity of predicting the future, significantly improving the results for streaming perception.
We consider driving scenes with multiple velocities and propose Velocity-aware streaming AP (VsAP) to jointly evaluate accuracy.
Our simple method achieves the state-of-the-art performance on Argoverse-HD dataset and improves the sAP and VsAP by 4.7% and 8.2% respectively.
- Score: 84.2559631820007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The perceptive models of autonomous driving require fast, low-latency inference for safety. While existing works ignore the inevitable environmental changes that occur during processing, streaming perception jointly evaluates latency and accuracy in a single metric for online video perception, guiding previous works to search for trade-offs between accuracy and speed. In this paper, we explore the performance of real-time models on this metric and endow the models with the capacity to predict the future, significantly improving the results for streaming perception. Specifically, we build a simple framework with two effective modules. One is a Dual Flow Perception (DFP) module, which runs a dynamic flow and a static flow in parallel to capture moving tendency and basic detection features, respectively. The other is a Trend Aware Loss (TAL), which adaptively generates a loss weight for each object according to its moving speed. Realistically, we consider driving scenes with multiple velocities and further propose Velocity-aware streaming AP (VsAP) to jointly evaluate accuracy. In this realistic setting, we design an efficient mixed-velocity training strategy to guide the detector to perceive objects at any velocity. Our simple method achieves state-of-the-art performance on the Argoverse-HD dataset and improves sAP and VsAP by 4.7% and 8.2%, respectively, compared to the strong baseline, validating its effectiveness.
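The Trend Aware Loss idea above can be sketched in a few lines: objects whose boxes move more between consecutive frames receive larger loss weights. The sketch below is a hypothetical illustration, not the paper's exact formulation; in particular, approximating speed by the IoU between an object's boxes at frames t and t+1, and the weight form `tau ** (1 - iou)`, are assumptions made here for clarity.

```python
# Hypothetical sketch of a Trend-Aware-style loss weight.
# Assumption: an object's speed is approximated by the IoU between its
# boxes in consecutive frames; lower IoU (faster motion) -> larger weight.

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def trend_weight(box_t, box_t1, tau=1.5):
    """Loss weight that grows with an object's apparent speed.

    tau > 1 controls how strongly fast-moving objects are up-weighted;
    a static object (IoU = 1) keeps weight 1.
    """
    return tau ** (1.0 - iou(box_t, box_t1))

static = trend_weight((0, 0, 10, 10), (0, 0, 10, 10))  # IoU = 1 -> weight 1.0
fast = trend_weight((0, 0, 10, 10), (8, 0, 18, 10))    # low IoU -> weight > 1
```

In a training loop, such a weight would multiply each object's regression loss so the detector prioritizes objects whose future positions are hardest to predict.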
Related papers
- MPVO: Motion-Prior based Visual Odometry for PointGoal Navigation [3.9974562667271507]
Visual odometry (VO) is essential for enabling accurate point-goal navigation of embodied agents in indoor environments.
Recent deep-learned VO methods show robust performance but suffer from sample inefficiency during training.
We propose a robust and sample-efficient VO pipeline based on motion priors available while an agent is navigating an environment.
arXiv Detail & Related papers (2024-11-07T15:36:49Z) - Real-time Stereo-based 3D Object Detection for Streaming Perception [12.52037626475608]
We introduce StreamDSGN, the first real-time stereo-based 3D object detection framework designed for streaming perception.
StreamDSGN directly predicts the 3D properties of objects in the next moment by leveraging historical information.
Compared with the strong baseline, StreamDSGN significantly improves the streaming average precision by up to 4.33%.
arXiv Detail & Related papers (2024-10-16T09:23:02Z) - Event-Aided Time-to-Collision Estimation for Autonomous Driving [28.13397992839372]
We present a novel method that estimates the time to collision using a neuromorphic event-based camera.
The proposed algorithm consists of a two-step approach for efficient and accurate geometric model fitting on event data.
Experiments on both synthetic and real data demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-07-10T02:37:36Z) - SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving [27.776472262857045]
This paper presents a Simple and effIcient Motion Prediction baseLine (SIMPL) for autonomous vehicles.
We propose a compact and efficient global feature fusion module that performs directed message passing in a symmetric manner.
As a strong baseline, SIMPL exhibits highly competitive performance on Argoverse 1 & 2 motion forecasting benchmarks.
arXiv Detail & Related papers (2024-02-04T15:07:49Z) - Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception is proposed to jointly evaluate the latency and accuracy into a single metric for video online perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on Argoverse-HD dataset and improves the AP by 4.9% compared to the strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z) - Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework [76.70603443624012]
We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling.
In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance.
OSTrack achieves state-of-the-art performance on multiple benchmarks, in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k.
arXiv Detail & Related papers (2022-03-22T18:37:11Z) - Real Time Monocular Vehicle Velocity Estimation using Synthetic Data [78.85123603488664]
We look at the problem of estimating the velocity of road vehicles from a camera mounted on a moving car.
We propose a two-step approach where first an off-the-shelf tracker is used to extract vehicle bounding boxes and then a small neural network is used to regress the vehicle velocity.
arXiv Detail & Related papers (2021-09-16T13:10:27Z) - PAN: Towards Fast Action Recognition via Learning Persistence of Appearance [60.75488333935592]
Most state-of-the-art methods heavily rely on dense optical flow as motion representation.
In this paper, we shed light on fast action recognition by lifting the reliance on optical flow.
We design a novel motion cue called Persistence of Appearance (PA).
In contrast to optical flow, our PA focuses more on distilling the motion information at boundaries.
arXiv Detail & Related papers (2020-08-08T07:09:54Z) - Towards Streaming Perception [70.68520310095155]
We present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception.
The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant.
We focus on the illustrative tasks of object detection and instance segmentation in urban video streams, and contribute a novel dataset with high-quality and temporally-dense annotations.
arXiv Detail & Related papers (2020-05-21T01:51:35Z)
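The streaming metric described in the entry above can be illustrated with a minimal sketch: at each ground-truth timestamp, the evaluator scores the most recent prediction that had *finished* by that time, so slow models are penalized by being matched against stale outputs. The function name and the `(finish_time, output)` representation below are assumptions made for illustration, not part of any evaluation toolkit.

```python
# Hypothetical sketch of the core pairing rule behind streaming evaluation:
# each ground-truth instant is matched with the last prediction whose
# processing had completed by that instant.

def latest_finished(predictions, t):
    """predictions: list of (finish_time, output) sorted by finish_time.

    Returns the output of the last prediction completed by time t,
    or None if no prediction has finished yet.
    """
    result = None
    for finish_time, output in predictions:
        if finish_time <= t:
            result = output
        else:
            break
    return result

preds = [(0.05, "frame0"), (0.15, "frame1"), (0.25, "frame2")]
latest_finished(preds, 0.10)  # "frame0": frame1's result is not ready yet
latest_finished(preds, 0.30)  # "frame2"
```

This pairing is why predicting the future helps: a detector that anticipates where objects will be when its output becomes available scores higher under the same latency.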
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.