DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving
- URL: http://arxiv.org/abs/2303.17144v3
- Date: Sat, 20 May 2023 21:37:07 GMT
- Title: DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving
- Authors: Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen,
Bin Luo, Yifeng Geng, Xuansong Xie
- Abstract summary: We present DAMO-StreamNet, an optimized framework for streaming perception.
The framework combines recent advances from the YOLO series with a comprehensive analysis of spatial and temporal perception mechanisms.
Our experiments demonstrate that DAMO-StreamNet surpasses existing state-of-the-art methods, achieving 37.8% (normal size (600, 960)) and 43.3% (large size (1200, 1920)) sAP without using extra data.
- Score: 27.14089002387224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time perception, or streaming perception, is a crucial aspect of
autonomous driving that has yet to be thoroughly explored in existing research.
To address this gap, we present DAMO-StreamNet, an optimized framework that
combines recent advances from the YOLO series with a comprehensive analysis of
spatial and temporal perception mechanisms, delivering a cutting-edge solution.
The key innovations of DAMO-StreamNet are: (1) a robust neck structure
incorporating deformable convolution, enhancing the receptive field and
feature alignment capabilities; (2) a dual-branch structure that integrates
short-path semantic features and long-path temporal features, improving
motion state prediction accuracy; (3) logits-level distillation for efficient
optimization, aligning the logits of teacher and student networks in semantic
space; and (4) a real-time forecasting mechanism that updates support frame
features with the current frame, ensuring seamless streaming perception during inference. Our
experiments demonstrate that DAMO-StreamNet surpasses existing state-of-the-art
methods, achieving 37.8% (normal size (600, 960)) and 43.3% (large size (1200,
1920)) sAP without using extra data. This work not only sets a new benchmark
for real-time perception but also provides valuable insights for future
research. Additionally, DAMO-StreamNet can be applied to various autonomous
systems, such as drones and robots, paving the way for real-time perception.
The code is at https://github.com/zhiqic/DAMO-StreamNet.
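As a concrete illustration of innovation (1), below is a minimal PyTorch sketch of a neck block built on deformable convolution, using torchvision's DeformConv2d. The class and layer names are illustrative assumptions, not the repository's actual API.

```python
# Hypothetical neck block: a plain conv predicts sampling offsets, and
# DeformConv2d aggregates features at those learned locations, enlarging
# the effective receptive field and easing feature alignment.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformNeckBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # 2 offsets (dx, dy) per kernel tap.
        self.offset_conv = nn.Conv2d(
            channels, 2 * kernel_size * kernel_size, kernel_size, padding=pad)
        self.deform_conv = DeformConv2d(
            channels, channels, kernel_size, padding=pad)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_conv(x)  # (N, 2*K*K, H, W)
        return self.act(self.deform_conv(x, offsets))

# Smoke test on a dummy FPN feature map.
feat = torch.randn(1, 256, 40, 64)
print(DeformNeckBlock(256)(feat).shape)  # torch.Size([1, 256, 40, 64])
```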
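Innovation (3) aligns teacher and student logits in semantic space. Below is a minimal sketch of such a logits-level distillation loss, assuming a standard temperature-scaled KL formulation; the paper's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def logits_distillation_loss(student_logits: torch.Tensor,
                             teacher_logits: torch.Tensor,
                             temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened class distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # "batchmean" matches the mathematical definition of KL divergence;
    # t**2 rescales gradients to be comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * t * t

# Example: per-anchor classification logits from teacher and student heads.
student = torch.randn(8, 80, requires_grad=True)
teacher = torch.randn(8, 80)
loss = logits_distillation_loss(student, teacher.detach())
loss.backward()
```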
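Innovation (4) keeps support-frame features fresh at inference time by feeding the current frame's features back as history for the next step. A minimal sketch of that bookkeeping follows, with all interfaces assumed for illustration.

```python
# Illustrative buffer: the dual-branch head always sees an up-to-date pair of
# (short-path current features, long-path historical features).
import torch

class SupportFeatureBuffer:
    """Keeps backbone features of previous frames for the temporal branch."""
    def __init__(self, history_len: int = 1):
        self.history_len = history_len
        self.buffer: list[torch.Tensor] = []

    def step(self, current_feat: torch.Tensor):
        # Cold start: use the current frame as its own support.
        support = self.buffer if self.buffer else [current_feat]
        # Update the buffer with the current frame for the next call.
        self.buffer = (self.buffer + [current_feat])[-self.history_len:]
        return current_feat, support

buf = SupportFeatureBuffer(history_len=2)
for t in range(4):
    cur, support = buf.step(torch.randn(1, 256, 40, 64))
    print(t, len(support))  # support grows to history_len, then slides
```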
Related papers
- Real-time Stereo-based 3D Object Detection for Streaming Perception [12.52037626475608]
We introduce StreamDSGN, the first real-time stereo-based 3D object detection framework designed for streaming perception.
StreamDSGN directly predicts the 3D properties of objects in the next moment by leveraging historical information.
Compared with the strong baseline, StreamDSGN significantly improves the streaming average precision by up to 4.33%.
arXiv Detail & Related papers (2024-10-16T09:23:02Z)
- Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z)
- Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark [23.872360763782037]
ASAP is the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving.
We propose an annotation-extending pipeline to generate high-frame-rate labels for the 12Hz raw images.
In the ASAP benchmark, comprehensive experimental results reveal that model rankings change under different computational constraints.
arXiv Detail & Related papers (2022-12-17T16:32:15Z)
- StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the models with the capacity to predict the future, significantly improving the results for streaming perception.
We consider driving scenes with multiple velocities and propose Velocity-aware streaming AP (VsAP) to jointly evaluate accuracy across them.
Our simple method achieves state-of-the-art performance on the Argoverse-HD dataset and improves sAP and VsAP by 4.7% and 8.2%, respectively.
arXiv Detail & Related papers (2022-07-21T12:03:02Z)
- Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception is proposed to jointly evaluate latency and accuracy with a single metric for online video perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on the Argoverse-HD dataset and improves the AP by 4.9% compared to a strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z)
- Efficient Global-Local Memory for Real-time Instrument Segmentation of Robotic Surgical Video [53.14186293442669]
We identify two important clues for surgical instrument perception, including local temporal dependency from adjacent frames and global semantic correlation in long-range duration.
We propose a novel dual-memory network (DMNet) to relate both global and local-temporal knowledge.
Our method substantially outperforms state-of-the-art works in segmentation accuracy while maintaining real-time speed.
arXiv Detail & Related papers (2021-09-28T10:10:14Z)
- Real-time Streaming Perception System for Autonomous Driving [2.6058660721533187]
We present a real-time streaming perception system, which is also the 2nd-place solution to the Streaming Perception Challenge.
Unlike traditional object detection challenges, which focus mainly on absolute performance, the streaming perception task requires achieving a balance between accuracy and latency.
On the Argoverse-HD test set, our method achieves 33.2 streaming AP (34.6 streaming AP verified by the organizer) on the required hardware.
arXiv Detail & Related papers (2021-07-30T01:32:44Z)
- ACDnet: An action detection network for real-time edge computing based on flow-guided feature approximation and memory aggregation [8.013823319651395]
ACDnet is a compact action detection network targeting real-time edge computing.
It exploits the temporal coherence between successive video frames to approximate CNN features rather than naively extracting them.
It can robustly achieve detection well above real-time speed (75 FPS).
arXiv Detail & Related papers (2021-02-26T14:06:31Z)
- Towards Streaming Perception [70.68520310095155]
We present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception.
The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant.
We focus on the illustrative tasks of object detection and instance segmentation in urban video streams, and contribute a novel dataset with high-quality and temporally-dense annotations. A minimal sketch of this latency-aware matching idea appears after this entry.
arXiv Detail & Related papers (2020-05-21T01:51:35Z)
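A minimal sketch of the latency-aware matching idea behind streaming evaluation: at each ground-truth timestamp, the evaluator scores whichever prediction finished most recently, so latency directly degrades accuracy. This is an illustrative simplification, not the benchmark's reference implementation.

```python
from bisect import bisect_right

def match_streaming_predictions(gt_times, pred_done_times, preds):
    """For each ground-truth time, return the latest completed prediction
    (or None before the first one finishes)."""
    matched = []
    for t in gt_times:
        i = bisect_right(pred_done_times, t) - 1
        matched.append(preds[i] if i >= 0 else None)
    return matched

# A detector that takes 80 ms per frame on a 30 FPS (~33 ms) stream:
gt_times = [0.0, 0.033, 0.067, 0.100, 0.133]
pred_done = [0.080, 0.160]           # when each result became available
preds = ["det@frame0", "det@frame2"]
print(match_streaming_predictions(gt_times, pred_done, preds))
# [None, None, None, 'det@frame0', 'det@frame0']
```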
- Dynamic Inference: A New Approach Toward Efficient Video Action Recognition [69.9658249941149]
Action recognition in videos has achieved great success recently, but it remains a challenging task due to the massive computational cost.
We propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos.
arXiv Detail & Related papers (2020-02-09T11:09:56Z)