Real-time Stereo-based 3D Object Detection for Streaming Perception
- URL: http://arxiv.org/abs/2410.12394v1
- Date: Wed, 16 Oct 2024 09:23:02 GMT
- Title: Real-time Stereo-based 3D Object Detection for Streaming Perception
- Authors: Changcai Li, Zonghua Gu, Gang Chen, Libo Huang, Wei Zhang, Huihui Zhou,
- Abstract summary: We introduce StreamDSGN, the first real-time stereo-based 3D object detection framework designed for streaming perception.
StreamDSGN directly predicts the 3D properties of objects in the next moment by leveraging historical information.
Compared with the strong baseline, StreamDSGN significantly improves the streaming average precision by up to 4.33%.
- Score: 12.52037626475608
- Abstract: The ability to promptly respond to environmental changes is crucial for the perception system of autonomous driving. Recently, a new task called streaming perception was proposed. It jointly evaluates latency and accuracy in a single metric for online video perception. In this work, we introduce StreamDSGN, the first real-time stereo-based 3D object detection framework designed for streaming perception. StreamDSGN is an end-to-end framework that directly predicts the 3D properties of objects in the next moment by leveraging historical information, thereby alleviating the accuracy degradation of streaming perception. Further, StreamDSGN applies three strategies to enhance the perception accuracy: (1) A feature-flow-based fusion method, which generates a pseudo-next feature at the current moment to address the misalignment between features and ground truth. (2) An extra regression loss for explicit supervision of object motion consistency in consecutive frames. (3) A large kernel backbone with a large receptive field for effectively capturing long-range spatial contextual features caused by changes in object positions. Experiments on the KITTI Tracking dataset show that, compared with the strong baseline, StreamDSGN significantly improves the streaming average precision by up to 4.33%. Our code is available at https://github.com/weiyangdaren/streamDSGN-pytorch.
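The abstract says streaming perception folds latency into the accuracy metric. A minimal sketch of why this penalizes slow detectors is shown below, assuming the usual streaming matching rule: each ground-truth timestamp is scored against the most recent prediction that had already finished by then. The function name `match_streaming` and all timing numbers are hypothetical and are not taken from the StreamDSGN code.

```python
from bisect import bisect_right

def match_streaming(gt_times_ms, pred_finish_times_ms):
    """For each ground-truth timestamp, return the index of the latest
    prediction that finished at or before that time (None if there is none).

    pred_finish_times_ms must be sorted in ascending order.
    """
    matches = []
    for t in gt_times_ms:
        # bisect_right counts how many predictions finished at or before t.
        i = bisect_right(pred_finish_times_ms, t) - 1
        matches.append(i if i >= 0 else None)
    return matches

# Hypothetical setup: frames arrive every 100 ms and the detector
# needs 60 ms per frame, so each output is ready 60 ms after its frame.
frame_times = [0, 100, 200, 300]
latency_ms = 60
finish_times = [t + latency_ms for t in frame_times]

print(match_streaming(frame_times, finish_times))
```

Under these assumed numbers, every frame after the first is scored against the output computed from the previous frame. That one-frame lag is the misalignment the abstract describes, and it is why predicting the object state at the next moment can recover accuracy.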
Related papers
- Predict to Detect: Prediction-guided 3D Object Detection using Sequential Images [15.51093009875854]
We propose a novel 3D object detection model, P2D (Predict to Detect), that integrates a prediction scheme into a detection framework.
P2D predicts object information in the current frame using solely past frames to learn temporal motion features.
We then introduce a novel temporal feature aggregation method that attentively exploits Bird's-Eye-View (BEV) features based on predicted object information.
arXiv Detail & Related papers (2023-06-14T14:22:56Z)
- Rethinking Voxelization and Classification for 3D Object Detection [68.8204255655161]
The main challenge in 3D object detection from LiDAR point clouds is achieving real-time performance without affecting the reliability of the network.
We present a solution to improve network inference speed and precision at the same time by implementing a fast dynamic voxelizer.
In addition, we propose a lightweight detection sub-head model for classifying predicted objects and filtering out falsely detected objects.
arXiv Detail & Related papers (2023-01-10T16:22:04Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the models with the capacity of predicting the future, significantly improving the results for streaming perception.
We consider driving scenes with multiple velocities and propose Velocity-awared streaming AP (VsAP) to jointly evaluate the accuracy.
Our simple method achieves state-of-the-art performance on the Argoverse-HD dataset and improves the sAP and VsAP by 4.7% and 8.2% respectively.
arXiv Detail & Related papers (2022-07-21T12:03:02Z)
- Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception was proposed to jointly evaluate latency and accuracy in a single metric for online video perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on the Argoverse-HD dataset and improves the AP by 4.9% compared to the strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z)
- 3D Object Detection and Tracking Based on Streaming Data [9.085584050311178]
We set up a dual-way network for 3D object detection based on keyframes, and then propagate predictions to non-key frames through a motion-based algorithm guided by temporal information.
Our framework not only shows significant improvements over the frame-by-frame paradigm, but also produces competitive results on the KITTI Object Tracking Benchmark.
arXiv Detail & Related papers (2020-09-14T03:15:41Z)
- InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling [65.47126868838836]
We propose a novel 3D object detection framework with dynamic information modeling.
Coarse predictions are generated in the first stage via a voxel-based region proposal network.
Experiments are conducted on the large-scale nuScenes 3D detection benchmark.
arXiv Detail & Related papers (2020-07-16T18:27:08Z)
- Towards Streaming Perception [70.68520310095155]
We present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception.
The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant.
We focus on the illustrative tasks of object detection and instance segmentation in urban video streams, and contribute a novel dataset with high-quality and temporally-dense annotations.
arXiv Detail & Related papers (2020-05-21T01:51:35Z)
- Streaming Object Detection for 3-D Point Clouds [29.465873948076766]
LiDAR provides a prominent sensory modality that informs many existing perceptual systems.
The latency for perceptual systems based on point cloud data can be dominated by the amount of time for a complete rotational scan.
We show how operating on LiDAR data in its native streaming formulation offers several advantages for self-driving object detection.
arXiv Detail & Related papers (2020-05-04T21:55:15Z)