Workshop on Autonomous Driving at CVPR 2021: Technical Report for
Streaming Perception Challenge
- URL: http://arxiv.org/abs/2108.04230v1
- Date: Tue, 27 Jul 2021 06:36:06 GMT
- Title: Workshop on Autonomous Driving at CVPR 2021: Technical Report for
Streaming Perception Challenge
- Authors: Songyang Zhang and Lin Song and Songtao Liu and Zheng Ge and Zeming Li
and Xuming He and Jian Sun
- Abstract summary: We introduce our real-time 2D object detection system for the realistic autonomous driving scenario.
Our detector is built on a newly designed YOLO model, called YOLOX.
On the Argoverse-HD dataset, our system achieves 41.0 streaming AP, surpassing the second-place entry by 7.8/6.1 on the detection-only/full-stack track, respectively.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this report, we introduce our real-time 2D object detection system for the
realistic autonomous driving scenario. Our detector is built on a newly
designed YOLO model, called YOLOX. On the Argoverse-HD dataset, our system
achieves 41.0 streaming AP, surpassing the second-place entry by 7.8/6.1 on the
detection-only/full-stack track, respectively. Moreover, equipped with
TensorRT, our model achieves 30 FPS inference speed with a high-resolution
input size (e.g., 1440×2304). Code and models will be available at
https://github.com/Megvii-BaseDetection/YOLOX
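Streaming AP, the metric quoted above, scores each ground-truth frame against whatever output the detector had already finished when that frame arrived, so a slow model is penalized for reporting stale boxes. The sketch below shows only that pairing step, assuming each prediction carries the wall-clock time at which it finished. `match_streaming` is a hypothetical helper, not part of the YOLOX code or the official evaluation toolkit, and the AP computation that follows the pairing is omitted.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple

# A prediction is (finish_time_s, boxes); boxes are left opaque in this sketch.
Prediction = Tuple[float, list]


def match_streaming(gt_times: Sequence[float],
                    preds: Sequence[Prediction]) -> List[Optional[list]]:
    """Pair each ground-truth timestamp with the latest prediction finished before it.

    Returns None for timestamps at which no prediction had finished yet; an AP
    computation would treat those frames as having no detections.
    """
    ordered = sorted(preds, key=lambda p: p[0])
    finish_times = [t for t, _ in ordered]
    matched: List[Optional[list]] = []
    for t in gt_times:
        # Count predictions that finished strictly before t (a prediction finishing
        # exactly at t is treated as not yet available; this tie rule is an assumption).
        k = bisect_left(finish_times, t)
        matched.append(ordered[k - 1][1] if k > 0 else None)
    return matched
```

For instance, with ground truth at 0.00, 0.04, 0.08 and 0.12 s and predictions finishing at 0.05 s and 0.10 s, the first two frames receive no detections and the last two are scored against the outputs finished at 0.05 s and 0.10 s, respectively.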
Related papers
- YOLO-Vehicle-Pro: A Cloud-Edge Collaborative Framework for Object Detection in Autonomous Driving under Adverse Weather Conditions
This paper proposes two innovative deep learning models: YOLO-Vehicle and YOLO-Vehicle-Pro.
YOLO-Vehicle is an object detection model tailored specifically for autonomous driving scenarios.
YOLO-Vehicle-Pro builds upon this foundation by introducing an improved image dehazing algorithm.
(2024-10-23T10:07:13Z)
- An Effective Two-stage Training Paradigm Detector for Small Dataset
The backbone of YOLOv8 is pre-trained as the encoder using the masked image modeling technique.
During the test stage, test-time augmentation (TTA) is used to enhance each model, and weighted box fusion (WBF) is implemented to further boost the performance.
With the well-designed structure, our approach has achieved 30.4% average precision from 0.50 to 0.95 on the DelftBikes test set, ranking 4th on the leaderboard.
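Weighted box fusion (WBF), mentioned above, merges overlapping boxes from several models by averaging their coordinates with confidence weights instead of keeping only the top box, as NMS does. The sketch below is a simplified single-class version, assuming boxes in (x1, y1, x2, y2) format and equal weight for every model. The function names and the 0.55 IoU threshold are illustrative assumptions, not the settings used in the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_cluster(cluster: List[Detection]) -> Box:
    """Confidence-weighted average of each coordinate over a cluster."""
    total = sum(score for _, score in cluster)
    return tuple(sum(score * box[k] for box, score in cluster) / total for k in range(4))


def weighted_box_fusion(detections: List[Detection], num_models: int,
                        iou_thr: float = 0.55) -> List[Detection]:
    """Simplified single-class WBF over detections pooled from num_models models."""
    clusters: List[List[Detection]] = []
    fused: List[Box] = []
    # Process the most confident detections first.
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        # Find the fused box with the highest IoU strictly above the threshold.
        best_i, best_iou = -1, iou_thr
        for i, fb in enumerate(fused):
            overlap = iou(box, fb)
            if overlap > best_iou:
                best_i, best_iou = i, overlap
        if best_i < 0:
            clusters.append([(box, score)])  # start a new cluster
            fused.append(box)
        else:
            clusters[best_i].append((box, score))
            fused[best_i] = fuse_cluster(clusters[best_i])
    results: List[Detection] = []
    for cluster, fb in zip(clusters, fused):
        mean_score = sum(s for _, s in cluster) / len(cluster)
        # Down-weight clusters supported by fewer boxes than there are models.
        results.append((fb, mean_score * min(len(cluster), num_models) / num_models))
    return results
```

Because fused coordinates are averaged rather than selected, two overlapping boxes from two models yield a box lying between them, while a box found by only one of the two models keeps its position but has its score halved.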
(2023-09-11T17:43:11Z)
- Real-Time Flying Object Detection with YOLOv8
This paper presents a generalized model for real-time detection of flying objects.
We also present a refined model that achieves state-of-the-art results for flying object detection.
(2023-05-17T06:11:10Z)
- DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
(2023-04-03T17:37:00Z)
- Optimizing Anchor-based Detectors for Autonomous Driving Scenes
This paper summarizes model improvements and inference-time optimizations for the popular anchor-based detectors in autonomous driving scenes.
Based on the high-performing RCNN-RS and RetinaNet-RS detection frameworks, we study a set of framework improvements that adapt the detectors to better detect small objects in crowded scenes.
(2022-08-11T22:44:59Z)
- StreamYOLO: Real-time Object Detection for Streaming Perception
We endow the models with the capacity of predicting the future, significantly improving the results for streaming perception.
We consider driving scenes with multiple velocities and propose Velocity-aware streaming AP (VsAP) to jointly evaluate the accuracy.
Our simple method achieves the state-of-the-art performance on Argoverse-HD dataset and improves the sAP and VsAP by 4.7% and 8.2% respectively.
(2022-07-21T12:03:02Z)
- Real-time Object Detection for Streaming Perception
Streaming perception was proposed to jointly evaluate latency and accuracy in a single metric for online video perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on Argoverse-HD dataset and improves the AP by 4.9% compared to the strong baseline.
(2022-03-23T11:33:27Z)
- SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving
We release a Large-Scale Object Detection benchmark for Autonomous driving, named SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories.
To improve diversity, one frame is collected every ten seconds across 32 different cities, under different weather conditions, periods, and location scenes.
We provide extensive experiments and in-depth analyses of existing supervised state-of-the-art detection models and popular self-supervised and semi-supervised approaches, along with insights into how to develop future models.
(2021-06-21T13:55:57Z)
- Two-Stream Consensus Network: Submission to HACS Challenge 2021 Weakly-Supervised Learning Track
The goal of weakly-supervised temporal action localization is to temporally locate and classify actions of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2nd in this challenge, and we hope our method can serve as a baseline for future academic research.
(2021-06-21T03:36:36Z)
- 2nd Place Solution for Waymo Open Dataset Challenge - Real-time 2D Object Detection
In this report, we introduce a real-time method to detect 2D objects from images.
For model acceleration, we leverage TensorRT to optimize the inference time of our detection pipeline.
Our framework achieves the latency of 45.8ms/frame on an Nvidia Tesla V100 GPU.
(2021-06-16T11:32:03Z)