SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous
Driving
- URL: http://arxiv.org/abs/2106.11118v2
- Date: Tue, 22 Jun 2021 01:27:44 GMT
- Title: SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous
Driving
- Authors: Jianhua Han, Xiwen Liang, Hang Xu, Kai Chen, Lanqing Hong, Chaoqiang
Ye, Wei Zhang, Zhenguo Li, Xiaodan Liang, Chunjing Xu
- Abstract summary: We release a Large-Scale Object Detection benchmark for Autonomous driving, named SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories.
To improve diversity, the images are collected at one frame every ten seconds across 32 different cities under varied weather conditions, time periods and location scenes.
We provide extensive experiments and deep analyses of existing supervised state-of-the-art detection models, popular self-supervised and semi-supervised approaches, and some insights about how to develop future models.
- Score: 94.11868795445798
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Aiming at facilitating a real-world, ever-evolving and scalable autonomous
driving system, we present a large-scale benchmark for standardizing the
evaluation of different self-supervised and semi-supervised approaches that
learn from raw data; it is the first and largest such benchmark to date.
Existing autonomous driving systems rely heavily on "perfect" visual perception
models (e.g., detection) trained with extensive annotated data to ensure
safety. However, it is unrealistic to exhaustively label instances of all
scenarios and circumstances (e.g., night, extreme weather, cities) when
deploying a robust autonomous driving system. Motivated by recent advances
in self-supervised and semi-supervised learning, a promising direction
is to learn a robust detection model by jointly exploiting large-scale
unlabeled data and a small amount of labeled data. Existing datasets (e.g.,
KITTI, Waymo) either provide only a small amount of data or cover limited
domains with full annotation, hindering the exploration of large-scale
pre-trained models.
Here, we release a Large-Scale Object Detection benchmark for Autonomous
driving, named SODA10M, containing 10 million unlabeled images and 20K images
labeled with 6 representative object categories. To improve diversity, the
images are collected at one frame every ten seconds across 32 different cities
under varied weather conditions, time periods and location scenes. We provide extensive
experiments and deep analyses of existing supervised state-of-the-art detection
models, popular self-supervised and semi-supervised approaches, and some
insights about how to develop future models. The data and more up-to-date
information have been released at https://soda-2d.github.io.
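For illustration, here is a minimal sketch of the pseudo-labeling (self-training) style of semi-supervised detection that the benchmark is meant to evaluate, using torchvision's Faster R-CNN as a stand-in detector; the 0.9 confidence threshold and the data handling are assumptions, not a protocol prescribed by SODA10M:

```python
# Self-training sketch: train on the small labeled set, then mine
# high-confidence pseudo-labels from the large unlabeled pool.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# SODA10M labels 6 object categories; torchvision reserves class 0
# for background, hence num_classes=7.
model = fasterrcnn_resnet50_fpn(num_classes=7)

@torch.no_grad()
def generate_pseudo_labels(model, unlabeled_images, score_thresh=0.9):
    """Keep only high-confidence detections on unlabeled images as
    pseudo ground truth for the next training round."""
    model.eval()
    pseudo_targets = []
    for img in unlabeled_images:  # each img: float tensor of shape (3, H, W)
        pred = model([img])[0]    # dict with 'boxes', 'labels', 'scores'
        keep = pred["scores"] >= score_thresh
        pseudo_targets.append({"boxes": pred["boxes"][keep],
                               "labels": pred["labels"][keep]})
    return pseudo_targets
```

In a full loop, the pseudo-labeled images would be mixed back into the 20K labeled set and the detector retrained.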
Related papers
- Self-Updating Vehicle Monitoring Framework Employing Distributed Acoustic Sensing towards Real-World Settings [5.306938463648908]
We introduce a real-time semi-supervised vehicle monitoring framework tailored to urban settings.
It requires only a small fraction of manual labels for initial training and exploits unlabeled data for model improvement.
We propose a novel prior loss that incorporates the shapes of vehicular traces to track a single vehicle with varying speeds.
arXiv Detail & Related papers (2024-09-16T13:10:58Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
- Traffic Context Aware Data Augmentation for Rare Object Detection in Autonomous Driving [5.037913689432052]
We propose a systematic study on simple Copy-Paste data augmentation for rare object detection in autonomous driving.
Specifically, local adaptive instance-level image transformation is introduced to generate realistic rare object masks.
We build a new dataset named NM10k consisting of 10k training images, 4k validation images and the corresponding labels; the basic Copy-Paste operation is sketched below.
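As illustration only, a bare-bones sketch of the Copy-Paste operation on numpy images with [x1, y1, x2, y2] pixel boxes; the paper's local adaptive instance-level transforms and context-aware placement are not reproduced here, and all names are hypothetical:

```python
# Naive Copy-Paste: overwrite a region of the scene with a rare-object
# crop and append the corresponding box and label. A realistic pipeline
# would blend with an instance mask instead of overwriting a rectangle.
import numpy as np

def copy_paste(scene, boxes, labels, crop, crop_label, x, y):
    """Paste `crop` into `scene` at (x, y); return augmented image/targets."""
    h, w = crop.shape[:2]
    assert y + h <= scene.shape[0] and x + w <= scene.shape[1], "crop out of bounds"
    out = scene.copy()
    out[y:y + h, x:x + w] = crop
    boxes = np.vstack([boxes, [[x, y, x + w, y + h]]])
    labels = np.append(labels, crop_label)
    return out, boxes, labels
```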
arXiv Detail & Related papers (2022-05-01T01:45:00Z)
- CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving [117.87070488537334]
We introduce a challenging dataset named CODA that exposes this critical problem of vision-based detectors.
The performance of standard object detectors trained on large-scale autonomous driving datasets drops significantly on CODA, to no more than 12.8% in mAR.
We experiment with the state-of-the-art open-world object detector and find that it also fails to reliably identify the novel objects in CODA.
arXiv Detail & Related papers (2022-03-15T08:32:56Z)
- One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario.
The data is selected from 144 driving hours, 20x longer than in the largest 3D autonomous driving dataset available.
We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance; a generic sketch of score-based selection follows below.
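As illustration only, a generic sketch of selecting scenes by combining several per-scene interestingness scores; the criteria and the simple sum-of-normalized-scores ranking are assumptions, not the paper's actual complexity measures or pipeline:

```python
# Generic score-based scene selection: rank scenes by the sum of their
# min-max normalized criterion scores and keep the top `budget` scenes.
# All names here are hypothetical.
from typing import Dict, List, Sequence

def select_scenes(scene_ids: List[str],
                  scores: Dict[str, Sequence[float]],
                  budget: int) -> List[str]:
    def normalize(values: Sequence[float]) -> List[float]:
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # flat criteria contribute nothing
        return [(v - lo) / span for v in values]

    normalized = [normalize(v) for v in scores.values()]
    totals = [sum(per_scene) for per_scene in zip(*normalized)]
    ranked = sorted(range(len(scene_ids)), key=totals.__getitem__, reverse=True)
    return [scene_ids[i] for i in ranked[:budget]]
```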
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.