OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
- URL: http://arxiv.org/abs/2411.17761v1
- Date: Tue, 26 Nov 2024 01:50:06 GMT
- Title: OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
- Authors: Zhongyu Xia, Jishuo Li, Zhiwei Lin, Xinhao Wang, Yongtao Wang, Ming-Hsuan Yang
- Abstract summary: We introduce OpenAD, the first real-world open-world autonomous driving benchmark for 3D object detection. OpenAD is built on a corner case discovery and annotation pipeline integrated with a multimodal large language model (MLLM).
- Score: 47.9080685468069
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-world autonomous driving encompasses domain generalization and open-vocabulary recognition. Domain generalization refers to the capability of autonomous driving systems to operate across different scenarios and sensor parameter configurations. Open vocabulary pertains to the ability to recognize semantic categories not encountered during training. In this paper, we introduce OpenAD, the first real-world open-world autonomous driving benchmark for 3D object detection. OpenAD is built on a corner case discovery and annotation pipeline integrated with a multimodal large language model (MLLM). The proposed pipeline annotates corner case objects in a unified format for five autonomous driving perception datasets comprising 2000 scenarios. In addition, we devise evaluation methodologies and evaluate various 2D and 3D open-world and specialized models. Moreover, we propose a vision-centric 3D open-world object detection baseline and further introduce an ensemble method that fuses general and specialized models to address the lower precision of existing open-world methods on the OpenAD benchmark. Annotations, toolkit code, and all evaluation code will be released.
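As a rough illustration of the kind of general-and-specialized ensembling the abstract describes, the sketch below merges two sets of 3D detections, keeping specialized-model boxes and adding open-world boxes only where nothing nearby was predicted. The Box3D class, the 2 m matching distance, and the category names are illustrative assumptions, not the OpenAD implementation.

```python
# Minimal sketch of fusing a specialized (closed-set) detector with a general
# open-vocabulary detector. All names, thresholds, and values are assumptions.
from dataclasses import dataclass

@dataclass
class Box3D:
    center: tuple          # (x, y, z) in metres
    size: tuple            # (l, w, h)
    label: str             # semantic category
    score: float           # detection confidence

def fuse_predictions(specialized, open_world, match_dist=2.0):
    """Keep specialized boxes (higher precision on known classes) and add
    open-world boxes only where nothing nearby was already detected."""
    fused = list(specialized)
    for ob in open_world:
        close = any(
            sum((a - b) ** 2 for a, b in zip(ob.center, sb.center)) ** 0.5 < match_dist
            for sb in specialized
        )
        if not close:      # novel or missed object -> trust the general model
            fused.append(ob)
    return fused

# Example: a corner-case object ("stroller") survives fusion because no
# specialized prediction lies near it.
spec = [Box3D((10.0, 2.0, 0.5), (4.5, 1.9, 1.6), "car", 0.91)]
open_w = [Box3D((10.3, 2.1, 0.5), (4.4, 1.8, 1.5), "vehicle", 0.62),
          Box3D((3.0, -1.0, 0.4), (0.8, 0.5, 1.0), "stroller", 0.44)]
print([b.label for b in fuse_predictions(spec, open_w)])   # ['car', 'stroller']
```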
Related papers
- OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing [57.050679160659705]
We introduce OpenEarthSensing, a large-scale fine-grained benchmark for open-world remote sensing.
OpenEarthSensing includes 189 scene and object categories, covering the vast majority of potential semantic shifts that may occur in the real world.
We conduct the baseline evaluation of current mainstream open-world tasks and methods on OpenEarthSensing.
arXiv Detail & Related papers (2025-02-28T02:49:52Z)
- Open-World Panoptic Segmentation [31.799000996671975]
We propose Con2MAV, an approach for open-world panoptic segmentation.
We show that our model achieves state-of-the-art results on open-world segmentation tasks.
We also propose PANIC, a benchmark for evaluating open-world panoptic segmentation in autonomous driving scenarios.
arXiv Detail & Related papers (2024-12-17T10:03:39Z)
- Open Vocabulary Monocular 3D Object Detection [10.424711580213616]
We pioneer the study of open-vocabulary monocular 3D object detection, a novel task that aims to detect and localize objects in 3D space from a single RGB image.
We introduce a class-agnostic approach that leverages open-vocabulary 2D detectors and lifts 2D bounding boxes into 3D space.
Our approach decouples the recognition and localization of objects in 2D from the task of estimating 3D bounding boxes, enabling generalization across unseen categories.
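A minimal sketch of the generic idea behind such lifting: back-project the centre of a 2D box into 3D using an estimated object depth and the camera intrinsics. The depth value and intrinsics below are assumed for the example, and the cited paper's actual lifting procedure may differ.

```python
# Illustrative 2D-to-3D lifting via the pinhole camera model.
import numpy as np

def lift_2d_box(box_2d, depth, K):
    """box_2d: (u1, v1, u2, v2) in pixels; depth: metres along the camera z-axis;
    K: 3x3 camera intrinsic matrix. Returns the 3D centre in camera coordinates."""
    u = (box_2d[0] + box_2d[2]) / 2.0
    v = (box_2d[1] + box_2d[3]) / 2.0
    # Back-project the pixel centre: X = depth * K^-1 [u, v, 1]^T
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return depth * ray

K = np.array([[1266.0, 0.0, 816.0],
              [0.0, 1266.0, 491.0],
              [0.0, 0.0, 1.0]])
center_3d = lift_2d_box((700, 420, 900, 560), depth=12.5, K=K)
print(center_3d)   # approximate (x, y, z) of the object centre in metres
```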
arXiv Detail & Related papers (2024-11-25T18:59:17Z)
- Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking [73.05477052645885]
We introduce open-vocabulary 3D tracking, which extends the scope of 3D tracking to include objects beyond predefined categories.
We propose a novel approach that integrates open-vocabulary capabilities into a 3D tracking framework, allowing for generalization to unseen object classes.
arXiv Detail & Related papers (2024-10-02T15:48:42Z)
- Open 3D World in Autonomous Driving [6.876824330759794]
This paper presents a novel approach that integrates 3D point cloud data, acquired from LiDAR sensors, with textual information.
We introduce an efficient framework for the fusion of bird's-eye view (BEV) region features with textual features.
The effectiveness of the proposed methodology is rigorously evaluated through extensive experimentation on the newly introduced NuScenes-T dataset.
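For intuition, the sketch below scores candidate BEV region features against class-name text embeddings by cosine similarity, in the spirit of the BEV-text fusion described above. The random features stand in for a trained BEV encoder and a CLIP-style text encoder; this is not the paper's architecture.

```python
# Generic open-vocabulary classification of BEV regions via cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
bev_region_feats = rng.normal(size=(5, 256))            # 5 candidate regions (placeholder)
class_names = ["car", "pedestrian", "construction vehicle", "stroller"]
text_embeds = rng.normal(size=(len(class_names), 256))  # stand-in for a text encoder output

def cosine_scores(feats, texts):
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    texts = texts / np.linalg.norm(texts, axis=1, keepdims=True)
    return feats @ texts.T                               # (regions, classes)

scores = cosine_scores(bev_region_feats, text_embeds)
print([class_names[i] for i in scores.argmax(axis=1)])   # best label per region
```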
arXiv Detail & Related papers (2024-08-20T14:10:44Z)
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- OpenAnnotate3D: Open-Vocabulary Auto-Labeling System for Multi-modal 3D Data [42.37939270236269]
We introduce OpenAnnotate3D, an open-source open-vocabulary auto-labeling system for vision and point cloud data.
Our system integrates the chain-of-thought capabilities of Large Language Models and the cross-modality capabilities of vision-language models.
arXiv Detail & Related papers (2023-10-20T10:12:18Z)
- Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving [39.70689418558153]
We present a multi-modal auto labeling pipeline capable of generating amodal 3D bounding boxes and tracklets for training models on open-set categories without 3D human labels.
Our pipeline exploits motion cues inherent in point cloud sequences in combination with the freely available 2D image-text pairs to identify and track all traffic participants.
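One ingredient such label-free pipelines commonly rely on is identifying moving points between ego-motion-compensated LiDAR sweeps. The toy sketch below flags them with a brute-force nearest-neighbour test; the threshold and data are illustrative only and not taken from the cited paper.

```python
# Toy moving-point detection between two compensated LiDAR sweeps.
import numpy as np

def moving_point_mask(prev_pts, curr_pts, dist_thresh=0.5):
    """prev_pts: (N, 3), curr_pts: (M, 3), both in the same world frame.
    A current point with no nearby previous point is treated as likely moving."""
    d = np.linalg.norm(curr_pts[:, None, :] - prev_pts[None, :, :], axis=-1)
    return d.min(axis=1) > dist_thresh        # True where the point likely moved

prev_sweep = np.random.rand(200, 3) * 50.0
curr_sweep = prev_sweep.copy()
curr_sweep[:10] += np.array([1.5, 0.0, 0.0])  # simulate 10 points on a moving object
print(moving_point_mask(prev_sweep, curr_sweep).sum())   # roughly 10
```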
arXiv Detail & Related papers (2023-09-25T19:33:52Z)
- UniM$^2$AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving [47.590099762244535]
Masked Autoencoders (MAE) play a pivotal role in learning potent representations, delivering outstanding results across various 3D perception tasks.
This research delves into multi-modal Masked Autoencoders tailored for a unified representation space in autonomous driving.
To marry the semantics of images with the geometry of LiDAR point clouds, we propose UniM$^2$AE.
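As background, masked autoencoders hinge on randomly masking most input tokens and reconstructing them. The minimal sketch below shows only that masking step, with an assumed token shape and a 75% mask ratio; it does not reproduce UniM$^2$AE's multi-modal encoder, decoder, or unified 3D representation.

```python
# MAE-style random masking over patch/voxel tokens (illustrative only).
import numpy as np

def random_masking(tokens, mask_ratio=0.75, seed=0):
    """tokens: (N, D) array. Returns the visible tokens and the indices kept."""
    n = tokens.shape[0]
    keep = int(n * (1 - mask_ratio))
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    kept_idx = np.sort(order[:keep])          # the encoder only sees these tokens
    return tokens[kept_idx], kept_idx

image_tokens = np.random.randn(196, 768)      # e.g. 14x14 image patches
visible, kept = random_masking(image_tokens)
print(visible.shape, kept.shape)              # (49, 768) (49,)
```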
arXiv Detail & Related papers (2023-08-21T02:13:40Z)
- DuEqNet: Dual-Equivariance Network in Outdoor 3D Object Detection for Autonomous Driving [4.489333751818157]
We propose DuEqNet, which first introduces the concept of equivariance into 3D object detection networks.
The dual equivariance of our model extracts equivariant features at both local and global levels.
Our model achieves higher orientation accuracy and better prediction efficiency.
arXiv Detail & Related papers (2023-02-27T08:30:02Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Towards Autonomous Driving: a Multi-Modal 360$^{\circ}$ Perception Proposal [87.11988786121447]
This paper presents a framework for 3D object detection and tracking for autonomous vehicles.
The solution, based on a novel sensor fusion configuration, provides accurate and reliable road environment detection.
A variety of tests of the system, deployed in an autonomous vehicle, have successfully assessed the suitability of the proposed perception stack.
arXiv Detail & Related papers (2020-08-21T20:36:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.