KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D
- URL: http://arxiv.org/abs/2109.13410v1
- Date: Tue, 28 Sep 2021 00:41:29 GMT
- Title: KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D
- Authors: Yiyi Liao, Jun Xie, Andreas Geiger
- Abstract summary: KITTI-360 is a suburban driving dataset which comprises richer input modalities, comprehensive semantic instance annotations and accurate localization.
For efficient annotation, we created a tool to label 3D scenes with bounding primitives, resulting in over 150k semantic and instance annotated images and 1B annotated 3D points.
We established benchmarks and baselines for several tasks relevant to mobile perception, encompassing problems from computer vision, graphics, and robotics on the same dataset.
- Score: 67.50776195828242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For the last few decades, several major subfields of artificial intelligence
including computer vision, graphics, and robotics have progressed largely
independently from each other. Recently, however, the community has realized
that progress towards robust intelligent systems such as self-driving cars
requires a concerted effort across the different fields. This motivated us to
develop KITTI-360, the successor to the popular KITTI dataset. KITTI-360 is a
suburban driving dataset which comprises richer input modalities, comprehensive
semantic instance annotations and accurate localization to facilitate research
at the intersection of vision, graphics and robotics. For efficient annotation,
we created a tool to label 3D scenes with bounding primitives and developed a
model that transfers this information into the 2D image domain, resulting in
over 150k semantic and instance annotated images and 1B annotated 3D points.
Moreover, we established benchmarks and baselines for several tasks relevant to
mobile perception, encompassing problems from computer vision, graphics, and
robotics on the same dataset. KITTI-360 will enable progress at the
intersection of these research areas and thus contribute towards solving one
of our grand challenges: the development of fully autonomous self-driving
systems.
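As a concrete illustration of the label-transfer idea above, the sketch below projects a 3D bounding primitive into an image and fills a coarse 2D mask. This is a minimal sketch, not the authors' actual annotation pipeline: the camera matrices are made up for the example, and the rectangle fill stands in for the occlusion-aware, learned transfer model described in the paper.

```python
import numpy as np

def project_points(K, T_cam_world, pts_world):
    """Project Nx3 world points into the image via a 3x4 extrinsic and 3x3 intrinsic."""
    pts_h = np.hstack([pts_world, np.ones((len(pts_world), 1))])  # Nx4 homogeneous
    pts_cam = (T_cam_world @ pts_h.T).T                            # world -> camera
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3], pts_cam[:, 2]                   # pixels, depths

def box_to_mask(K, T_cam_world, corners_world, hw):
    """Coarse 2D semantic mask from one 3D bounding primitive: project the 8 box
    corners and fill their 2D bounding rectangle (a stand-in for a proper,
    occlusion-aware fill)."""
    uv, z = project_points(K, T_cam_world, corners_world)
    mask = np.zeros(hw, dtype=bool)
    if (z <= 0).any():                     # box (partly) behind the camera: skip
        return mask
    u0, v0 = np.floor(uv.min(axis=0)).astype(int)
    u1, v1 = np.ceil(uv.max(axis=0)).astype(int)
    u0, v0 = max(u0, 0), max(v0, 0)
    u1, v1 = min(u1, hw[1]), min(v1, hw[0])
    mask[v0:v1, u0:u1] = True
    return mask

# Toy example: a 2 m cube centered 10 m in front of a pinhole camera.
K = np.array([[720.0, 0, 640], [0, 720.0, 180], [0, 0, 1]])
T = np.hstack([np.eye(3), np.zeros((3, 1))])   # camera at the world origin
corners = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (9, 11)], float)
print(box_to_mask(K, T, corners, (360, 1280)).sum(), "pixels labeled")
```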
Related papers
- RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments [62.5830455357187]
We set up an egocentric multi-sensor data collection platform based on three main types of sensors (Camera, LiDAR and Fisheye).
A large-scale multimodal dataset is constructed, named RoboSense, to facilitate egocentric robot perception.
arXiv Detail & Related papers (2024-08-28T03:17:40Z)
- HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving [95.42203932627102]
3D human pose estimation is an emerging technology that can enable autonomous vehicles to perceive and understand the subtle and complex behaviors of pedestrians.
Our method efficiently makes use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin.
Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages.
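A minimal sketch of the pixel-aligned fusion described above, under assumed shapes and module choices (the class name, dimensions, and the plain Transformer encoder are illustrative, not HUM3DIL's actual architecture): project LiDAR points into the image plane, sample image features at the projected pixels, fuse with the point coordinates, and refine with Transformer layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAlignedLiDAREncoder(nn.Module):
    """Fuse per-point LiDAR geometry with image features sampled at each
    point's projected pixel location, then refine with a Transformer."""
    def __init__(self, img_feat_dim=64, d_model=128, n_layers=3):
        super().__init__()
        self.point_proj = nn.Linear(3 + img_feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.refine = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, points, img_feats, K):
        # points: (B, N, 3) in camera coords; img_feats: (B, C, H, W); K: (B, 3, 3)
        uvw = torch.einsum('bij,bnj->bni', K, points)        # pinhole projection
        uv = uvw[..., :2] / uvw[..., 2:3].clamp(min=1e-6)    # pixel coordinates
        B, C, H, W = img_feats.shape
        # normalize pixels to [-1, 1] for grid_sample
        grid = torch.stack([uv[..., 0] / W, uv[..., 1] / H], dim=-1) * 2 - 1
        sampled = F.grid_sample(img_feats, grid.unsqueeze(2),
                                align_corners=False).squeeze(-1).transpose(1, 2)
        tokens = self.point_proj(torch.cat([points, sampled], dim=-1))
        return self.refine(tokens)                           # (B, N, d_model)

# Toy forward pass on random data.
enc = PixelAlignedLiDAREncoder()
out = enc(torch.rand(2, 100, 3) + torch.tensor([0., 0., 5.]),
          torch.rand(2, 64, 32, 32), torch.eye(3).expand(2, 3, 3))
print(out.shape)  # torch.Size([2, 100, 128])
```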
arXiv Detail & Related papers (2022-12-15T11:15:14Z)
- One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario.
The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available.
We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
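The trainable-association idea can be sketched as message passing on a dense track-detection graph, with each edge classified as match or no-match after a few rounds. Everything below (class name, dimensions, mean aggregation, shared node update) is an illustrative assumption rather than the paper's exact network.

```python
import torch
import torch.nn as nn

class AssociationMPN(nn.Module):
    """Toy neural message passing for track-detection association:
    edges carry association evidence, nodes aggregate incident edges."""
    def __init__(self, node_dim=32, edge_dim=32, rounds=3):
        super().__init__()
        self.rounds = rounds
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, edge_dim), nn.ReLU(),
            nn.Linear(edge_dim, edge_dim))
        self.node_mlp = nn.Sequential(
            nn.Linear(node_dim + edge_dim, node_dim), nn.ReLU(),
            nn.Linear(node_dim, node_dim))
        self.classify = nn.Linear(edge_dim, 1)   # match / no-match logit per edge

    def forward(self, tracks, dets, edges):
        # tracks: (T, D), dets: (N, D), edges: (T*N, E) over all pairs, track-major
        T, N = len(tracks), len(dets)
        for _ in range(self.rounds):
            src = tracks.repeat_interleave(N, 0)     # track side of each edge
            dst = dets.repeat(T, 1)                  # detection side of each edge
            edges = self.edge_mlp(torch.cat([src, dst, edges], -1))
            agg_t = edges.view(T, N, -1).mean(1)     # messages into tracks
            agg_d = edges.view(T, N, -1).mean(0)     # messages into detections
            tracks = self.node_mlp(torch.cat([tracks, agg_t], -1))
            dets = self.node_mlp(torch.cat([dets, agg_d], -1))
        return self.classify(edges).view(T, N)       # association logits

mpn = AssociationMPN()
logits = mpn(torch.rand(4, 32), torch.rand(6, 32), torch.rand(24, 32))
print(logits.argmax(1))  # greedy best detection per track (toy readout)
```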
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
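The model-to-image fitting step underlying this data generation can be sketched as minimizing the reprojection error of 3D model keypoints against detected 2D keypoints over pose parameters. The sketch below fits only yaw and translation of a rigid model; the paper additionally handles dynamic parts, so treat the names and setup as illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def project(K, R, t, pts):
    """Project Nx3 model points with rotation R, translation t, intrinsics K."""
    cam = pts @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

def fit_pose(K, model_pts, image_uv, x0=(0.0, 0.0, 0.0, 8.0)):
    """Fit yaw + translation of a rigid 3D model to 2D keypoint observations."""
    def residual(x):
        yaw, tx, ty, tz = x
        c, s = np.cos(yaw), np.sin(yaw)
        R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])   # rotation about Y axis
        return (project(K, R, np.array([tx, ty, tz]), model_pts) - image_uv).ravel()
    return least_squares(residual, x0).x

# Toy check: recover a known pose from synthetic observations.
K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
model = np.array([[1, 0, 2], [-1, 0, 2], [1, 0, -2], [-1, 0, -2], [0, -1, 0]], float)
yaw_gt, t_gt = 0.3, np.array([0.5, 0.2, 10.0])
c, s = np.cos(yaw_gt), np.sin(yaw_gt)
R_gt = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
obs = project(K, R_gt, t_gt, model)
print(fit_pose(K, model, obs))  # ~ [0.3, 0.5, 0.2, 10.0]
```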
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- The NEOLIX Open Dataset for Autonomous Driving [1.4091801425319965]
We present the NEOLIX dataset and its applications in the autonomous driving area.
Our dataset includes about 30,000 frames with point cloud labels, and more than 600k 3D bounding boxes with annotations.
arXiv Detail & Related papers (2020-11-27T02:27:39Z)
- Crowdsourced 3D Mapping: A Combined Multi-View Geometry and Self-Supervised Learning Approach [10.610403488989428]
We propose a framework that estimates the 3D positions of semantically meaningful landmarks without assuming known camera intrinsics.
We utilize multi-view geometry as well as deep learning based self-calibration, depth, and ego-motion estimation for traffic sign positioning.
We achieve an average single-journey relative and absolute positioning accuracy of 39 cm and 1.26 m, respectively.
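The multi-view-geometry component of this pipeline can be illustrated with linear (DLT) triangulation: intersecting the rays through a traffic sign's pixel observations from two ego poses along a journey. The matrices below are invented for the toy example; the paper additionally estimates intrinsics, depth, and ego-motion with self-supervised networks.

```python
import numpy as np

def triangulate_dlt(P1, P2, uv1, uv2):
    """Linear triangulation of one point from two 3x4 projection matrices."""
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]      # dehomogenize

# Toy check: a traffic sign at (2, 1, 12) seen from two ego poses 1 m apart.
K = np.array([[800.0, 0, 640], [0, 800.0, 360], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])  # second pose, 1 m to the right
X_gt = np.array([2.0, 1.0, 12.0, 1.0])
uv1 = P1 @ X_gt; uv1 = uv1[:2] / uv1[2]
uv2 = P2 @ X_gt; uv2 = uv2[:2] / uv2[2]
print(triangulate_dlt(P1, P2, uv1, uv2))  # ~ [2. 1. 12.]
```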
arXiv Detail & Related papers (2020-07-25T12:10:16Z)