MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty
- URL: http://arxiv.org/abs/2401.12761v4
- Date: Mon, 30 Sep 2024 12:24:24 GMT
- Title: MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty
- Authors: Tim Brödermann, David Bruggemann, Christos Sakaridis, Kevin Ta, Odysseas Liagouris, Jason Corkill, Luc Van Gool
- Abstract summary: We introduce MUSES, the MUlti-SEnsor Semantic perception dataset for driving in adverse conditions under increased uncertainty.
The dataset integrates a frame camera, a lidar, a radar, an event camera, and an IMU/GNSS sensor.
MUSES proves both effective for training and challenging for evaluating models under diverse visual conditions.
- Score: 46.369657697892634
- License:
- Abstract: Achieving level-5 driving automation in autonomous vehicles necessitates a robust semantic visual perception system capable of parsing data from different sensors across diverse conditions. However, existing semantic perception datasets often lack important non-camera modalities typically used in autonomous vehicles, or they do not exploit such modalities to aid and improve semantic annotations in challenging conditions. To address this, we introduce MUSES, the MUlti-SEnsor Semantic perception dataset for driving in adverse conditions under increased uncertainty. MUSES includes synchronized multimodal recordings with 2D panoptic annotations for 2500 images captured under diverse weather and illumination. The dataset integrates a frame camera, a lidar, a radar, an event camera, and an IMU/GNSS sensor. Our new two-stage panoptic annotation protocol captures both class-level and instance-level uncertainty in the ground truth and enables the novel task of uncertainty-aware panoptic segmentation we introduce, along with standard semantic and panoptic segmentation. MUSES proves both effective for training and challenging for evaluating models under diverse visual conditions, and it opens new avenues for research in multimodal and uncertainty-aware dense semantic perception. Our dataset and benchmark are publicly available at https://muses.vision.ee.ethz.ch.
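As a rough illustration of how one synchronized, uncertainty-annotated sample from such a dataset could be organized in code, the Python sketch below defines a container for the five modalities plus a helper that scores predictions only on confidently annotated pixels. The field names, shapes, and confidence handling are assumptions for illustration, not the official MUSES devkit API.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MultiSensorSample:
    """One synchronized multi-sensor recording (field names are illustrative)."""
    frame_rgb: np.ndarray        # (H, W, 3) uint8 camera frame
    lidar_points: np.ndarray     # (N, 4) x, y, z, intensity
    radar_points: np.ndarray     # (M, 4) x, y, z, doppler
    event_histogram: np.ndarray  # (H, W, 2) accumulated +/- event counts
    imu_gnss: dict               # e.g. {"lat": ..., "lon": ..., "yaw_rate": ...}
    panoptic_label: np.ndarray   # (H, W) class/instance ids, 255 = unlabeled
    label_confidence: np.ndarray # (H, W) annotator certainty in [0, 1]

def certain_pixel_accuracy(pred: np.ndarray,
                           sample: MultiSensorSample,
                           conf_thresh: float = 0.5) -> float:
    """Semantic accuracy computed only on confidently annotated pixels."""
    valid = (sample.panoptic_label != 255) & (sample.label_confidence >= conf_thresh)
    if not valid.any():
        return float("nan")
    return float((pred[valid] == sample.panoptic_label[valid]).mean())
```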
Related papers
- Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving [3.770103075126785]
We introduce a novel approach to multi-modal sensor fusion, focusing on developing a graph-based state representation.
We present a Sensor-Agnostic Graph-Aware Kalman Filter, the first online state estimation technique designed to fuse multi-modal graphs.
We validate the effectiveness of our proposed framework through extensive experiments conducted on both synthetic and real-world driving datasets.
arXiv Detail & Related papers (2024-11-06T06:58:17Z)
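The entry above fuses multimodal observations with a Kalman-filter variant. As background only, the sketch below shows a textbook linear Kalman predict/update step on a constant-velocity state; it does not reproduce the paper's sensor-agnostic, graph-aware formulation, and all model matrices are illustrative.

```python
import numpy as np

def kalman_step(x, P, z, F, Q, H, R):
    """One textbook linear Kalman predict/update cycle.

    x: (n,) state mean, P: (n, n) covariance,
    z: (m,) measurement, F/Q: motion model, H/R: sensor model.
    """
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Constant-velocity model in 2D, fusing a position-only sensor reading.
dt = 0.1
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
x, P = np.zeros(4), np.eye(4)
x, P = kalman_step(x, P, z=np.array([1.0, 0.5]),
                   F=F, Q=0.01 * np.eye(4), H=H, R=0.1 * np.eye(2))
```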
- Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes [56.52618054240197]
We propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes.
Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token that guides the fusion of multiple sensor modalities.
We set the new state of the art with CAFuser on the MUSES dataset with 59.7 PQ for multimodal panoptic segmentation and 78.2 mIoU for semantic segmentation, ranking first on the public benchmarks.
arXiv Detail & Related papers (2024-10-14T17:56:20Z)
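The CAFuser entry above describes an RGB-derived Condition Token that steers multimodal fusion. The sketch below captures only the general idea of condition-dependent gating over modality features; the module names and dimensions are invented for illustration and do not follow the published architecture.

```python
import torch
import torch.nn as nn

class ConditionGatedFusion(nn.Module):
    """Minimal sketch: an RGB-derived condition embedding gates how much each
    sensor modality contributes to the fused feature (illustrative only)."""

    def __init__(self, feat_dim: int = 256, num_modalities: int = 4, cond_dim: int = 64):
        super().__init__()
        self.condition_head = nn.Sequential(
            nn.Linear(feat_dim, cond_dim), nn.ReLU(), nn.Linear(cond_dim, cond_dim))
        self.gate = nn.Linear(cond_dim, num_modalities)  # one weight per modality

    def forward(self, rgb_feat: torch.Tensor, modality_feats: torch.Tensor) -> torch.Tensor:
        # rgb_feat: (B, feat_dim) pooled camera features
        # modality_feats: (B, num_modalities, feat_dim), e.g. camera/lidar/radar/event
        cond_token = self.condition_head(rgb_feat)        # (B, cond_dim)
        weights = self.gate(cond_token).softmax(dim=-1)   # (B, num_modalities)
        return (weights.unsqueeze(-1) * modality_feats).sum(dim=1)  # (B, feat_dim)

fused = ConditionGatedFusion()(torch.randn(2, 256), torch.randn(2, 4, 256))
```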
- SemanticSpray++: A Multimodal Dataset for Autonomous Driving in Wet Surface Conditions [10.306226508237348]
The SemanticSpray++ dataset provides labels for camera, LiDAR, and radar data of highway-like scenarios in wet surface conditions.
By labeling all three sensor modalities, the dataset offers a comprehensive test bed for analyzing the performance of different perception methods.
arXiv Detail & Related papers (2024-06-14T11:46:48Z)
- SUPS: A Simulated Underground Parking Scenario Dataset for Autonomous Driving [41.221988979184665]
SUPS is a simulated dataset for underground automatic parking.
It supports multiple tasks with multiple sensors and multiple semantic labels aligned with successive images.
We also evaluate the state-of-the-art SLAM algorithms and perception models on our dataset.
arXiv Detail & Related papers (2023-02-25T02:59:12Z)
- SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation [152.60469768559878]
SHIFT is the largest multi-task synthetic dataset for autonomous driving.
It presents discrete and continuous shifts in cloudiness, rain and fog intensity, time of day, and vehicle and pedestrian density.
Our dataset and benchmark toolkit are publicly available at www.vis.xyz/shift.
arXiv Detail & Related papers (2022-06-16T17:59:52Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
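The SurroundDepth entry above fuses features across camera views with a cross-view transformer. The minimal sketch below applies standard multi-head self-attention over tokens concatenated from all views; it illustrates the general mechanism, not the paper's actual network, and the token counts are placeholders.

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Toy cross-view fusion: tokens from all camera views attend to each
    other with standard multi-head attention (a sketch, not SurroundDepth)."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, view_tokens: torch.Tensor) -> torch.Tensor:
        # view_tokens: (B, num_views * tokens_per_view, dim), all views concatenated
        fused, _ = self.attn(view_tokens, view_tokens, view_tokens)
        return self.norm(view_tokens + fused)  # residual connection

out = CrossViewAttention()(torch.randn(2, 6 * 50, 256))  # e.g. 6 surround cameras
```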
- SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving [94.11868795445798]
We release SODA10M, a large-scale object detection benchmark for autonomous driving containing 10 million unlabeled images and 20K images labeled with 6 representative object categories.
To improve diversity, the images are collected at a rate of one frame every ten seconds across 32 different cities under varying weather conditions, times of day, and location scenes.
We provide extensive experiments and deep analyses of existing supervised state-of-the-art detection models, popular self-supervised and semi-supervised approaches, and some insights about how to develop future models.
arXiv Detail & Related papers (2021-06-21T13:55:57Z)
- DSEC: A Stereo Event Camera Dataset for Driving Scenarios [55.79329250951028]
This work presents the first high-resolution, large-scale stereo dataset with event cameras.
The dataset contains 53 sequences collected by driving in a variety of illumination conditions.
It provides ground truth disparity for the development and evaluation of event-based stereo algorithms.
arXiv Detail & Related papers (2021-03-10T12:10:33Z)
- Camera-Lidar Integration: Probabilistic sensor fusion for semantic mapping [8.18198392834469]
An automated vehicle must be able to perceive and recognise objects and obstacles in a three-dimensional world while navigating in a constantly changing environment.
We present a probabilistic pipeline that incorporates uncertainties from the sensor readings (cameras, lidar, IMU and wheel encoders), compensation for the motion of the vehicle, and label probabilities for the semantic images.
arXiv Detail & Related papers (2020-07-09T07:59:39Z)
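The camera-lidar entry above fuses semantic label probabilities under sensor uncertainty. The sketch below shows only a naive-Bayes accumulation of repeated per-point class probabilities under an independent-observation assumption; the paper's full pipeline with motion compensation and sensor noise models is not reproduced.

```python
import numpy as np

def fuse_label_probabilities(observations: list) -> np.ndarray:
    """Fuse repeated per-point class probabilities (e.g. camera semantics
    projected onto the same lidar points in different frames) into one
    distribution per point. Naive-Bayes sketch, not the paper's pipeline.

    observations: list of (N, C) arrays of per-class probabilities.
    """
    log_post = np.zeros_like(observations[0])
    for obs in observations:
        log_post += np.log(np.clip(obs, 1e-6, 1.0))  # independence assumption
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Example: two noisy views of 3 points over 4 classes.
a = np.random.dirichlet(np.ones(4), size=3)
b = np.random.dirichlet(np.ones(4), size=3)
fused = fuse_label_probabilities([a, b])
```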
- SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud [39.99118618229583]
We propose a unified model SegVoxelNet to address the above two problems.
A semantic context encoder is proposed to leverage the free-of-charge semantic segmentation masks in the bird's eye view.
A novel depth-aware head is designed to explicitly model the distribution differences and each part of the depth-aware head is made to focus on its own target detection range.
arXiv Detail & Related papers (2020-02-13T02:42:31Z)
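The SegVoxelNet entry above dedicates each part of a depth-aware head to its own detection range. The toy sketch below routes box candidates to separate small predictors by distance bin to illustrate that per-range specialization; the bin edges, feature sizes, and box parameterization are illustrative assumptions, not the SegVoxelNet design.

```python
import torch
import torch.nn as nn

class RangePartitionedHead(nn.Module):
    """Toy 'depth-aware' head: candidates are handled by a separate small
    predictor depending on their distance bin (illustration only)."""

    def __init__(self, feat_dim: int = 128, bins=(0.0, 20.0, 40.0, 80.0)):
        super().__init__()
        self.bins = torch.tensor(bins)
        # 7 outputs per box: x, y, z, l, w, h, yaw (illustrative parameterization)
        self.heads = nn.ModuleList(nn.Linear(feat_dim, 7) for _ in range(len(bins) - 1))

    def forward(self, feats: torch.Tensor, ranges: torch.Tensor) -> torch.Tensor:
        # feats: (N, feat_dim) candidate features, ranges: (N,) distance in meters
        out = torch.zeros(feats.size(0), 7)
        bin_idx = torch.bucketize(ranges, self.bins[1:-1])  # 0 .. num_heads - 1
        for i, head in enumerate(self.heads):
            mask = bin_idx == i
            if mask.any():
                out[mask] = head(feats[mask])
        return out

boxes = RangePartitionedHead()(torch.randn(10, 128), torch.rand(10) * 80)
```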