AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous
Driving
- URL: http://arxiv.org/abs/2309.06547v2
- Date: Mon, 11 Mar 2024 12:36:37 GMT
- Title: AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous
Driving
- Authors: Ahmed Rida Sekkat, Rohit Mohan, Oliver Sawade, Elmar Matthes, and
Abhinav Valada
- Abstract summary: We introduce AmodalSynthDrive, a synthetic multi-task multi-modal amodal perception dataset.
The dataset provides multi-view camera images, 3D bounding boxes, LiDAR data, and odometry for 150 driving sequences.
AmodalSynthDrive supports multiple amodal scene understanding tasks, including the newly introduced task of amodal depth estimation.
- Score: 10.928470926399566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unlike humans, who can effortlessly estimate the entirety of objects even
when partially occluded, modern computer vision algorithms still find this
aspect extremely challenging. Leveraging this amodal perception for autonomous
driving remains largely untapped due to the lack of suitable datasets. The
curation of these datasets is primarily hindered by significant annotation
costs and mitigating annotator subjectivity in accurately labeling occluded
regions. To address these limitations, we introduce AmodalSynthDrive, a
synthetic multi-task multi-modal amodal perception dataset. The dataset
provides multi-view camera images, 3D bounding boxes, LiDAR data, and odometry
for 150 driving sequences with over 1M object annotations in diverse traffic,
weather, and lighting conditions. AmodalSynthDrive supports multiple amodal
scene understanding tasks including the introduced amodal depth estimation for
enhanced spatial understanding. We evaluate several baselines for each of these
tasks to illustrate the challenges and set up public benchmarking servers. The
dataset is available at http://amodalsynthdrive.cs.uni-freiburg.de.
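To make the amodal terminology above concrete, here is a minimal NumPy sketch contrasting a modal mask (visible pixels only) with an amodal mask (the full object extent, occluded pixels included), and the kind of per-pixel target that amodal depth estimation adds. The toy masks, depth values, and occlusion-ratio computation are illustrative assumptions, not the dataset's actual file format, API, or evaluation metric.

import numpy as np

# Toy 6x6 scene: a hypothetical car spans rows 2-4, columns 1-4,
# and an occluder hides columns 3-4 of that region.
amodal_mask = np.zeros((6, 6), dtype=bool)   # full object extent, occluded pixels included
amodal_mask[2:5, 1:5] = True

modal_mask = amodal_mask.copy()              # visible pixels only
modal_mask[2:5, 3:5] = False                 # pixels hidden behind the occluder

# Occlusion ratio: fraction of the full (amodal) extent that is not visible.
occlusion = 1.0 - modal_mask.sum() / amodal_mask.sum()
print(f"occlusion ratio: {occlusion:.2f}")   # 0.50 for this toy scene

# Amodal depth estimation asks for depth over the full amodal extent,
# i.e. also behind the occluder, whereas ordinary depth supervision
# only covers what the camera actually sees.
modal_depth = np.full((6, 6), np.nan)
modal_depth[modal_mask] = 12.0               # metres to the car's visible surface (toy value)

amodal_depth = np.full((6, 6), np.nan)
amodal_depth[amodal_mask] = 12.0             # target is also defined for occluded pixels

hidden = amodal_mask & ~modal_mask
print(f"pixels supervised only by the amodal target: {hidden.sum()}")

The same modal/amodal distinction applies to the bounding boxes and instance masks in the annotations; only the per-pixel depth case is shown here.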
Related papers
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
- All in One Framework for Multimodal Re-identification in the Wild [58.380708329455466]
A multimodal learning paradigm for ReID is introduced, referred to as All-in-One (AIO).
AIO harnesses a frozen pre-trained big model as an encoder, enabling effective multimodal retrieval without additional fine-tuning.
Experiments on cross-modal and multimodal ReID reveal that AIO not only adeptly handles various modal data but also excels in challenging contexts.
arXiv Detail & Related papers (2024-05-08T01:04:36Z)
- Amodal Ground Truth and Completion in the Wild [84.54972153436466]
We use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images.
This pipeline is used to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels.
arXiv Detail & Related papers (2023-12-28T18:59:41Z)
- TAO-Amodal: A Benchmark for Tracking Any Object Amodally [41.5396827282691]
We introduce TAO-Amodal, featuring 833 diverse categories in thousands of video sequences.
Our dataset includes amodal and modal bounding boxes for visible and partially or fully occluded objects, including those that are partially out of the camera frame.
arXiv Detail & Related papers (2023-12-19T18:58:40Z)
- HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving [95.42203932627102]
3D human pose estimation is an emerging technology that can enable autonomous vehicles to perceive and understand the subtle and complex behaviors of pedestrians.
We embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages.
Our method efficiently makes use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin.
arXiv Detail & Related papers (2022-12-15T11:15:14Z)
- aiMotive Dataset: A Multimodal Dataset for Robust Autonomous Driving with Long-Range Perception [0.0]
This dataset consists of 176 scenes with synchronized and calibrated LiDAR, camera, and radar sensors covering a 360-degree field of view.
The collected data was captured in highway, urban, and suburban areas during daytime, night, and rain.
We trained unimodal and multimodal baseline models for 3D object detection.
arXiv Detail & Related papers (2022-11-17T10:19:59Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation [152.60469768559878]
SHIFT is the largest multi-task synthetic dataset for autonomous driving.
It presents discrete and continuous shifts in cloudiness, rain and fog intensity, time of day, and vehicle and pedestrian density.
Our dataset and benchmark toolkit are publicly available at www.vis.xyz/shift.
arXiv Detail & Related papers (2022-06-16T17:59:52Z)
- Amodal Cityscapes: A New Dataset, its Generation, and an Amodal Semantic Segmentation Challenge Baseline [38.8592627329447]
We consider the task of amodal semantic segmentation and propose a generic way to generate datasets to train amodal semantic segmentation methods.
We use this approach to generate an amodal Cityscapes dataset, showing its applicability for amodal semantic segmentation in automotive environment perception (a sketch of this data-generation idea follows this list).
arXiv Detail & Related papers (2022-06-01T14:38:33Z)
- AutoLay: Benchmarking amodal layout estimation for autonomous driving [18.152206533685412]
AutoLay is a dataset and benchmark for amodal layout estimation from monocular images.
In addition to fine-grained attributes such as lanes, sidewalks, and vehicles, we also provide semantically annotated 3D point clouds.
arXiv Detail & Related papers (2021-08-20T08:21:11Z)
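The Amodal Cityscapes entry above describes a generic way to generate amodal segmentation training data. One common way to realise such a scheme, sketched below purely as an assumption about the general copy-paste idea rather than the paper's exact pipeline, is to composite an extracted instance onto an already annotated frame: the inserted instance defines the new modal labels, while the original labels of the now-covered pixels serve as amodal ground truth. The helper name, array shapes, and label ids are hypothetical.

import numpy as np

def composite_occluder(image, semantic, occluder_rgb, occluder_mask, occluder_class=26):
    """Paste an extracted occluder instance onto an annotated frame.

    image:          (H, W, 3) uint8 RGB frame with known semantic labels
    semantic:       (H, W)    int   per-pixel class ids of the original frame
    occluder_rgb:   (H, W, 3) uint8 appearance of the inserted instance (already placed)
    occluder_mask:  (H, W)    bool  footprint of the inserted instance
    occluder_class: class id written into the modal map (26 = "car" in Cityscapes ids)

    Returns the composited image, the new modal semantic map, and the original
    semantic map, which now doubles as amodal ground truth for the covered pixels.
    """
    composited = image.copy()
    composited[occluder_mask] = occluder_rgb[occluder_mask]

    modal_semantic = semantic.copy()
    modal_semantic[occluder_mask] = occluder_class

    amodal_semantic = semantic                # labels of the occluded pixels are still known
    return composited, modal_semantic, amodal_semantic

# Tiny synthetic example with hypothetical shapes and labels.
H, W = 8, 8
image = np.zeros((H, W, 3), dtype=np.uint8)
semantic = np.full((H, W), 7, dtype=np.int64)          # 7 = "road" in Cityscapes ids
occluder_rgb = np.full((H, W, 3), 255, dtype=np.uint8)
occluder_mask = np.zeros((H, W), dtype=bool)
occluder_mask[3:6, 3:6] = True

img, modal, amodal = composite_occluder(image, semantic, occluder_rgb, occluder_mask)
print((modal != amodal).sum(), "pixels gained an amodal label behind the occluder")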