InfraParis: A multi-modal and multi-task autonomous driving dataset
- URL: http://arxiv.org/abs/2309.15751v2
- Date: Mon, 6 Nov 2023 10:52:37 GMT
- Title: InfraParis: A multi-modal and multi-task autonomous driving dataset
- Authors: Gianni Franchi, Marwane Hariat, Xuanlong Yu, Nacim Belkhir, Antoine Manzanera and David Filliat
- Abstract summary: We introduce a novel dataset named InfraParis that supports multiple tasks across three modalities: RGB, depth, and infrared.
We assess various state-of-the-art baseline techniques, encompassing models for the tasks of semantic segmentation, object detection, and depth estimation.
- Score: 4.6740600790529365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current deep neural networks (DNNs) for autonomous driving computer vision
are typically trained on specific datasets that only involve a single type of
data and urban scenes. Consequently, these models struggle to handle new
objects, noise, nighttime conditions, and diverse scenarios, although handling
them is essential for safety-critical applications. Despite ongoing efforts to enhance the
resilience of computer vision DNNs, progress has been sluggish, partly due to
the absence of benchmarks featuring multiple modalities. We introduce a novel
and versatile dataset named InfraParis that supports multiple tasks across
three modalities: RGB, depth, and infrared. We assess various state-of-the-art
baseline techniques, encompassing models for the tasks of semantic
segmentation, object detection, and depth estimation. More visualizations and
the download link for InfraParis are available at
https://ensta-u2is.github.io/infraParis/.
Related papers
- SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception [22.114089372056238]
We present SEVD, a first-of-its-kind multi-view synthetic event-based dataset for ego and fixed perception.
SEVD spans urban, suburban, rural, and highway scenes featuring various classes of objects.
We evaluate the dataset using state-of-the-art event-based (RED, RVT) and frame-based (YOLOv8) methods for traffic participant detection tasks.
arXiv Detail & Related papers (2024-04-12T20:40:12Z)
- UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR.
UniTR processes a variety of modalities with unified modeling and shared parameters.
UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
arXiv Detail & Related papers (2023-08-15T12:13:44Z)
- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [7.137567622606353]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z)
- IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes [79.18349050238413]
Preparing and training deployable deep learning architectures requires models suited to different traffic scenarios.
An unstructured and complex driving layout found in several developing countries such as India poses a challenge to these models.
We build a new dataset, IDD-3D, which consists of multi-modal data from multiple cameras and LiDAR sensors with 12k annotated driving LiDAR frames.
arXiv Detail & Related papers (2022-10-23T23:03:17Z)
- DOLPHINS: Dataset for Collaborative Perception enabled Harmonious and Interconnected Self-driving [19.66714697653504]
The Vehicle-to-Everything (V2X) network has enabled collaborative perception in autonomous driving.
The lack of datasets has severely blocked the development of collaborative perception algorithms.
We release DOLPHINS: dataset for cOllaborative Perception enabled Harmonious and INterconnected Self-driving.
arXiv Detail & Related papers (2022-07-15T17:07:07Z)
- Federated Deep Learning Meets Autonomous Vehicle Perception: Design and Verification [168.67190934250868]
The federated-learning-empowered connected autonomous vehicle (FLCAV) has been proposed.
FLCAV preserves privacy while reducing communication and annotation costs.
It is challenging to determine the network resources and road sensor poses for multi-stage training.
arXiv Detail & Related papers (2022-06-03T23:55:45Z)
- A Wireless-Vision Dataset for Privacy Preserving Human Activity Recognition [53.41825941088989]
A new WiFi-based and video-based neural network (WiNN) is proposed to improve the robustness of activity recognition.
Our results show that the WiVi data set satisfies the primary demand and that all three branches in the proposed pipeline keep more than 80% of activity recognition accuracy.
arXiv Detail & Related papers (2022-05-24T10:49:11Z)
- A Simple and Efficient Multi-task Network for 3D Object Detection and Road Understanding [20.878931360708343]
We show that it is possible to perform all perception tasks via a simple and efficient multi-task network.
Our proposed network, LidarMTL, takes a raw LiDAR point cloud as input and predicts six perception outputs for 3D object detection and road understanding.
arXiv Detail & Related papers (2021-03-06T08:00:26Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)