IDD-X: A Multi-View Dataset for Ego-relative Important Object Localization and Explanation in Dense and Unstructured Traffic
- URL: http://arxiv.org/abs/2404.08561v2
- Date: Tue, 23 Apr 2024 19:19:35 GMT
- Title: IDD-X: A Multi-View Dataset for Ego-relative Important Object Localization and Explanation in Dense and Unstructured Traffic
- Authors: Chirag Parikh, Rohit Saluja, C. V. Jawahar, Ravi Kiran Sarvadevabhatla
- Abstract summary: We present IDD-X, a large-scale dual-view driving video dataset.
With 697K bounding boxes, 9K important object tracks, and 1-12 objects per video, IDD-X offers comprehensive ego-relative annotations for multiple important road objects.
We also introduce custom-designed deep networks aimed at multiple important object localization and per-object explanation prediction.
- Score: 35.23523738296173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligent vehicle systems require a deep understanding of the interplay between road conditions, surrounding entities, and the ego vehicle's driving behavior for safe and efficient navigation. This is particularly critical in developing countries where traffic situations are often dense and unstructured with heterogeneous road occupants. Existing datasets, predominantly geared towards structured and sparse traffic scenarios, fall short of capturing the complexity of driving in such environments. To fill this gap, we present IDD-X, a large-scale dual-view driving video dataset. With 697K bounding boxes, 9K important object tracks, and 1-12 objects per video, IDD-X offers comprehensive ego-relative annotations for multiple important road objects covering 10 categories and 19 explanation label categories. The dataset also incorporates rearview information to provide a more complete representation of the driving environment. We also introduce custom-designed deep networks aimed at multiple important object localization and per-object explanation prediction. Overall, our dataset and introduced prediction models form the foundation for studying how road conditions and surrounding entities affect driving behavior in complex traffic situations.
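To make the annotation format concrete, the sketch below shows one plausible in-memory record for an important object track. The field names, category string, and explanation string are illustrative assumptions, not the official IDD-X schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical sketch of one ego-relative annotation record; field names
# and label strings are illustrative and do NOT reflect the official schema.
@dataclass
class ImportantObjectTrack:
    track_id: int                      # one of the ~9K important object tracks
    category: str                      # one of 10 road-object categories
    view: str                          # "front" or "rear" (dual-view recording)
    explanation: str                   # one of 19 explanation label categories
    boxes: List[Tuple[int, int, int, int]] = field(default_factory=list)
    # per-frame (x1, y1, x2, y2) boxes; 697K boxes total across the dataset

track = ImportantObjectTrack(
    track_id=42, category="auto-rickshaw", view="front",
    explanation="cutting into ego lane",   # made-up example label
    boxes=[(310, 220, 420, 360), (305, 218, 425, 365)],
)
print(track.category, len(track.boxes))
```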
Related papers
- ROAD-Waymo: Action Awareness at Scale for Autonomous Driving [17.531603453254434]
ROAD-Waymo is an extensive dataset for the development and benchmarking of techniques for agent, action, location and event detection in road scenes.
Considerably larger and more challenging than any existing dataset (and encompassing multiple cities), it comes with 198k annotated video frames, 54k agent tubes, 3.9M bounding boxes and a total of 12.4M labels.
arXiv Detail & Related papers (2024-11-03T20:46:50Z)
- RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving [6.372000468173298]
RSUD20K is a new dataset for road scene understanding, comprising over 20K high-resolution images from the driving perspective on Bangladesh roads.
Our work significantly improves upon previous efforts, providing detailed annotations and increased object complexity.
arXiv Detail & Related papers (2024-01-14T16:10:42Z)
- Deep Perspective Transformation Based Vehicle Localization on Bird's Eye View [0.49747156441456597]
Traditional approaches rely on installing multiple sensors to simulate the environment.
We propose an alternative solution by generating a top-down representation of the scene.
We present an architecture that transforms perspective view RGB images into bird's-eye-view maps with segmented surrounding vehicles.
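For intuition about the perspective-to-BEV mapping, here is a minimal classical sketch using a fixed OpenCV homography. The four ground-plane correspondences are made-up placeholder values, and the paper replaces such a hand-set warp with a learned network.

```python
import cv2
import numpy as np

# Four made-up correspondences between image pixels on the road plane
# and their bird's-eye-view positions (placeholder values only).
src = np.float32([[520, 460], [760, 460], [1180, 720], [100, 720]])  # image px
dst = np.float32([[300, 0], [500, 0], [500, 800], [300, 800]])       # BEV px

H = cv2.getPerspectiveTransform(src, dst)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for an RGB frame
bev = cv2.warpPerspective(frame, H, (800, 800))   # warp into the BEV canvas
print(bev.shape)  # (800, 800, 3)
```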
arXiv Detail & Related papers (2023-11-12T10:16:42Z)
- OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping [84.65114565766596]
We present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure.
OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes.
We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
arXiv Detail & Related papers (2023-04-20T16:31:22Z)
- IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes [79.18349050238413]
Preparing and training deployable deep learning architectures requires models suited to different traffic scenarios.
An unstructured and complex driving layout found in several developing countries such as India poses a challenge to these models.
We build a new dataset, IDD-3D, which consists of multi-modal data from multiple cameras and LiDAR sensors with 12k annotated driving LiDAR frames.
arXiv Detail & Related papers (2022-10-23T23:03:17Z)
- RSG-Net: Towards Rich Semantic Relationship Prediction for Intelligent Vehicle in Complex Environments [72.04891523115535]
We propose RSG-Net (Road Scene Graph Net): a graph convolutional network designed to predict potential semantic relationships from object proposals.
The experimental results indicate that this network, trained on Road Scene Graph dataset, could efficiently predict potential semantic relationships among objects around the ego-vehicle.
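As a rough illustration of predicting pairwise relationships with graph convolutions, the sketch below mean-aggregates neighbour features over object proposals and classifies each node pair. The layer sizes and single-layer design are assumptions, not RSG-Net's actual architecture.

```python
import torch
import torch.nn as nn

# Toy relationship predictor: one graph-convolution step, then a classifier
# over every ordered pair of nodes. Sizes are illustrative placeholders.
class TinyRelNet(nn.Module):
    def __init__(self, in_dim=16, hid=32, num_rel=8):
        super().__init__()
        self.gc = nn.Linear(in_dim, hid)           # shared node-update weights
        self.rel = nn.Linear(2 * hid, num_rel)     # classify each node pair

    def forward(self, x, adj):
        # x: (N, in_dim) object-proposal features; adj: (N, N) adjacency
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        h = torch.relu(self.gc((adj / deg) @ x))   # mean-aggregate neighbours
        n = h.size(0)
        pairs = torch.cat([h.repeat_interleave(n, 0), h.repeat(n, 1)], dim=-1)
        return self.rel(pairs).view(n, n, -1)      # per-pair relation logits

net = TinyRelNet()
logits = net(torch.randn(5, 16), torch.ones(5, 5))
print(logits.shape)  # torch.Size([5, 5, 8])
```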
arXiv Detail & Related papers (2022-07-16T12:40:17Z)
- Interaction Detection Between Vehicles and Vulnerable Road Users: A Deep Generative Approach with Attention [9.442285577226606]
We propose a conditional generative model for interaction detection at intersections.
It aims to automatically analyze massive video data about the continuity of road users' behavior.
The model's efficacy was validated by testing on real-world datasets.
arXiv Detail & Related papers (2021-05-09T10:03:55Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- SoDA: Multi-Object Tracking with Soft Data Association [75.39833486073597]
Multi-object tracking (MOT) is a prerequisite for a safe deployment of self-driving cars.
We propose a novel approach to MOT that uses attention to compute track embeddings that encode dependencies between observed objects.
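The following sketch illustrates the general idea of attention-based soft association: each track query attends over all detection features rather than committing to a hard match, so the attention weights act as soft association scores. The dimensions and single attention layer are assumptions, not SoDA's actual design.

```python
import torch
import torch.nn as nn

# One attention layer standing in for soft data association; sizes are
# placeholders chosen for the example.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

tracks = torch.randn(1, 6, 64)       # 6 existing track queries
detections = torch.randn(1, 10, 64)  # 10 new detection features

# Each track softly attends over all detections; `assoc` holds the
# per-track attention distribution over detections.
track_emb, assoc = attn(tracks, detections, detections)
print(track_emb.shape, assoc.shape)  # (1, 6, 64) (1, 6, 10)
```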
arXiv Detail & Related papers (2020-08-18T03:40:25Z)
- Towards Accurate Vehicle Behaviour Classification With Multi-Relational Graph Convolutional Networks [22.022759283770377]
We propose a pipeline for understanding vehicle behaviour from a monocular image sequence or video.
A temporal sequence of such encodings is fed to a recurrent network to label vehicle behaviours.
The proposed framework classifies a variety of vehicle behaviours with high fidelity on diverse datasets.
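A minimal sketch of the encode-each-frame-then-recur pipeline described above, assuming placeholder feature and class sizes; the actual paper uses richer per-frame scene encodings.

```python
import torch
import torch.nn as nn

# Toy sequence classifier: per-frame encodings -> LSTM -> behaviour label.
# Feature size, hidden size, and class count are illustrative assumptions.
class BehaviourClassifier(nn.Module):
    def __init__(self, feat_dim=32, hid=64, num_behaviours=5):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hid, batch_first=True)
        self.head = nn.Linear(hid, num_behaviours)

    def forward(self, frame_encodings):
        # frame_encodings: (batch, time, feat_dim) per-frame scene encodings
        _, (h_n, _) = self.rnn(frame_encodings)
        return self.head(h_n[-1])      # one behaviour label per sequence

model = BehaviourClassifier()
logits = model(torch.randn(2, 30, 32))  # 2 clips of 30 frames each
print(logits.shape)  # torch.Size([2, 5])
```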
arXiv Detail & Related papers (2020-02-03T14:34:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.