DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments
- URL: http://arxiv.org/abs/2412.20042v1
- Date: Sat, 28 Dec 2024 06:13:44 GMT
- Title: DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments
- Authors: Xijun Wang, Pedro Sandoval-Segura, Chengyuan Zhang, Junyun Huang, Tianrui Guan, Ruiqi Xian, Fuxiao Liu, Rohan Chandra, Boqing Gong, Dinesh Manocha
- Abstract summary: We present a new dataset, DAVE, designed for evaluating perception methods with high representation of Vulnerable Road Users (VRUs).
DAVE is a manually annotated dataset encompassing 16 diverse actor categories (spanning animals, humans, vehicles, etc.) and 16 action types (complex and rare cases such as cut-ins, zigzag movement, and U-turns).
Our experiments show that existing methods suffer performance degradation when evaluated on DAVE, highlighting its benefit for future video recognition research.
- Score: 60.69159598130235
- Abstract: Most existing traffic video datasets, including Waymo, are structured, focusing predominantly on Western traffic, which hinders global applicability. In particular, many Asian traffic scenarios are far more complex, involving numerous objects with distinct motions and behaviors. Addressing this gap, we present a new dataset, DAVE, designed for evaluating perception methods with high representation of Vulnerable Road Users (VRUs: e.g., pedestrians, animals, motorbikes, and bicycles) in complex and unpredictable environments. DAVE is a manually annotated dataset encompassing 16 diverse actor categories (spanning animals, humans, vehicles, etc.) and 16 action types (complex and rare cases such as cut-ins, zigzag movement, and U-turns), which require high reasoning ability. DAVE densely annotates over 13 million bounding boxes (bboxes) with actor identification, and more than 1.6 million boxes are further annotated with action/behavior details. The videos in DAVE are collected across a broad spectrum of factors, such as weather conditions, time of day, road scenarios, and traffic density. DAVE can benchmark video tasks such as Tracking, Detection, Spatiotemporal Action Localization, Language-Visual Moment Retrieval, and Multi-label Video Action Recognition. Given the critical importance of accurately identifying VRUs to prevent accidents and ensure road safety, vulnerable road users constitute 41.13% of instances in DAVE, compared to 23.71% in Waymo. DAVE provides an invaluable resource for the development of more sensitive and accurate visual perception algorithms in the complex real world. Our experiments show that existing methods suffer performance degradation when evaluated on DAVE, highlighting its benefit for future video recognition research.
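To make the annotation statistics above concrete, the minimal Python sketch below shows how a VRU share such as the reported 41.13% could be computed from per-box annotations. The record fields (actor, bbox, actions) and the VRU class list are assumptions for illustration only; DAVE's actual annotation schema is not specified in this abstract.

```python
# Illustrative sketch only: field names and the VRU class list are assumptions,
# not DAVE's published annotation format.
from dataclasses import dataclass
from typing import List, Optional, Tuple

VRU_CLASSES = {"pedestrian", "animal", "motorbike", "bicycle"}  # VRU examples named in the abstract

@dataclass
class BoxAnnotation:
    frame_id: int
    actor: str                            # one of the 16 actor categories
    bbox: Tuple[int, int, int, int]       # (x, y, w, h), assumed convention
    actions: Optional[List[str]] = None   # present for the ~1.6M boxes with behavior labels

def vru_share(annotations: List[BoxAnnotation]) -> float:
    """Fraction of annotated instances whose actor class is a vulnerable road user."""
    vru = sum(1 for a in annotations if a.actor in VRU_CLASSES)
    return vru / max(len(annotations), 1)

# Toy example: two of three boxes are VRUs -> 66.67%; the paper reports 41.13% for DAVE.
boxes = [
    BoxAnnotation(0, "pedestrian", (10, 20, 30, 60), ["crossing"]),
    BoxAnnotation(0, "car", (100, 40, 80, 50)),
    BoxAnnotation(1, "motorbike", (60, 30, 40, 40), ["cut-in"]),
]
print(f"VRU share: {vru_share(boxes):.2%}")
```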
Related papers
- MCRL4OR: Multimodal Contrastive Representation Learning for Off-Road Environmental Perception [28.394436093801797]
We propose a Multimodal Contrastive Representation Learning approach for Off-Road environmental perception, namely MCRL4OR.
This approach aims to jointly learn three encoders for processing visual images, locomotion states, and control actions.
In experiments, we pre-train the MCRL4OR with a large-scale off-road driving dataset and adopt the learned multimodal representations for various downstream perception tasks in off-road driving scenarios.
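Below is a minimal sketch of a three-encoder contrastive objective in the spirit described above, aligning visual images, locomotion states, and control actions in a shared embedding space. The encoder architectures, input dimensions, and pairwise InfoNCE pairing are assumptions for illustration, not MCRL4OR's actual design.

```python
# Minimal three-modality contrastive sketch (assumed shapes and architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjEncoder(nn.Module):
    """Toy encoder: projects a flat feature vector into a shared, L2-normalized embedding space."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)

def info_nce(a: torch.Tensor, b: torch.Tensor, temp: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between two batches of embeddings aligned row-by-row."""
    logits = a @ b.t() / temp
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Hypothetical input dims: image features, locomotion state, control action.
enc_img, enc_loc, enc_act = ProjEncoder(2048), ProjEncoder(6), ProjEncoder(2)
img, loc, act = torch.randn(32, 2048), torch.randn(32, 6), torch.randn(32, 2)
z_img, z_loc, z_act = enc_img(img), enc_loc(loc), enc_act(act)

# Pull together every pair of modalities coming from the same driving clip.
loss = info_nce(z_img, z_loc) + info_nce(z_img, z_act) + info_nce(z_loc, z_act)
loss.backward()
```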
arXiv Detail & Related papers (2025-01-23T08:27:15Z) - ROAD-Waymo: Action Awareness at Scale for Autonomous Driving [17.531603453254434]
ROAD-Waymo is an extensive dataset for the development and benchmarking of techniques for agent, action, location and event detection in road scenes.
Considerably larger and more challenging than any existing dataset (and encompassing multiple cities), it comes with 198k annotated video frames, 54k agent tubes, 3.9M bounding boxes and a total of 12.4M labels.
arXiv Detail & Related papers (2024-11-03T20:46:50Z) - IDD-X: A Multi-View Dataset for Ego-relative Important Object Localization and Explanation in Dense and Unstructured Traffic [35.23523738296173]
We present IDD-X, a large-scale dual-view driving video dataset.
With 697K bounding boxes, 9K important object tracks, and 1-12 objects per video, IDD-X offers comprehensive ego-relative annotations for multiple important road objects.
We also introduce custom-designed deep networks aimed at multiple important object localization and per-object explanation prediction.
arXiv Detail & Related papers (2024-04-12T16:00:03Z) - DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments [28.23284296418962]
Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments.
Existing datasets for developing ZSON algorithms lack consideration of dynamic obstacles, object diversity, and scene texts.
We propose a dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments (DOZE).
DOZE comprises ten high-fidelity 3D scenes with over 18k tasks, aiming to mimic complex, dynamic real-world scenarios.
arXiv Detail & Related papers (2024-02-29T10:03:57Z) - AVisT: A Benchmark for Visual Object Tracking in Adverse Visibility [125.77396380698639]
AVisT is a benchmark for visual tracking in diverse scenarios with adverse visibility.
AVisT comprises 120 challenging sequences with 80k annotated frames, spanning 18 diverse scenarios.
We benchmark 17 popular and recent trackers on AVisT with detailed analysis of their tracking performance across attributes.
arXiv Detail & Related papers (2022-08-14T17:49:37Z) - METEOR: A Massive Dense & Heterogeneous Behavior Dataset for Autonomous Driving [42.69638782267657]
We present a new and complex traffic dataset, METEOR, which captures traffic patterns in unstructured scenarios in India.
METEOR consists of more than 1000 one-minute video clips, over 2 million annotated frames with ego-vehicle trajectories, and more than 13 million bounding boxes for surrounding vehicles or traffic agents.
We use our novel dataset to evaluate the performance of object detection and behavior prediction algorithms.
arXiv Detail & Related papers (2021-09-16T01:01:55Z) - Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z) - Detecting 32 Pedestrian Attributes for Autonomous Vehicles [103.87351701138554]
In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes.
We introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way.
We show competitive detection and attribute recognition results, as well as a more stable MTL training.
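As a rough illustration of the multi-task setup described above, the sketch below pairs a shared backbone with a detection head and a 32-way attribute head. It is a generic MTL layout under assumed shapes, not the composite field formulation used in the paper.

```python
# Simplified multi-task sketch: shared backbone, detection head, 32-way attribute head.
# Shapes and head designs are assumptions for illustration.
import torch
import torch.nn as nn

class PedestrianMTL(nn.Module):
    def __init__(self, feat_dim: int = 512, num_attributes: int = 32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.det_head = nn.Linear(feat_dim, 5)                 # objectness + box offsets (toy)
        self.attr_head = nn.Linear(feat_dim, num_attributes)   # multi-label attribute logits

    def forward(self, images: torch.Tensor):
        feats = self.backbone(images)
        return self.det_head(feats), self.attr_head(feats)

model = PedestrianMTL()
images = torch.randn(4, 3, 224, 224)
det_out, attr_logits = model(images)

# Detection would use its own loss; attributes are trained with multi-label BCE.
attr_targets = torch.randint(0, 2, (4, 32)).float()
attr_loss = nn.functional.binary_cross_entropy_with_logits(attr_logits, attr_targets)
attr_loss.backward()
```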
arXiv Detail & Related papers (2020-12-04T15:10:12Z) - A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View [93.23947591795897]
In this paper, we strive to tackle the challenges and automatically understand the crowd from the visual data collected from drones.
To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed.
To tackle the crowd density estimation problem in extremely dark environments, we introduce synthetic data generated by the game Grand Theft Auto V (GTAV).
arXiv Detail & Related papers (2020-09-29T01:48:24Z) - BoMuDANet: Unsupervised Adaptation for Visual Scene Understanding in Unstructured Driving Environments [54.22535063244038]
We present an unsupervised adaptation approach for visual scene understanding in unstructured traffic environments.
Our method is designed for unstructured real-world scenarios with dense and heterogeneous traffic consisting of cars, trucks, two- and three-wheelers, and pedestrians.
arXiv Detail & Related papers (2020-09-22T08:25:44Z)