DRAMA: Joint Risk Localization and Captioning in Driving
- URL: http://arxiv.org/abs/2209.10767v1
- Date: Thu, 22 Sep 2022 03:53:56 GMT
- Title: DRAMA: Joint Risk Localization and Captioning in Driving
- Authors: Srikanth Malla, Chiho Choi, Isht Dwivedi, Joon Hee Choi, Jiachen Li
- Abstract summary: This paper proposes a new research direction of joint risk localization in driving scenes and its risk explanation as a natural language description.
Due to the lack of standard benchmarks, we collected a large-scale dataset, DRAMA (Driving Risk Assessment Mechanism with A captioning module).
Our dataset accommodates video- and object-level questions on driving risks with associated important objects to achieve the goal of visual captioning.
- Score: 23.091433343825727
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Considering the functionality of situational awareness in safety-critical
automation systems, the perception of risk in driving scenes and its
explainability is of particular importance for autonomous and cooperative
driving. Toward this goal, this paper proposes a new research direction of
joint risk localization in driving scenes and its risk explanation as a natural
language description. Due to the lack of standard benchmarks, we collected a
large-scale dataset, DRAMA (Driving Risk Assessment Mechanism with A captioning
module), which consists of 17,785 interactive driving scenarios collected in
Tokyo, Japan. Our DRAMA dataset accommodates video- and object-level questions
on driving risks with associated important objects to achieve the goal of
visual captioning as a free-form language description utilizing closed and
open-ended responses for multi-level questions, which can be used to evaluate a
range of visual captioning capabilities in driving scenarios. We make this data
available to the community for further research. Using DRAMA, we explore
multiple facets of joint risk localization and captioning in interactive
driving scenarios. In particular, we benchmark various multi-task prediction
architectures and provide a detailed analysis of joint risk localization and
risk captioning. The data set is available at https://usa.honda-ri.com/drama
Related papers
- Context-based Motion Retrieval using Open Vocabulary Methods for Autonomous Driving [0.5249805590164902]
We propose a novel context-aware motion retrieval framework to support targeted evaluation of autonomous driving systems in diverse, human-centered scenarios. Our approach outperforms state-of-the-art models by up to 27.5% accuracy in motion-context retrieval when evaluated on the WayMoCo dataset.
arXiv Detail & Related papers (2025-08-01T12:41:52Z)
- Why Braking? Scenario Extraction and Reasoning Utilizing LLM [13.88343221678386]
We propose a novel framework that leverages Large Language Models (LLMs) for scenario understanding and reasoning. Our method bridges the gap between low-level numerical signals and natural language descriptions, enabling LLMs to interpret and classify driving scenarios.
arXiv Detail & Related papers (2025-07-17T08:33:56Z)
- DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving [5.362063089413001]
No existing benchmark evaluates multi-class intent prediction in safety-critical situations. We introduce DRAMA-X, a fine-grained benchmark constructed from the DRAMA dataset. We propose SGG-Intent, a lightweight, training-free framework that mirrors the ego vehicle's reasoning pipeline.
arXiv Detail & Related papers (2025-06-21T05:01:42Z)
- DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments [60.69159598130235]
We present DAVE, a new dataset designed for evaluating perception methods with high representation of Vulnerable Road Users (VRUs).
DAVE is a manually annotated dataset encompassing 16 diverse actor categories (spanning animals, humans, vehicles, etc.) and 16 action types (complex and rare cases like cut-ins, zigzag movement, U-turns, etc.).
Our experiments show that existing methods suffer degradation in performance when evaluated on DAVE, highlighting its benefit for future video recognition research.
arXiv Detail & Related papers (2024-12-28T06:13:44Z)
- doScenes: An Autonomous Driving Dataset with Natural Language Instruction for Human Interaction and Vision-Language Navigation [0.0]
doScenes is a novel dataset designed to facilitate research on human-vehicle instruction interactions.
doScenes bridges the gap between instruction and driving response, enabling context-aware and adaptive planning.
arXiv Detail & Related papers (2024-12-08T11:16:47Z)
- Generating Out-Of-Distribution Scenarios Using Language Models [58.47597351184034]
Large Language Models (LLMs) have shown promise in autonomous driving.
This paper introduces a framework for generating diverse Out-Of-Distribution (OOD) driving scenarios.
We evaluate our framework through extensive simulations and introduce a new "OOD-ness" metric.
arXiv Detail & Related papers (2024-11-25T16:38:17Z)
- ScVLM: a Vision-Language Model for Driving Safety Critical Event Understanding [6.461440777667878]
We propose ScVLM, a hybrid approach that combines supervised learning and contrastive learning to improve driving video understanding and event description.
The proposed approach is trained on and evaluated by more than 8,600 SCEs from the Second Strategic Highway Research Program Naturalistic Driving Study dataset.
arXiv Detail & Related papers (2024-10-01T18:10:23Z)
- CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving [1.727597257312416]
The CoVLA (Comprehensive Vision-Language-Action) dataset comprises real-world driving videos spanning more than 80 hours.
This dataset establishes a framework for robust, interpretable, and data-driven autonomous driving systems.
arXiv Detail & Related papers (2024-08-19T09:53:49Z)
- Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights [50.89022445197919]
We propose a speech-specific risk taxonomy covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity).
Based on the taxonomy, we create a small-scale dataset for evaluating current LMMs' capability in detecting these categories of risk.
arXiv Detail & Related papers (2024-06-25T10:08:45Z)
- Hawk: Learning to Understand Open-World Video Anomalies [76.9631436818573]
Video Anomaly Detection (VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs.
We introduce Hawk, a novel framework that leverages interactive large Visual Language Models (VLM) to interpret video anomalies precisely.
We have annotated over 8,000 anomaly videos with language descriptions, enabling effective training across diverse open-world scenarios, and also created 8,000 question-answering pairs for users' open-world questions.
arXiv Detail & Related papers (2024-05-27T07:08:58Z)
- Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction [18.285227911703977]
We formulate driving hazard prediction as the task of anticipating impending accidents from a single input image captured by a car dashcam.
The problem requires predicting and reasoning about future events based on uncertain observations.
To enable research in this understudied area, a new dataset named the DHPR dataset is created.
arXiv Detail & Related papers (2023-10-07T03:16:30Z)
- RSRD: A Road Surface Reconstruction Dataset and Benchmark for Safe and Comfortable Autonomous Driving [67.09546127265034]
Road surface reconstruction helps to enhance the analysis and prediction of vehicle responses for motion planning and control systems.
We introduce the Road Surface Reconstruction dataset, a real-world, high-resolution, and high-precision dataset collected with a specialized platform in diverse driving conditions.
It covers common road types containing approximately 16,000 pairs of stereo images, original point clouds, and ground-truth depth/disparity maps.
arXiv Detail & Related papers (2023-10-03T17:59:32Z)
- Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving [91.91552963872596]
We propose a new multi-modal visual grounding task, termed LiDAR Grounding.
It jointly learns the LiDAR-based object detector with the language features and predicts the targeted region directly from the detector.
Our work offers deeper insight into the LiDAR-based grounding task, and we expect it to point toward a promising direction for the autonomous driving community.
arXiv Detail & Related papers (2023-05-25T06:22:10Z)
- Intersection Warning System for Occlusion Risks using Relational Local Dynamic Maps [0.0]
This work addresses the task of risk evaluation in traffic scenarios with limited observability due to restricted sensorial coverage.
To identify the area of sight, we employ ray casting on a local dynamic map providing geometrical information and road infrastructure.
The resulting risk indicators are used to evaluate the driver's current behavior, warn the driver in critical situations, suggest safe actions, and plan safe trajectories.
arXiv Detail & Related papers (2023-03-13T16:01:55Z)
- Vision based Pedestrian Potential Risk Analysis based on Automated Behavior Feature Extraction for Smart and Safe City [5.759189800028578]
We propose a comprehensive analytical model for pedestrian potential risk using video footage gathered by road security cameras deployed at such crossings.
The proposed system automatically detects vehicles and pedestrians, computes their trajectories frame by frame, and extracts behavioral features that affect the likelihood of potentially dangerous interactions between these objects.
We validated its feasibility and applicability by applying it at multiple crosswalks in Osan, Korea.
arXiv Detail & Related papers (2021-05-06T11:03:10Z)
- Generating and Characterizing Scenarios for Safety Testing of Autonomous Vehicles [86.9067793493874]
We propose efficient mechanisms to characterize and generate testing scenarios using a state-of-the-art driving simulator.
We use our method to characterize real driving data from the Next Generation Simulation (NGSIM) project.
We rank the scenarios by defining metrics based on the complexity of avoiding accidents and provide insights into how the AV could have minimized the probability of incurring an accident.
arXiv Detail & Related papers (2021-03-12T17:00:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.