Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
- URL: http://arxiv.org/abs/2407.05910v1
- Date: Mon, 8 Jul 2024 13:15:11 GMT
- Title: Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
- Authors: Aaron Lohner, Francesco Compagno, Jonathan Francis, Alessandro Oltramari,
- Abstract summary: This work introduces a multi-stage, multimodal pipeline to pre-process videos of traffic accidents, encode them as scene graphs, and align this representation with vision and language modalities for accident classification.
When trained on 4 classes, our method achieves a balanced accuracy score of 57.77% on an (unbalanced) subset of the popular Detection of Traffic Anomaly benchmark.
- Score: 45.7444555195196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognizing a traffic accident is an essential part of any autonomous driving or road monitoring system. An accident can appear in a wide variety of forms, and understanding what type of accident is taking place may be useful to prevent it from reoccurring. The task of being able to classify a traffic scene as a specific type of accident is the focus of this work. We approach the problem by likening a traffic scene to a graph, where objects such as cars can be represented as nodes, and relative distances and directions between them as edges. This representation of an accident can be referred to as a scene graph, and is used as input for an accident classifier. Better results can be obtained with a classifier that fuses the scene graph input with representations from vision and language. This work introduces a multi-stage, multimodal pipeline to pre-process videos of traffic accidents, encode them as scene graphs, and align this representation with vision and language modalities for accident classification. When trained on 4 classes, our method achieves a balanced accuracy score of 57.77% on an (unbalanced) subset of the popular Detection of Traffic Anomaly (DoTA) benchmark, representing an increase of close to 5 percentage points from the case where scene graph information is not taken into account.
Related papers
- Abductive Ego-View Accident Video Understanding for Safe Driving
Perception [75.60000661664556]
We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding.
MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions.
We present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD)
arXiv Detail & Related papers (2024-03-01T10:42:52Z) - Graph Neural Networks for Road Safety Modeling: Datasets and Evaluations
for Accident Analysis [21.02297148118655]
This paper constructs a large-scale dataset of traffic accident records from official reports of various states in the US.
Using this new dataset, we evaluate existing deep-learning methods for predicting the occurrence of accidents on road networks.
Our main finding is that graph neural networks such as GraphSAGE can accurately predict the number of accidents on roads with less than 22% mean absolute error.
arXiv Detail & Related papers (2023-10-31T21:43:10Z) - DeepAccident: A Motion and Accident Prediction Benchmark for V2X
Autonomous Driving [76.29141888408265]
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z) - Augmenting Ego-Vehicle for Traffic Near-Miss and Accident Classification
Dataset using Manipulating Conditional Style Translation [0.3441021278275805]
There is no difference between accident and near-miss at the time before the accident happened.
Our contribution is to redefine the accident definition and re-annotate the accident inconsistency on DADA-2000 dataset together with near-miss.
The proposed method integrates two different components: conditional style translation (CST) and separable 3-dimensional convolutional neural network (S3D)
arXiv Detail & Related papers (2023-01-06T22:04:47Z) - Cognitive Accident Prediction in Driving Scenes: A Multimodality
Benchmark [77.54411007883962]
We propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition of text description on the visual observation and the driver attention to facilitate model training.
CAP is formulated by an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and the driver attention guided accident prediction module.
We construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames.
arXiv Detail & Related papers (2022-12-19T11:43:02Z) - TAD: A Large-Scale Benchmark for Traffic Accidents Detection from Video
Surveillance [2.1076255329439304]
Existing datasets in traffic accidents are either small-scale, not from surveillance cameras, not open-sourced, or not built for freeway scenes.
After integration and annotation by various dimensions, a large-scale traffic accidents dataset named TAD is proposed in this work.
arXiv Detail & Related papers (2022-09-26T03:00:50Z) - Sensing accident-prone features in urban scenes for proactive driving
and accident prevention [0.5669790037378094]
This paper proposes a visual notification of accident-prone features to drivers based on real-time images obtained via dashcam.
Google Street View images around accident hotspots are used to train a family of deep convolutional neural networks (CNNs)
CNNs are able to detect accident-prone features and classify a given urban scene into an accident hotspot and a non-hotspot.
arXiv Detail & Related papers (2022-02-25T16:05:53Z) - An Image-based Approach of Task-driven Driving Scene Categorization [7.291979964739049]
This paper proposes a method of task-driven driving scene categorization using weakly supervised data.
A measure is learned to discriminate the scenes of different semantic attributes via contrastive learning.
The results of semantic scene similarity learning and driving scene categorization are extensively studied.
arXiv Detail & Related papers (2021-03-10T08:23:36Z) - A model for traffic incident prediction using emergency braking data [77.34726150561087]
We address the fundamental problem of data scarcity in road traffic accident prediction by training our model on emergency braking events instead of accidents.
We present a prototype implementing a traffic incident prediction model for Germany based on emergency braking data from Mercedes-Benz vehicles.
arXiv Detail & Related papers (2021-02-12T18:17:12Z) - Road Scene Graph: A Semantic Graph-Based Scene Representation Dataset
for Intelligent Vehicles [72.04891523115535]
We propose road scene graph,a special scene-graph for intelligent vehicles.
It provides not only object proposals but also their pair-wise relationships.
By organizing them in a topological graph, these data are explainable, fully-connected, and could be easily processed by GCNs.
arXiv Detail & Related papers (2020-11-27T07:33:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.