Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
- URL: http://arxiv.org/abs/2407.05910v1
- Date: Mon, 8 Jul 2024 13:15:11 GMT
- Authors: Aaron Lohner, Francesco Compagno, Jonathan Francis, Alessandro Oltramari
- Abstract summary: This work introduces a multi-stage, multimodal pipeline to pre-process videos of traffic accidents, encode them as scene graphs, and align this representation with vision and language modalities for accident classification.
When trained on 4 classes, our method achieves a balanced accuracy score of 57.77% on an (unbalanced) subset of the popular Detection of Traffic Anomaly benchmark.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognizing a traffic accident is an essential part of any autonomous driving or road monitoring system. An accident can appear in a wide variety of forms, and understanding what type of accident is taking place may be useful to prevent it from reoccurring. The task of being able to classify a traffic scene as a specific type of accident is the focus of this work. We approach the problem by likening a traffic scene to a graph, where objects such as cars can be represented as nodes, and relative distances and directions between them as edges. This representation of an accident can be referred to as a scene graph, and is used as input for an accident classifier. Better results can be obtained with a classifier that fuses the scene graph input with representations from vision and language. This work introduces a multi-stage, multimodal pipeline to pre-process videos of traffic accidents, encode them as scene graphs, and align this representation with vision and language modalities for accident classification. When trained on 4 classes, our method achieves a balanced accuracy score of 57.77% on an (unbalanced) subset of the popular Detection of Traffic Anomaly (DoTA) benchmark, representing an increase of close to 5 percentage points from the case where scene graph information is not taken into account.
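The abstract's core representation — objects such as cars as nodes, with relative distances and directions as edge features — can be sketched in a few lines. This is a minimal illustration only; the paper's exact node and edge features, coordinate frame, and all helper names here are assumptions:

```python
import math
from dataclasses import dataclass

@dataclass
class SceneObject:
    obj_id: str
    category: str   # e.g. "car", "pedestrian"
    x: float        # bird's-eye-view position (metres, hypothetical frame)
    y: float

def build_scene_graph(objects):
    """Encode a traffic scene as a graph: each object is a node, and every
    ordered pair of objects is linked by an edge carrying their relative
    distance and direction (bearing in degrees)."""
    nodes = {o.obj_id: o for o in objects}
    edges = {}
    for a in objects:
        for b in objects:
            if a.obj_id == b.obj_id:
                continue
            dx, dy = b.x - a.x, b.y - a.y
            edges[(a.obj_id, b.obj_id)] = {
                "distance": math.hypot(dx, dy),
                "direction": math.degrees(math.atan2(dy, dx)),
            }
    return nodes, edges

ego = SceneObject("ego", "car", 0.0, 0.0)
other = SceneObject("car_1", "car", 3.0, 4.0)
nodes, edges = build_scene_graph([ego, other])
print(edges[("ego", "car_1")]["distance"])  # 5.0
```

A graph built this way can then be fed to a graph encoder and fused with vision and language embeddings, as the pipeline described above does.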
Related papers
- Abductive Ego-View Accident Video Understanding for Safe Driving Perception
We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding.
MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions.
We present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD).
arXiv Detail & Related papers (2024-03-01T10:42:52Z)
- Graph Neural Networks for Road Safety Modeling: Datasets and Evaluations for Accident Analysis
This paper constructs a large-scale dataset of traffic accident records from official reports of various states in the US.
Using this new dataset, we evaluate existing deep-learning methods for predicting the occurrence of accidents on road networks.
Our main finding is that graph neural networks such as GraphSAGE can accurately predict the number of accidents on roads with less than 22% mean absolute error.
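GraphSAGE's central idea is that each node updates its feature by combining it with an aggregate (e.g. the mean) of its neighbours' features. The paper's actual architecture is not given here; the following is a toy, scalar-feature sketch of that mean-aggregator step, with all names and weights hypothetical:

```python
def sage_layer(features, adjacency, weight_self, weight_neigh):
    """One GraphSAGE-style layer with mean aggregation, using scalar
    features for brevity: each road segment combines its own feature
    with the mean feature of its neighbours, followed by a ReLU."""
    out = {}
    for node, h in features.items():
        neigh = adjacency.get(node, [])
        mean_n = sum(features[n] for n in neigh) / len(neigh) if neigh else 0.0
        out[node] = max(0.0, weight_self * h + weight_neigh * mean_n)
    return out

# Toy road network: three segments, features = scaled traffic volume.
features = {"a": 1.0, "b": 0.5, "c": 2.0}
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(sage_layer(features, adjacency, 0.5, 0.5))
```

Stacking such layers lets accident-count predictions on one road segment depend on conditions several hops away in the road network.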
arXiv Detail & Related papers (2023-10-31T21:43:10Z)
- OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping
We present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure.
OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes.
We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
arXiv Detail & Related papers (2023-04-20T16:31:22Z)
- DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z)
- Traffic Scene Parsing through the TSP6K Dataset
We introduce a specialized traffic monitoring dataset, termed TSP6K, with high-quality pixel-level and instance-level annotations.
The dataset captures more crowded traffic scenes with several times more traffic participants than the existing driving scenes.
We propose a detail refining decoder for scene parsing, which recovers the details of different semantic regions in traffic scenes.
arXiv Detail & Related papers (2023-03-06T02:05:14Z)
- Augmenting Ego-Vehicle for Traffic Near-Miss and Accident Classification Dataset using Manipulating Conditional Style Translation
Immediately before an accident occurs, an accident and a near-miss are visually indistinguishable.
Our contribution is to redefine the accident definition and re-annotate the inconsistent accident labels in the DADA-2000 dataset, together with near-miss cases.
The proposed method integrates two components: conditional style translation (CST) and a separable 3-dimensional convolutional neural network (S3D).
arXiv Detail & Related papers (2023-01-06T22:04:47Z)
- Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark
We propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition of text description on the visual observation and the driver attention to facilitate model training.
CAP is formulated by an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and the driver attention guided accident prediction module.
We construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames.
arXiv Detail & Related papers (2022-12-19T11:43:02Z)
- Self Supervised Clustering of Traffic Scenes using Graph Representations
We present a data-driven method to cluster traffic scenes that is self-supervised, i.e. without manual labelling.
We leverage the semantic scene graph model to create a generic graph embedding of the traffic scene, which is then mapped to a low-dimensional embedding space using a Siamese network.
In the training process of our novel approach, we augment existing traffic scenes in the Cartesian space to generate positive similarity samples.
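The positive-sample idea above relies on augmentations in the Cartesian space that change the raw coordinates but not the underlying scene. The paper's exact augmentations are not listed here; a rotation is one plausible instance, sketched below with hypothetical helper names — the pairwise geometry, and hence the graph embedding, should be unchanged:

```python
import math

def rotate_scene(points, angle_rad):
    """Augment a traffic scene by rotating all object positions around the
    origin; the result is a 'positive' similarity sample because the
    underlying configuration of the scene is unchanged."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def pairwise_distances(points):
    """Sorted pairwise distances: a rotation-invariant scene signature."""
    return sorted(
        math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]
    )

scene = [(0.0, 0.0), (3.0, 4.0), (-2.0, 1.0)]
augmented = rotate_scene(scene, math.pi / 3)
# Relative geometry is preserved, so both scenes should embed nearby.
print(pairwise_distances(scene))
print(pairwise_distances(augmented))
```

In a Siamese setup, the original and augmented scenes would be pushed together in the embedding space while unrelated scenes are pushed apart.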
arXiv Detail & Related papers (2022-11-24T22:52:55Z)
- TAD: A Large-Scale Benchmark for Traffic Accidents Detection from Video Surveillance
Existing datasets in traffic accidents are either small-scale, not from surveillance cameras, not open-sourced, or not built for freeway scenes.
After integration and annotation by various dimensions, a large-scale traffic accidents dataset named TAD is proposed in this work.
arXiv Detail & Related papers (2022-09-26T03:00:50Z)
- Sensing accident-prone features in urban scenes for proactive driving and accident prevention
This paper proposes a visual notification of accident-prone features to drivers based on real-time images obtained via dashcam.
Google Street View images around accident hotspots are used to train a family of deep convolutional neural networks (CNNs)
CNNs are able to detect accident-prone features and classify a given urban scene into an accident hotspot and a non-hotspot.
arXiv Detail & Related papers (2022-02-25T16:05:53Z)
- Towards Traffic Scene Description: The Semantic Scene Graph
This paper describes a model for representing a traffic scene in a semantic way.
The model allows to describe a traffic scene independently of the road geometry and road topology.
An important aspect of the description is that it can be converted easily into a machine-readable format.
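"Machine-readable" here could be as simple as serializing the entities and their semantic relations to JSON. The schema below is hypothetical, invented for illustration, not the paper's actual format:

```python
import json

# Hypothetical minimal encoding of a semantic traffic-scene description:
# entities plus their semantic relations, independent of road geometry.
scene = {
    "entities": [
        {"id": "ego", "type": "car"},
        {"id": "car_1", "type": "car"},
    ],
    "relations": [
        {"subject": "car_1", "predicate": "in_front_of", "object": "ego"},
        {"subject": "car_1", "predicate": "same_lane_as", "object": "ego"},
    ],
}
encoded = json.dumps(scene)          # the machine-readable form
assert json.loads(encoded) == scene  # round-trips losslessly
```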
arXiv Detail & Related papers (2021-11-19T13:08:55Z)
- An Image-based Approach of Task-driven Driving Scene Categorization
This paper proposes a method of task-driven driving scene categorization using weakly supervised data.
A measure is learned to discriminate the scenes of different semantic attributes via contrastive learning.
The results of semantic scene similarity learning and driving scene categorization are extensively studied.
arXiv Detail & Related papers (2021-03-10T08:23:36Z)
- A model for traffic incident prediction using emergency braking data
We address the fundamental problem of data scarcity in road traffic accident prediction by training our model on emergency braking events instead of accidents.
We present a prototype implementing a traffic incident prediction model for Germany based on emergency braking data from Mercedes-Benz vehicles.
arXiv Detail & Related papers (2021-02-12T18:17:12Z)
- Road Scene Graph: A Semantic Graph-Based Scene Representation Dataset for Intelligent Vehicles
We propose the road scene graph, a special scene graph for intelligent vehicles.
It provides not only object proposals but also their pair-wise relationships.
Organized as a topological graph, these data are explainable, fully connected, and can be easily processed by graph convolutional networks (GCNs).
arXiv Detail & Related papers (2020-11-27T07:33:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.