Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
- URL: http://arxiv.org/abs/2407.05910v3
- Date: Wed, 08 Jan 2025 23:40:38 GMT
- Title: Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
- Authors: Aaron Lohner, Francesco Compagno, Jonathan Francis, Alessandro Oltramari
- Abstract summary: This work focuses on classifying traffic scenes into specific accident types.
We approach the problem by representing a traffic scene as a graph, where objects such as cars can be represented as nodes, and relative distances and directions between them as edges.
- Abstract: Recognizing a traffic accident is an essential part of any autonomous driving or road monitoring system. An accident can appear in a wide variety of forms, and understanding what type of accident is taking place may be useful to prevent it from recurring. This work focuses on classifying traffic scenes into specific accident types. We approach the problem by representing a traffic scene as a graph, where objects such as cars can be represented as nodes, and relative distances and directions between them as edges. This representation of a traffic scene is referred to as a scene graph, and can be used as input for an accident classifier. Better results are obtained with a classifier that fuses the scene graph input with visual and textual representations. This work introduces a multi-stage, multimodal pipeline that pre-processes videos of traffic accidents, encodes them as scene graphs, and aligns this representation with vision and language modalities before executing the classification task. When trained on 4 classes, our method achieves a balanced accuracy score of 57.77% on an (unbalanced) subset of the popular Detection of Traffic Anomaly (DoTA) benchmark, representing an increase of close to 5 percentage points from the case where scene graph information is not taken into account.
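The scene-graph representation described in the abstract (objects as nodes; relative distance and direction as edge attributes) can be sketched as follows. This is a minimal illustration under assumed names (`Node`, `SceneGraph`, `connect_all` are hypothetical), not the authors' implementation:

```python
from dataclasses import dataclass, field
from math import atan2, hypot

@dataclass
class Node:
    obj_id: int
    label: str   # e.g. "car", "pedestrian"
    x: float     # position in some common frame
    y: float

@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    # (id_a, id_b) -> (relative distance, relative direction in radians)
    edges: dict = field(default_factory=dict)

    def add_node(self, node):
        self.nodes.append(node)

    def connect_all(self):
        """Connect every pair of objects, storing relative distance
        and heading as edge attributes."""
        for a in self.nodes:
            for b in self.nodes:
                if a.obj_id < b.obj_id:
                    dx, dy = b.x - a.x, b.y - a.y
                    self.edges[(a.obj_id, b.obj_id)] = (hypot(dx, dy), atan2(dy, dx))

g = SceneGraph()
g.add_node(Node(0, "car", 0.0, 0.0))
g.add_node(Node(1, "car", 3.0, 4.0))
g.connect_all()
dist, heading = g.edges[(0, 1)]
print(round(dist, 1))  # 5.0
```

In the paper's pipeline, a graph like this would be encoded and fused with visual and textual embeddings before classification; the structure above only captures the input representation.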
Related papers
- Graph Neural Networks for Road Safety Modeling: Datasets and Evaluations for Accident Analysis
This paper constructs a large-scale dataset of traffic accident records from official reports of various states in the US.
Using this new dataset, we evaluate existing deep-learning methods for predicting the occurrence of accidents on road networks.
Our main finding is that graph neural networks such as GraphSAGE can accurately predict the number of accidents on roads with less than 22% mean absolute error.
arXiv Detail & Related papers (2023-10-31T21:43:10Z)
- OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping
We present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure.
OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes.
We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
arXiv Detail & Related papers (2023-04-20T16:31:22Z)
- DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z)
- Traffic Scene Parsing through the TSP6K Dataset
We introduce a specialized traffic monitoring dataset, termed TSP6K, with high-quality pixel-level and instance-level annotations.
The dataset captures more crowded traffic scenes with several times more traffic participants than the existing driving scenes.
We propose a detail refining decoder for scene parsing, which recovers the details of different semantic regions in traffic scenes.
arXiv Detail & Related papers (2023-03-06T02:05:14Z)
- Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark
We propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition of text description on the visual observation and the driver attention to facilitate model training.
CAP comprises an attentive text-to-vision shift fusion module, an attentive scene-context transfer module, and a driver-attention-guided accident prediction module.
We construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames.
arXiv Detail & Related papers (2022-12-19T11:43:02Z)
- Self Supervised Clustering of Traffic Scenes using Graph Representations
We present a data-driven method to cluster traffic scenes that is self-supervised, i.e. without manual labelling.
We leverage the semantic scene graph model to create a generic graph embedding of the traffic scene, which is then mapped to a low-dimensional embedding space using a Siamese network.
In the training process of our novel approach, we augment existing traffic scenes in the Cartesian space to generate positive similarity samples.
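The positive-pair construction above (augmenting a scene in Cartesian space to obtain a similar sample) can be illustrated with a toy sketch. The learned Siamese graph-embedding network is replaced here by a fixed hand-written feature map, and all names are hypothetical:

```python
import random

def embed(scene):
    """Toy stand-in for the graph-embedding network: summarize a scene
    (a list of (x, y) object positions) by its centroid and object count."""
    n = len(scene)
    cx = sum(x for x, _ in scene) / n
    cy = sum(y for _, y in scene) / n
    return (cx, cy, float(n))

def augment(scene, sigma=0.1, rng=random):
    """Create a positive similarity sample by jittering positions in
    Cartesian space, as the summary above describes."""
    return [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in scene]

def distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

random.seed(0)
scene = [(0.0, 0.0), (3.0, 4.0), (6.0, 1.0)]
pos = augment(scene)  # positive pair: same scene, slightly perturbed
# A good embedding should keep the pair close in embedding space.
print(distance(embed(scene), embed(pos)) < 0.5)
```

In the actual method, a Siamese network would be trained so that such positive pairs map close together while unrelated scenes map far apart; the fixed `embed` above only demonstrates the pairing idea.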
arXiv Detail & Related papers (2022-11-24T22:52:55Z)
- Sensing accident-prone features in urban scenes for proactive driving and accident prevention
This paper proposes a visual notification of accident-prone features to drivers based on real-time images obtained via dashcam.
Google Street View images around accident hotspots are used to train a family of deep convolutional neural networks (CNNs).
The CNNs are able to detect accident-prone features and classify a given urban scene as an accident hotspot or a non-hotspot.
arXiv Detail & Related papers (2022-02-25T16:05:53Z)
- Towards Traffic Scene Description: The Semantic Scene Graph
This paper presents a model for describing a traffic scene semantically.
The model allows a traffic scene to be described independently of the road geometry and road topology.
An important aspect of the description is that it can be converted easily into a machine-readable format.
arXiv Detail & Related papers (2021-11-19T13:08:55Z)
- An Image-based Approach of Task-driven Driving Scene Categorization
This paper proposes a method of task-driven driving scene categorization using weakly supervised data.
A measure is learned to discriminate the scenes of different semantic attributes via contrastive learning.
The results of semantic scene similarity learning and driving scene categorization are extensively studied.
arXiv Detail & Related papers (2021-03-10T08:23:36Z)
- Road Scene Graph: A Semantic Graph-Based Scene Representation Dataset for Intelligent Vehicles
We propose the road scene graph, a special scene graph for intelligent vehicles.
It provides not only object proposals but also their pair-wise relationships.
By organizing them in a topological graph, the data are explainable, fully connected, and can be easily processed by graph convolutional networks (GCNs).
arXiv Detail & Related papers (2020-11-27T07:33:11Z)
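Processing such a scene graph with a GCN amounts to repeated neighborhood aggregation. Below is a generic single GCN layer in the style of Kipf and Welling (symmetric normalization with self-loops), offered as a sketch rather than any of the above papers' code:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 . H . W),
    where A is the scene-graph adjacency matrix, H the node features,
    and W a learnable weight matrix."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Three objects (e.g. two cars and a pedestrian), fully connected pairwise.
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
H = np.eye(3)                               # one-hot node features
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))             # random weights for illustration
out = gcn_layer(A, H, W)
print(out.shape)  # (3, 4)
```

Stacking a few such layers lets each node's representation incorporate information from its neighbors, which is what makes graph-structured scene data amenable to GCN-based classifiers.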
This list is automatically generated from the titles and abstracts of the papers in this site.