Spatio-Temporal Scene-Graph Embedding for Autonomous Vehicle Collision
Prediction
- URL: http://arxiv.org/abs/2111.06123v1
- Date: Thu, 11 Nov 2021 10:01:01 GMT
- Title: Spatio-Temporal Scene-Graph Embedding for Autonomous Vehicle Collision
Prediction
- Authors: Arnav V. Malawade, Shih-Yuan Yu, Brandon Hsu, Deepan Muthirayan,
Pramod P. Khargonekar, Mohammad A. Al Faruque
- Abstract summary: We show that sg2vec predicts collisions 8.11% more accurately than the state-of-the-art method on synthesized datasets.
We also show that sg2vec is better than the state-of-the-art at transferring knowledge from synthetic datasets to real-world driving datasets.
- Score: 0.3738410998183615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In autonomous vehicles (AVs), early warning systems rely on collision
prediction to ensure occupant safety. However, state-of-the-art methods using
deep convolutional networks either fail at modeling collisions or are too
expensive/slow, making them less suitable for deployment on AV edge hardware.
To address these limitations, we propose sg2vec, a spatio-temporal scene-graph
embedding methodology that uses Graph Neural Network (GNN) and Long Short-Term
Memory (LSTM) layers to predict future collisions via visual scene perception.
We demonstrate that sg2vec predicts collisions 8.11% more accurately and 39.07%
earlier than the state-of-the-art method on synthesized datasets, and 29.47%
more accurately on a challenging real-world collision dataset. We also show
that sg2vec is better than the state-of-the-art at transferring knowledge from
synthetic datasets to real-world driving datasets. Finally, we demonstrate that
sg2vec performs inference 9.3x faster with an 88.0% smaller model, 32.4% less
power, and 92.8% less energy than the state-of-the-art method on the
industry-standard Nvidia DRIVE PX 2 platform, making it more suitable for
implementation on the edge.
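The abstract describes the pipeline at a high level: a GNN embeds each frame's scene-graph, and an LSTM consumes the sequence of embeddings to predict a collision. Below is a minimal, hypothetical PyTorch sketch of that wiring. The layer sizes, the mean-aggregation message passing, and all class names are illustrative assumptions, not the authors' sg2vec implementation.

```python
import torch
import torch.nn as nn

class SceneGraphEncoder(nn.Module):
    """Embeds one scene-graph (nodes = road objects, edges = relations)."""
    def __init__(self, node_dim: int, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(node_dim, hidden_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # One round of mean-aggregation message passing, then a global
        # mean-pool over nodes to get a fixed-size graph embedding.
        h = torch.relu(self.proj(x))                     # (N, hidden)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid divide-by-zero
        h = (adj @ h) / deg                              # neighborhood mean
        return h.mean(dim=0)                             # (hidden,)

class CollisionPredictor(nn.Module):
    """GNN per frame -> LSTM over frames -> collision probability."""
    def __init__(self, node_dim=8, hidden_dim=32):
        super().__init__()
        self.encoder = SceneGraphEncoder(node_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, frames):
        # frames: list of (node_features, adjacency) tensors, one per time step
        embs = torch.stack([self.encoder(x, a) for x, a in frames])
        out, _ = self.lstm(embs.unsqueeze(0))            # (1, T, hidden)
        return torch.sigmoid(self.head(out[0, -1]))      # P(collision)

# Toy usage: 5 frames, each with 4 objects and random binary relations.
frames = [(torch.randn(4, 8), torch.bernoulli(torch.full((4, 4), 0.3)))
          for _ in range(5)]
print(CollisionPredictor()(frames))
```

The key design point the paper's numbers suggest is that operating on compact scene-graphs rather than raw pixels keeps the model small and fast, which is what makes the edge-hardware results (DRIVE PX 2) plausible.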
Related papers
- ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera [53.20087549782785]
We introduce ET-Former, a novel end-to-end algorithm for semantic scene completion using a single monocular camera.
Our approach generates a semantic occupancy map from single RGB observation while simultaneously providing uncertainty estimates for semantic predictions.
arXiv Detail & Related papers (2024-10-14T19:14:49Z)
- Real-Time Pedestrian Detection on IoT Edge Devices: A Lightweight Deep Learning Approach [1.4732811715354455]
This research explores implementing a lightweight deep learning model on Artificial Intelligence of Things (AIoT) edge devices.
An optimized You Only Look Once (YOLO)-based deep learning (DL) model is deployed for real-time pedestrian detection.
The simulation results demonstrate that the optimized YOLO model can achieve real-time pedestrian detection, with a fast inference speed of 147 milliseconds, a frame rate of 2.3 frames per second, and an accuracy of 78%.
arXiv Detail & Related papers (2024-09-24T04:48:41Z)
- Pre-training on Synthetic Driving Data for Trajectory Prediction [61.520225216107306]
We propose a pipeline-level solution to mitigate the issue of data scarcity in trajectory forecasting.
We adopt HD map augmentation and trajectory synthesis for generating driving data, and then we learn representations by pre-training on them.
We conduct extensive experiments to demonstrate the effectiveness of our data expansion and pre-training strategies.
arXiv Detail & Related papers (2023-09-18T19:49:22Z)
- CabiNet: Scaling Neural Collision Detection for Object Rearrangement with Procedural Scene Generation [54.68738348071891]
We first generate over 650K cluttered scenes - orders of magnitude more than prior work - in diverse everyday environments.
We render synthetic partial point clouds from these scenes and use them to train our CabiNet model architecture.
CabiNet is a collision model that accepts object and scene point clouds, captured from a single-view depth observation.
arXiv Detail & Related papers (2023-04-18T21:09:55Z)
- DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving [76.29141888408265]
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z)
- Pushing the Limits of Asynchronous Graph-based Object Detection with Event Cameras [62.70541164894224]
We introduce several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation.
Our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass.
arXiv Detail & Related papers (2022-11-22T15:14:20Z)
- Is attention to bounding boxes all you need for pedestrian action prediction? [1.3999481573773074]
We present a framework based on multiple variations of Transformer models to reason attentively about the dynamic evolution of pedestrians' past trajectories.
We show that using only bounding boxes as input, our model can outperform previous state-of-the-art models.
Our model similarly reaches high accuracy (91%) and F1-score (0.91) on this dataset.
arXiv Detail & Related papers (2021-07-16T17:47:32Z)
- Injecting Knowledge in Data-driven Vehicle Trajectory Predictors [82.91398970736391]
Vehicle trajectory prediction tasks have been commonly tackled from two perspectives: knowledge-driven or data-driven.
In this paper, we propose to learn a "Realistic Residual Block" (RRB) which effectively connects these two perspectives.
Our proposed method outputs realistic predictions by confining the residual range and taking into account its uncertainty.
arXiv Detail & Related papers (2021-03-08T16:03:09Z)
- Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes [0.23090185577016442]
We propose novel deep dual-resolution networks (DDRNets) for real-time semantic segmentation of road scenes.
Our method achieves a new state-of-the-art trade-off between accuracy and speed on both the Cityscapes and CamVid datasets.
arXiv Detail & Related papers (2021-01-15T12:56:18Z)
- Safety-Oriented Pedestrian Motion and Scene Occupancy Forecasting [91.69900691029908]
We advocate for predicting both the individual motions as well as the scene occupancy map.
We propose a Scene-Actor Graph Neural Network (SA-GNN) which preserves the relative spatial information of pedestrians.
On two large-scale real-world datasets, we showcase that our scene-occupancy predictions are more accurate and better calibrated than those from state-of-the-art motion forecasting methods.
arXiv Detail & Related papers (2021-01-07T06:08:21Z)
- Res-GCNN: A Lightweight Residual Graph Convolutional Neural Networks for Human Trajectory Forecasting [0.0]
We propose a Residual Graph Convolutional Neural Network (Res-GCNN), which models the interactive behaviors of pedestrians.
Results show a 13.3% improvement over the state of the art on Final Displacement Error (FDE), reaching 0.65 meters.
The code will be made publicly available on GitHub.
arXiv Detail & Related papers (2020-11-18T11:18:16Z)
- Scene-Graph Augmented Data-Driven Risk Assessment of Autonomous Vehicle Decisions [1.4086978333609153]
We propose a novel data-driven approach that uses scene-graphs as intermediate representations.
Our approach combines a Multi-Relation Graph Convolution Network, a Long Short-Term Memory network, and attention layers for modeling the subjective risk of driving maneuvers (see the sketch of a multi-relation graph convolution after this list).
We show that our approach achieves a higher classification accuracy than the state-of-the-art approach on both large (96.4% vs. 91.2%) and small (91.8% vs. 71.2%) datasets.
We also show that our model trained on a synthesized dataset achieves an average accuracy of 87.8% when tested on a real-world dataset.
arXiv Detail & Related papers (2020-08-31T07:41:27Z)
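The last entry above mentions a Multi-Relation Graph Convolution Network. As a rough illustration only, here is a hypothetical sketch of a multi-relation graph convolution layer in the style of an R-GCN: one adjacency matrix and one weight matrix per relation type, summed together with a self-loop term. Dimensions, relation count, and names are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class MultiRelationGraphConv(nn.Module):
    """Aggregates neighbors with a separate weight matrix per relation type."""
    def __init__(self, in_dim: int, out_dim: int, num_relations: int):
        super().__init__()
        self.rel_weights = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_relations)]
        )
        self.self_loop = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adjs: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features; adjs: (R, N, N), one adjacency per relation.
        out = self.self_loop(x)
        for r, lin in enumerate(self.rel_weights):
            deg = adjs[r].sum(dim=1, keepdim=True).clamp(min=1)
            out = out + (adjs[r] @ lin(x)) / deg  # relation-specific mean aggregation
        return torch.relu(out)

# Toy usage: 4 objects, 3 relation types (e.g. near, in-lane-of, approaching).
layer = MultiRelationGraphConv(in_dim=8, out_dim=16, num_relations=3)
h = layer(torch.randn(4, 8), torch.bernoulli(torch.full((3, 4, 4), 0.3)))
print(h.shape)  # torch.Size([4, 16])
```

Relation-specific weights let the model treat, say, "approaching" edges differently from "near" edges, which is the motivation for the multi-relation variant over a plain GCN.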
This list is automatically generated from the titles and abstracts of the papers on this site.