An Attention-guided Multistream Feature Fusion Network for Localization
of Risky Objects in Driving Videos
- URL: http://arxiv.org/abs/2209.07922v1
- Date: Fri, 16 Sep 2022 13:36:28 GMT
- Authors: Muhammad Monjurul Karim, Ruwen Qin, Zhaozheng Yin
- Abstract summary: This paper proposes an attention-guided multistream feature fusion network (AM-Net) to localize dangerous traffic agents from dashcam videos.
Two Gated Recurrent Unit (GRU) networks use object bounding box and optical flow features extracted from consecutive video frames to capture spatio-temporal cues for distinguishing dangerous traffic agents.
Fusing the two streams of features, AM-Net predicts the riskiness scores of traffic agents in the video.
- Score: 10.674638266121574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting dangerous traffic agents in videos captured by vehicle-mounted
dashboard cameras (dashcams) is essential to facilitate safe navigation in a
complex environment. Accident-related videos are just a minor portion of the
driving video big data, and the transient pre-accident processes are highly
dynamic and complex. Besides, risky and non-risky traffic agents can be similar
in their appearance. These make risky object localization in the driving video
particularly challenging. To this end, this paper proposes an attention-guided
multistream feature fusion network (AM-Net) to localize dangerous traffic
agents from dashcam videos. Two Gated Recurrent Unit (GRU) networks use object
bounding box and optical flow features extracted from consecutive video frames
to capture spatio-temporal cues for distinguishing dangerous traffic agents. An
attention module coupled with the GRUs learns to attend to the traffic agents
relevant to an accident. Fusing the two streams of features, AM-Net predicts
the riskiness scores of traffic agents in the video. In supporting this study,
the paper also introduces a benchmark dataset called Risky Object Localization
(ROL). The dataset contains spatial, temporal, and categorical annotations with
accident-, object-, and scene-level attributes. The proposed AM-Net achieves
a promising performance of 85.73% AUC on the ROL dataset. Meanwhile, AM-Net
outperforms the current state of the art for video anomaly detection by 6.3% AUC on
the DoTA dataset. A thorough ablation study further reveals AM-Net's merits by
evaluating the contributions of its different components.
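To make the two-stream design above concrete, here is a minimal PyTorch sketch of the general idea: one GRU over bounding-box features, one over optical-flow features, a per-agent attention weighting, and a fused head producing riskiness scores. All dimensions, the attention form, and the head are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a two-stream GRU + attention fusion for per-agent
# riskiness scoring. Illustrative only; not the AM-Net implementation.
import torch
import torch.nn as nn

class AMNetSketch(nn.Module):
    def __init__(self, bbox_dim=4, flow_dim=128, hidden=64):
        super().__init__()
        self.bbox_gru = nn.GRU(bbox_dim, hidden, batch_first=True)  # bounding-box stream
        self.flow_gru = nn.GRU(flow_dim, hidden, batch_first=True)  # optical-flow stream
        self.attn = nn.Linear(2 * hidden, 1)                        # per-agent attention score
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden),
                                  nn.ReLU(),
                                  nn.Linear(hidden, 1))             # riskiness head

    def forward(self, bbox_feats, flow_feats):
        # bbox_feats: (num_agents, T, bbox_dim); flow_feats: (num_agents, T, flow_dim)
        _, hb = self.bbox_gru(bbox_feats)            # final hidden state per agent
        _, hf = self.flow_gru(flow_feats)
        fused = torch.cat([hb[-1], hf[-1]], dim=-1)  # (num_agents, 2*hidden)
        # Attention over agents: emphasize those relevant to a potential accident.
        weights = torch.softmax(self.attn(fused), dim=0)
        return torch.sigmoid(self.head(weights * fused)).squeeze(-1)  # riskiness in [0, 1]

agents, T = 5, 16
scores = AMNetSketch()(torch.randn(agents, T, 4), torch.randn(agents, T, 128))
print(scores.shape)  # torch.Size([5])
```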
Related papers
- CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions [13.981748780317329]
Accurately and promptly predicting accidents among surrounding traffic agents from camera footage is crucial for the safety of autonomous vehicles (AVs).
This study introduces a novel accident anticipation framework for AVs, termed CRASH.
It seamlessly integrates five components: object detector, feature extractor, object-aware module, context-aware module, and multi-layer fusion.
Our model surpasses existing top baselines in critical evaluation metrics like Average Precision (AP) and mean Time-To-Accident (mTTA).
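As a rough illustration of the five-component pipeline named above, the skeleton below wires object-aware and context-aware modules into a fusion head; every module body is a stand-in (simple linear layers), and the upstream object detector and feature extractor are assumed, not shown. This is not CRASH itself.

```python
# Hedged skeleton of an object-aware / context-aware / fusion pipeline.
import torch
import torch.nn as nn

class CrashPipelineSketch(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.object_aware = nn.Linear(feat_dim, feat_dim)   # per-object reasoning (stand-in)
        self.context_aware = nn.Linear(feat_dim, feat_dim)  # scene-level reasoning (stand-in)
        self.fusion = nn.Linear(2 * feat_dim, 1)            # fusion -> accident score

    def forward(self, obj_feats):
        # obj_feats: (num_objects, feat_dim), assumed to come from an
        # upstream object detector + feature extractor (not shown).
        obj = torch.relu(self.object_aware(obj_feats))
        ctx = torch.relu(self.context_aware(obj_feats.mean(dim=0, keepdim=True)))
        ctx = ctx.expand_as(obj)
        return torch.sigmoid(self.fusion(torch.cat([obj, ctx], dim=-1)))

print(CrashPipelineSketch()(torch.randn(6, 256)).shape)  # torch.Size([6, 1])
```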
arXiv Detail & Related papers (2024-07-25T04:12:49Z)
- Abductive Ego-View Accident Video Understanding for Safe Driving Perception [75.60000661664556]
We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding.
MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions.
We present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD).
arXiv Detail & Related papers (2024-03-01T10:42:52Z)
- A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised Traffic Accident Detection in Driving Videos [22.553356096143734]
We propose a novel memory-augmented multi-task collaborative framework (MAMTCF) for unsupervised traffic accident detection in driving videos.
Our method can more accurately detect both ego-involved and non-ego accidents by simultaneously modeling appearance changes and object motions in video frames.
arXiv Detail & Related papers (2023-07-27T01:45:13Z)
- Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model [70.97446870672069]
Video anomaly detection (VAD) has received increasing attention due to its potential applications.
Video Anomaly Retrieval (VAR) aims to pragmatically retrieve relevant anomalous videos across modalities.
We present two benchmarks, UCFCrime-AR and XD-Violence, constructed on top of prevalent anomaly datasets.
arXiv Detail & Related papers (2023-07-24T06:22:37Z)
- Camera-Radar Perception for Autonomous Vehicles and ADAS: Concepts, Datasets and Metrics [77.34726150561087]
This work surveys the current state of camera- and radar-based perception for ADAS and autonomous vehicles.
Concepts and characteristics related to both sensors, as well as to their fusion, are presented.
We give an overview of the Deep Learning-based detection and segmentation tasks, and the main datasets, metrics, challenges, and open questions in vehicle perception.
arXiv Detail & Related papers (2023-03-08T00:48:32Z)
- TAD: A Large-Scale Benchmark for Traffic Accidents Detection from Video Surveillance [2.1076255329439304]
Existing datasets in traffic accidents are either small-scale, not from surveillance cameras, not open-sourced, or not built for freeway scenes.
After integration and annotation along various dimensions, this work proposes a large-scale traffic accident dataset named TAD.
arXiv Detail & Related papers (2022-09-26T03:00:50Z)
- Real-Time Accident Detection in Traffic Surveillance Using Deep Learning [0.8808993671472349]
This paper presents a new efficient framework for accident detection at intersections for traffic surveillance applications.
The proposed framework consists of three hierarchical steps, beginning with efficient and accurate object detection based on the state-of-the-art YOLOv4 method.
The robustness of the proposed framework is evaluated using video sequences collected from YouTube with diverse illumination conditions.
arXiv Detail & Related papers (2022-08-12T19:07:20Z)
- Decoder Fusion RNN: Context and Interaction Aware Decoders for Trajectory Prediction [53.473846742702854]
We propose a recurrent, attention-based approach for motion forecasting.
Decoder Fusion RNN (DF-RNN) is composed of a recurrent behavior encoder, an inter-agent multi-headed attention module, and a context-aware decoder.
We demonstrate the efficacy of our method by testing it on the Argoverse motion forecasting dataset and show its state-of-the-art performance on the public benchmark.
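A minimal sketch of the component layout named above (recurrent behavior encoder, inter-agent multi-head attention, decoder); shapes, layer sizes, and the output parameterization are assumptions for illustration, not the DF-RNN architecture.

```python
# Sketch: GRU behavior encoder -> inter-agent attention -> trajectory decoder.
import torch
import torch.nn as nn

class DFRNNSketch(nn.Module):
    def __init__(self, in_dim=2, hidden=64, heads=4, horizon=30):
        super().__init__()
        self.encoder = nn.GRU(in_dim, hidden, batch_first=True)           # behavior encoder
        self.inter_agent = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.decoder = nn.Linear(hidden, horizon * 2)                     # future (x, y) per step
        self.horizon = horizon

    def forward(self, trajectories):
        # trajectories: (num_agents, T_obs, 2) observed positions
        _, h = self.encoder(trajectories)
        agents = h[-1].unsqueeze(0)                          # (1, num_agents, hidden)
        mixed, _ = self.inter_agent(agents, agents, agents)  # agents attend to each other
        out = self.decoder(mixed.squeeze(0))                 # (num_agents, horizon*2)
        return out.view(-1, self.horizon, 2)

print(DFRNNSketch()(torch.randn(4, 20, 2)).shape)  # torch.Size([4, 30, 2])
```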
arXiv Detail & Related papers (2021-08-12T15:53:37Z)
- A Dynamic Spatial-temporal Attention Network for Early Anticipation of Traffic Accidents [12.881094474374231]
This paper presents a dynamic spatial-temporal attention (DSTA) network for early anticipation of traffic accidents from dashcam videos.
It learns to select discriminative temporal segments of a video sequence with a module named Dynamic Temporal Attention (DTA).
The spatial-temporal relational features of accidents, along with scene appearance features, are learned jointly with a Gated Recurrent Unit (GRU) network.
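A rough sketch of temporal attention over GRU states, in the spirit of the DTA module described above; the scoring function and dimensions are assumptions, not the paper's design.

```python
# Sketch: soft attention over GRU hidden states to weight temporal segments.
import torch
import torch.nn as nn

class TemporalAttentionSketch(nn.Module):
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)    # relevance of each time step
        self.head = nn.Linear(hidden, 1)     # accident probability

    def forward(self, frame_feats):
        # frame_feats: (batch, T, feat_dim) per-frame scene features
        states, _ = self.gru(frame_feats)                   # (batch, T, hidden)
        weights = torch.softmax(self.score(states), dim=1)  # attend to discriminative segments
        context = (weights * states).sum(dim=1)
        return torch.sigmoid(self.head(context)).squeeze(-1)

print(TemporalAttentionSketch()(torch.randn(2, 50, 128)).shape)  # torch.Size([2])
```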
arXiv Detail & Related papers (2021-06-18T15:58:53Z)
- Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our method is validated on complex quadruped robot dynamics, and the approach can be generally applied to most robotic platforms.
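A toy sketch of the core idea of an RNN inferring state-estimate uncertainty: a GRU emits per-step log-variances, yielding a diagonal covariance a risk-averse planner could use to inflate obstacle bounds. Purely illustrative; not the paper's architecture.

```python
# Sketch: GRU -> per-step log-variances -> diagonal covariance matrices.
import torch
import torch.nn as nn

class CovarianceRNNSketch(nn.Module):
    def __init__(self, state_dim=4, hidden=32):
        super().__init__()
        self.gru = nn.GRU(state_dim, hidden, batch_first=True)
        self.log_var = nn.Linear(hidden, state_dim)  # per-dimension log-variance

    def forward(self, detections):
        # detections: (batch, T, state_dim) noisy object-state observations
        out, _ = self.gru(detections)
        var = torch.exp(self.log_var(out))           # (batch, T, state_dim)
        return torch.diag_embed(var)                 # diagonal covariance per step

print(CovarianceRNNSketch()(torch.randn(1, 10, 4)).shape)  # torch.Size([1, 10, 4, 4])
```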
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
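A small sketch of the vectorized representation described above: a polyline (map element or trajectory) becomes a set of vectors holding start point, end point, and a polyline id. The attribute layout here is a simplification; VectorNet's actual features are richer.

```python
# Sketch: turn an ordered polyline into a set of vectors for a graph network.
import torch

def polyline_to_vectors(points: torch.Tensor, polyline_id: int) -> torch.Tensor:
    """points: (N, 2) ordered polyline points -> (N-1, 5) vectors."""
    starts, ends = points[:-1], points[1:]
    ids = torch.full((len(starts), 1), float(polyline_id))
    return torch.cat([starts, ends, ids], dim=-1)  # [x_s, y_s, x_e, y_e, id]

lane = torch.tensor([[0.0, 0.0], [1.0, 0.1], [2.0, 0.3], [3.0, 0.6]])
print(polyline_to_vectors(lane, polyline_id=7))  # (3, 5) vector set
```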
arXiv Detail & Related papers (2020-05-08T19:07:03Z)