AccidentBlip: Agent of Accident Warning based on MA-former
- URL: http://arxiv.org/abs/2404.12149v5
- Date: Tue, 28 Jan 2025 02:33:40 GMT
- Title: AccidentBlip: Agent of Accident Warning based on MA-former
- Authors: Yihua Shao, Yeling Xu, Xinwei Long, Siyu Chen, Ziyang Yan, Yang Yang, Haoting Liu, Yan Wang, Hao Tang, Zhen Lei
- Abstract summary: AccidentBlip is a vision-only framework that employs our self-designed Motion Accident Transformer (MA-former) to process each frame of video. AccidentBlip achieves SOTA performance in both accident detection and prediction tasks on the DeepAccident dataset. It also outperforms current SOTA methods in V2V and V2X scenarios, demonstrating a superior capability to understand complex real-world environments.
- Score: 24.81148840857782
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In complex transportation systems, accurately sensing the surrounding environment and predicting the risk of potential accidents is crucial. Most existing accident prediction methods are based on temporal neural networks, such as RNN and LSTM. Recent multimodal fusion approaches improve vehicle localization through 3D target detection and assess potential risks by calculating inter-vehicle distances. However, these temporal networks and multimodal fusion methods suffer from limited detection robustness and high economic costs. To address these challenges, we propose AccidentBlip, a vision-only framework that employs our self-designed Motion Accident Transformer (MA-former) to process each frame of video. Unlike conventional self-attention mechanisms, MA-former replaces Q-former's self-attention with temporal attention, allowing the query corresponding to the previous frame to generate the query input for the next frame. Additionally, we introduce a residual module connection between queries of consecutive frames to enhance the model's temporal processing capabilities. For complex V2V and V2X scenarios, AccidentBlip adapts by concatenating queries from multiple cameras, effectively capturing spatial and temporal relationships. In particular, AccidentBlip achieves SOTA performance in both accident detection and prediction tasks on the DeepAccident dataset. It also outperforms current SOTA methods in V2V and V2X scenarios, demonstrating a superior capability to understand complex real-world environments.
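The query propagation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`attend`, `propagate_queries`, `fuse_cameras`) are hypothetical, and learned projections, layer normalization, and feed-forward layers are omitted. It only assumes that the queries from the previous frame attend to the current frame's features, that a residual connection links queries of consecutive frames, and that multi-camera queries are concatenated.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax(
            [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        )
        out.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return out

def propagate_queries(frames, init_queries):
    """Carry queries through a frame sequence (assumed update rule).

    frames: list of frames, each a list of feature vectors.
    init_queries: list of query vectors used for the first frame.
    """
    queries = init_queries
    history = []
    for feats in frames:
        # Queries from the previous frame attend to the current frame.
        update = attend(queries, feats, feats)
        # Residual connection between queries of consecutive frames.
        queries = [[q + u for q, u in zip(qv, uv)]
                   for qv, uv in zip(queries, update)]
        history.append(queries)
    return history

def fuse_cameras(per_camera_queries):
    """Concatenate the query sets of several cameras along the query axis."""
    return [q for cam in per_camera_queries for q in cam]

# Toy usage: two frames with two 2-d feature vectors each, one query.
frames = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]]
q0 = [[0.1, 0.2]]
history = propagate_queries(frames, q0)
fused = fuse_cameras([history[-1], history[-1]])
print(len(history), len(fused))
```

In this sketch the residual sum keeps information from earlier frames in the query state, and a V2V or V2X setup is treated as extra query sets appended before any downstream head.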
Related papers
- DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving [62.62464518137153]
DriveTransformer is a simplified E2E-AD framework for the ease of scaling up.
It is composed of three unified operations: task self-attention, sensor cross-attention, temporal cross-attention.
It achieves state-of-the-art performance in both simulated closed-loop benchmark Bench2Drive and real world open-loop benchmark nuScenes with high FPS.
arXiv Detail & Related papers (2025-03-07T11:41:18Z) - AVD2: Accident Video Diffusion for Accident Video Description [11.221276595088215]
We introduce AVD2 (Accident Video Diffusion for Accident Video Description), a novel framework that enhances accident scene understanding.
The framework generates accident videos that align with detailed natural language descriptions and reasoning, resulting in the EMM-AU dataset.
Empirical results reveal that the integration of the EMM-AU dataset establishes state-of-the-art performance across both automated metrics and human evaluations.
arXiv Detail & Related papers (2025-02-20T18:22:44Z) - Enhancing In-vehicle Multiple Object Tracking Systems with Embeddable Ising Machines [0.10485739694839666]
We show an in-vehicle multiple object tracking system with a flexible assignment function.
The system relies on an embeddable Ising machine based on a quantum-inspired algorithm called simulated bifurcation.
Using a vehicle-mountable computing platform, we demonstrate a realtime system-wide throughput (23 frames per second on average) with the enhanced functionality.
arXiv Detail & Related papers (2024-10-18T00:18:27Z) - CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions [13.981748780317329]
Accurately and promptly predicting accidents among surrounding traffic agents from camera footage is crucial for the safety of autonomous vehicles (AVs).
This study introduces a novel accident anticipation framework for AVs, termed CRASH.
It seamlessly integrates five components: object detector, feature extractor, object-aware module, context-aware module, and multi-layer fusion.
Our model surpasses existing top baselines in critical evaluation metrics like Average Precision (AP) and mean Time-To-Accident (mTTA).
arXiv Detail & Related papers (2024-07-25T04:12:49Z) - When, Where, and What? A Novel Benchmark for Accident Anticipation and Localization with Large Language Models [14.090582912396467]
This study introduces a novel framework that integrates Large Language Models (LLMs) to enhance predictive capabilities across multiple dimensions.
We develop an innovative chain-based attention mechanism that dynamically adjusts to prioritize high-risk elements within complex driving scenes.
Empirical validation on the DAD, CCD, and A3D datasets demonstrates superior performance in Average Precision (AP) and Mean Time-To-Accident (mTTA).
arXiv Detail & Related papers (2024-07-23T08:29:49Z) - Edge-Assisted ML-Aided Uncertainty-Aware Vehicle Collision Avoidance at Urban Intersections [12.812518632907771]
We present a novel framework that preemptively detects collisions at urban crossroads.
We exploit the Multi-access Edge Computing platform of 5G networks.
arXiv Detail & Related papers (2024-04-22T18:45:40Z) - Scalable Multi-modal Model Predictive Control via Duality-based Interaction Predictions [8.256630421682951]
RAID-Net is a novel attention-based Recurrent Neural Network that predicts relevant interactions along the Model Predictive Control (MPC) prediction horizon.
Our approach is demonstrated in a simulated traffic horizon with interactive surrounding vehicles, showcasing a 12x speed-up in solving the motion planning problem.
arXiv Detail & Related papers (2024-02-02T03:19:54Z) - Exploring Highly Quantised Neural Networks for Intrusion Detection in Automotive CAN [13.581341206178525]
Machine learning-based intrusion detection models have been shown to successfully detect multiple targeted attack vectors.
In this paper, we present a case for custom-quantised multilayer perceptrons (CQMLP) as a multi-class classification model.
We show that the 2-bit CQMLP model, when integrated as the IDS, can detect malicious attack messages with a very high accuracy of 99.9%.
arXiv Detail & Related papers (2024-01-19T21:11:02Z) - SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z) - A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised Traffic Accident Detection in Driving Videos [22.553356096143734]
We propose a novel memory-augmented multi-task collaborative framework (MAMTCF) for unsupervised traffic accident detection in driving videos.
Our method can more accurately detect both ego-involved and non-ego accidents by simultaneously modeling appearance changes and object motions in video frames.
arXiv Detail & Related papers (2023-07-27T01:45:13Z) - Learned Risk Metric Maps for Kinodynamic Systems [54.49871675894546]
We present Learned Risk Metric Maps for real-time estimation of coherent risk metrics of high dimensional dynamical systems.
LRMM models are simple to design and train, requiring only procedural generation of obstacle sets, state and control sampling, and supervised training of a function approximator.
arXiv Detail & Related papers (2023-02-28T17:51:43Z) - Augmenting Ego-Vehicle for Traffic Near-Miss and Accident Classification Dataset using Manipulating Conditional Style Translation [0.3441021278275805]
Before an accident occurs, there is no observable difference between an accident and a near-miss.
Our contribution is to redefine the accident definition and re-annotate the accident inconsistency on the DADA-2000 dataset together with near-miss.
The proposed method integrates two different components: conditional style translation (CST) and separable 3-dimensional convolutional neural network (S3D).
arXiv Detail & Related papers (2023-01-06T22:04:47Z) - MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering [73.61182342844639]
We introduce a new model named Multi-modal Iterative Spatial-temporal Transformer (MIST) to better adapt pre-trained models for long-form VideoQA.
MIST decomposes traditional dense spatial-temporal self-attention into cascaded segment and region selection modules.
Visual concepts at different granularities are then processed efficiently through an attention module.
arXiv Detail & Related papers (2022-12-19T15:05:40Z) - Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark [77.54411007883962]
We propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition of text description on the visual observation and the driver attention to facilitate model training.
CAP is formulated by an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and the driver attention guided accident prediction module.
We construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames.
arXiv Detail & Related papers (2022-12-19T11:43:02Z) - Multi-Modal Few-Shot Temporal Action Detection [157.96194484236483]
Few-shot (FS) and zero-shot (ZS) learning are two different approaches for scaling temporal action detection to new classes.
We introduce a new multi-modality few-shot (MMFS) TAD problem, which can be considered as a marriage of FS-TAD and ZS-TAD.
arXiv Detail & Related papers (2022-11-27T18:13:05Z) - Congestion-aware Multi-agent Trajectory Prediction for Collision Avoidance [110.63037190641414]
We propose to learn congestion patterns explicitly and devise a novel "Sense--Learn--Reason--Predict" framework.
By decomposing the learning phases into two stages, a "student" can learn contextual cues from a "teacher" while generating collision-free trajectories.
In experiments, we demonstrate that the proposed model is able to generate collision-free trajectory predictions in a synthetic dataset.
arXiv Detail & Related papers (2021-03-26T02:42:33Z) - A Driving Behavior Recognition Model with Bi-LSTM and Multi-Scale CNN [59.57221522897815]
We propose a neural network model based on trajectory information for driving behavior recognition.
We evaluate the proposed model on the public BLVD dataset, achieving satisfactory performance.
arXiv Detail & Related papers (2021-03-01T06:47:29Z) - Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z) - Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z) - MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views [60.538802124885414]
We present Multi-View LidarNet (MVLidarNet), a two-stage deep neural network for multi-class object detection and drivable space segmentation.
MVLidarNet is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input.
We show results on both KITTI and a much larger internal dataset, thus demonstrating the method's ability to scale by an order of magnitude.
arXiv Detail & Related papers (2020-06-09T21:28:17Z) - Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA [96.10612095576333]
We propose a video question answering model which effectively integrates multi-modal input sources and finds the temporally relevant information to answer questions.
Our model also comprises dual-level attention (word/object and frame level), multi-head self-cross-integration for different sources (video and dense captions), and gates which pass along more relevant information.
We evaluate our model on the challenging TVQA dataset, where each of our model components provides significant gains, and our overall model outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2020-05-13T16:35:27Z) - Traffic Signs Detection and Recognition System using Deep Learning [0.0]
This paper describes an approach for efficiently detecting and recognizing traffic signs in real-time.
We tackle the traffic sign detection problem using state-of-the-art multi-object detection systems.
This paper focuses on F-RCNN Inception v2 and Tiny YOLO v2, as they achieved the best results.
arXiv Detail & Related papers (2020-03-06T14:54:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.