Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos
- URL: http://arxiv.org/abs/2401.03522v2
- Date: Mon, 15 Apr 2024 07:59:03 GMT
- Title: Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos
- Authors: Rongqin Liang, Yuanman Li, Jiantao Zhou, Xia Li,
- Abstract summary: We introduce TTHF, a novel single-stage method aligning video clips with text prompts, offering a new perspective on traffic anomaly detection.
Unlike previous approaches, the supervised signal of our method is derived from languages rather than one-hot vectors, providing a more comprehensive representation.
It is shown that our proposed TTHF achieves promising performance, outperforming state-of-the-art competitors by +5.4% AUC on the DoTA dataset.
- Score: 22.16190711818432
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traffic anomaly detection (TAD) in driving videos is critical for ensuring the safety of autonomous driving and advanced driver assistance systems. Previous single-stage TAD methods primarily rely on frame prediction, making them vulnerable to interference from dynamic backgrounds induced by the rapid movement of the dashboard camera. While two-stage TAD methods appear to be a natural solution to mitigate such interference by pre-extracting background-independent features (such as bounding boxes and optical flow) using perceptual algorithms, they are susceptible to the performance of first-stage perceptual algorithms and may result in error propagation. In this paper, we introduce TTHF, a novel single-stage method aligning video clips with text prompts, offering a new perspective on traffic anomaly detection. Unlike previous approaches, the supervised signal of our method is derived from languages rather than orthogonal one-hot vectors, providing a more comprehensive representation. Further, concerning visual representation, we propose to model the high frequency of driving videos in the temporal domain. This modeling captures the dynamic changes of driving scenes, enhances the perception of driving behavior, and significantly improves the detection of traffic anomalies. In addition, to better perceive various types of traffic anomalies, we carefully design an attentive anomaly focusing mechanism that visually and linguistically guides the model to adaptively focus on the visual context of interest, thereby facilitating the detection of traffic anomalies. It is shown that our proposed TTHF achieves promising performance, outperforming state-of-the-art competitors by +5.4% AUC on the DoTA dataset and achieving high generalization on the DADA dataset.
Related papers
- When, Where, and What? An Novel Benchmark for Accident Anticipation and Localization with Large Language Models [14.090582912396467]
This study introduces a novel framework that integrates Large Language Models (LLMs) to enhance predictive capabilities across multiple dimensions.
We develop an innovative chain-based attention mechanism that dynamically adjusts to prioritize high-risk elements within complex driving scenes.
Empirical validation on the DAD, CCD, and A3D datasets demonstrates superior performance in Average Precision (AP) and Mean Time-To-Accident (mTTA)
arXiv Detail & Related papers (2024-07-23T08:29:49Z) - Exploring Driving Behavior for Autonomous Vehicles Based on Gramian
Angular Field Vision Transformer [13.020654798874475]
This paper presents the Gramian Angular Field Vision Transformer (GAF-ViT) model, designed to analyze driving behavior.
The proposed-ViT model consists of three key components: Transformer Module, Channel Attention Module, and Multi-Channel ViT Module.
arXiv Detail & Related papers (2023-10-21T04:24:30Z) - Layout Sequence Prediction From Noisy Mobile Modality [53.49649231056857]
Trajectory prediction plays a vital role in understanding pedestrian movement for applications such as autonomous driving and robotics.
Current trajectory prediction models depend on long, complete, and accurately observed sequences from visual modalities.
We propose LTrajDiff, a novel approach that treats objects obstructed or out of sight as equally important as those with fully visible trajectories.
arXiv Detail & Related papers (2023-10-09T20:32:49Z) - Unsupervised Domain Adaptation for Self-Driving from Past Traversal
Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatial quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z) - A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised
Traffic Accident Detection in Driving Videos [22.553356096143734]
We propose a novel memory-augmented multi-task collaborative framework (MAMTCF) for unsupervised traffic accident detection in driving videos.
Our method can more accurately detect both ego-involved and non-ego accidents by simultaneously modeling appearance changes and object motions in video frames.
arXiv Detail & Related papers (2023-07-27T01:45:13Z) - PDFormer: Propagation Delay-Aware Dynamic Long-Range Transformer for
Traffic Flow Prediction [78.05103666987655]
spatial-temporal Graph Neural Network (GNN) models have emerged as one of the most promising methods to solve this problem.
We propose a novel propagation delay-aware dynamic long-range transFormer, namely PDFormer, for accurate traffic flow prediction.
Our method can not only achieve state-of-the-art performance but also exhibit competitive computational efficiency.
arXiv Detail & Related papers (2023-01-19T08:42:40Z) - FBLNet: FeedBack Loop Network for Driver Attention Prediction [75.83518507463226]
Nonobjective driving experience is difficult to model.
In this paper, we propose a FeedBack Loop Network (FBLNet) which attempts to model the driving experience accumulation procedure.
Under the guidance of the incremental knowledge, our model fuses the CNN feature and Transformer feature that are extracted from the input image to predict driver attention.
arXiv Detail & Related papers (2022-12-05T08:25:09Z) - Real-Time Driver Monitoring Systems through Modality and View Analysis [28.18784311981388]
Driver distractions are known to be the dominant cause of road accidents.
State-of-the-art methods prioritize accuracy while ignoring latency.
We propose time-effective detection models by neglecting the temporal relation between video frames.
arXiv Detail & Related papers (2022-10-17T21:22:41Z) - Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z) - Robust Unsupervised Video Anomaly Detection by Multi-Path Frame
Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z) - Anomalous Motion Detection on Highway Using Deep Learning [14.617786106427834]
This paper presents a new anomaly detection dataset - the Highway Traffic Anomaly (HTA) dataset.
We evaluate state-of-the-art deep learning anomaly detection models and propose novel variations to these methods.
arXiv Detail & Related papers (2020-06-15T05:40:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.