TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
- URL: http://arxiv.org/abs/2103.15538v2
- Date: Tue, 30 Mar 2021 15:00:27 GMT
- Authors: Li Xu, He Huang and Jun Liu
- Abstract summary: We create a novel dataset, TrafficQA (Traffic Question Answering), built from 10,080 collected in-the-wild videos with 62,535 annotated QA pairs.
We propose 6 challenging reasoning tasks corresponding to various traffic scenarios, so as to evaluate the reasoning capability over different kinds of complex yet practical traffic events.
We also propose Eclipse, a novel Efficient glimpse network via dynamic inference, in order to achieve computation-efficient and reliable video reasoning.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traffic event cognition and reasoning in videos is an important task that has
a wide range of applications in intelligent transportation, assisted driving,
and autonomous vehicles. In this paper, we create a novel dataset, TrafficQA
(Traffic Question Answering), which takes the form of video QA built from
10,080 collected in-the-wild videos with 62,535 annotated QA pairs, for
benchmarking the cognitive capability of causal inference and event
understanding models in complex traffic scenarios. Specifically, we propose 6
challenging reasoning tasks corresponding to various traffic scenarios, so as
to evaluate the reasoning capability over different kinds of complex yet
practical traffic events. Moreover, we propose Eclipse, a novel Efficient
glimpse network via dynamic inference, in order to achieve
computation-efficient and reliable video reasoning. The experiments show that
our method achieves superior performance while significantly reducing the
computation cost. Project page: https://github.com/SUTDCV/SUTD-TrafficQA.
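The abstract describes Eclipse only as an efficient glimpse network via dynamic inference, so its exact mechanism is not given here. As a rough, hypothetical illustration of the general idea behind dynamic inference (spending less computation on inputs that are easy to answer), the sketch below uses a simple confidence-based early-exit rule over video frames. It is not the Eclipse method: the fixed frame order, the summed per-frame answer scores, the `threshold` parameter and all names (`answer_with_early_exit`, `GlimpseResult`) are assumptions made for illustration.

```python
# Hypothetical sketch of confidence-based early exit for video QA.
# NOT the Eclipse algorithm: frame order, scoring and threshold are assumptions.
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GlimpseResult:
    answer: int        # index of the chosen answer option
    confidence: float  # softmax probability of that option
    frames_used: int   # number of frames processed (a proxy for compute cost)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def answer_with_early_exit(
    frame_scores: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> GlimpseResult:
    """Process frames one at a time and stop as soon as the answer is confident.

    frame_scores[t][k] is a hypothetical score that frame t contributes to
    answer option k; in a real system a neural network would produce these.
    """
    if not frame_scores:
        raise ValueError("need at least one frame")
    num_options = len(frame_scores[0])
    running = [0.0] * num_options
    best, probs = 0, [1.0 / num_options] * num_options
    for t, scores in enumerate(frame_scores, start=1):
        if len(scores) != num_options:
            raise ValueError("every frame must score every answer option")
        # Add the new glimpse's evidence to the running total.
        running = [r + s for r, s in zip(running, scores)]
        probs = softmax(running)
        best = max(range(num_options), key=probs.__getitem__)
        if probs[best] >= threshold:
            # Confident enough: skip the remaining frames to save computation.
            return GlimpseResult(best, probs[best], t)
    # All frames used without reaching the threshold: return the best guess.
    return GlimpseResult(best, probs[best], len(frame_scores))


if __name__ == "__main__":
    easy = [[5.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
    hard = [[0.1, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0]]
    print(answer_with_early_exit(easy).frames_used)  # 1
    print(answer_with_early_exit(hard).frames_used)  # 3
```

In this toy setup the "easy" clip is answered after one frame, while the "hard" clip needs all three; averaged over a dataset, the number of frames processed is what a dynamic-inference method tries to reduce without hurting accuracy. A learned policy (for example, one that also chooses which frame to look at next) would replace the fixed order and hand-set threshold used here.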
Related papers
- NetFlowGen: Leveraging Generative Pre-training for Network Traffic Dynamics
We propose to pre-train a general-purpose machine learning model to capture traffic dynamics with only traffic data from NetFlow records.
We address challenges such as unifying network feature representations, learning from large unlabeled traffic data volume, and testing on real downstream tasks in DDoS attack detection.
Date: 2024-12-30
- Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks
This study evaluates state-of-the-art VideoQA models using non-benchmark synthetic and real-world traffic sequences.
VideoLLaMA-2 stands out with 57% accuracy, particularly in compositional reasoning and answer consistency.
These findings underscore VideoQA's potential in traffic monitoring but also emphasize the need for improvements in multi-object tracking, temporal reasoning, and compositional capabilities.
Date: 2024-12-02
- TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning
We present TrafficVLM, a novel multi-modal dense video captioning model for vehicle ego camera view.
Our solution achieved outstanding results in Track 2 of the AI City Challenge 2024, ranking third in the challenge standings.
Date: 2024-04-14
- DriveLM: Driving with Graph Visual Question Answering
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems.
We propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
Date: 2023-12-21
- Traffic-Domain Video Question Answering with Automatic Captioning
Video Question Answering (VidQA) exhibits remarkable potential in facilitating advanced machine reasoning capabilities.
We present a novel approach termed Traffic-domain Video Question Answering with Automatic Captioning (TRIVIA), which serves as a weak-supervision technique for infusing traffic-domain knowledge into large video-language models.
Date: 2023-07-18
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
Date: 2023-03-03
- Utilizing Background Knowledge for Robust Reasoning over Traffic Situations
We focus on a complementary research aspect of Intelligent Transportation: traffic understanding.
We scope our study to text-based methods and datasets, given the abundant commonsense knowledge available in text.
We adopt three knowledge-driven approaches for zero-shot QA over traffic situations.
Date: 2022-12-04
- DQ-GAT: Towards Safe and Efficient Autonomous Driving with Deep Q-Learning and Graph Attention Networks
Traditional planning methods are largely rule-based and scale poorly in complex dynamic scenarios.
We propose DQ-GAT to achieve scalable and proactive autonomous driving.
Our method achieves a better trade-off between safety and efficiency in both seen and unseen scenarios.
Date: 2021-08-11
- HySTER: A Hybrid Spatio-Temporal Event Reasoner
We present HySTER, a Hybrid Spatio-Temporal Event Reasoner for reasoning over physical events in videos.
We define a method based on general temporal, causal and physics rules which can be transferred across tasks.
This work sets the foundations for the incorporation of inductive logic programming in the field of VideoQA.
Date: 2021-01-17
- Edge Computing for Real-Time Near-Crash Detection for Smart Transportation Applications
Traffic near-crash events serve as critical data sources for various smart transportation applications.
This paper leverages edge computing to detect near-crash events by processing video streams from existing onboard dashcams in real time.
It is among the first efforts in applying edge computing for real-time traffic video analytics and is expected to benefit multiple sub-fields in smart transportation research and applications.
Date: 2020-08-02