TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
- URL: http://arxiv.org/abs/2103.15538v2
- Date: Tue, 30 Mar 2021 15:00:27 GMT
- Title: TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
- Authors: Li Xu, He Huang and Jun Liu
- Abstract summary: We create a novel dataset, TrafficQA (Traffic Question Answering), comprising 10,080 collected in-the-wild videos and 62,535 annotated QA pairs.
We propose 6 challenging reasoning tasks corresponding to various traffic scenarios to evaluate reasoning over different kinds of complex yet practical traffic events.
We also propose Eclipse, a novel Efficient glimpse network via dynamic inference, to achieve computation-efficient and reliable video reasoning.
- Score: 13.46045177335564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traffic event cognition and reasoning in videos is an important task with
a wide range of applications in intelligent transportation, assisted driving,
and autonomous vehicles. In this paper, we create a novel dataset, TrafficQA
(Traffic Question Answering), which takes the form of video QA and comprises
10,080 collected in-the-wild videos with 62,535 annotated QA pairs, for
benchmarking the cognitive capability of causal-inference and
event-understanding models in complex traffic scenarios. Specifically, we
propose 6 challenging reasoning tasks corresponding to various traffic
scenarios to evaluate reasoning over different kinds of complex yet practical
traffic events. Moreover, we propose Eclipse, a novel Efficient glimpse
network via dynamic inference, to achieve computation-efficient and reliable
video reasoning. Experiments show that our method achieves superior
performance while significantly reducing the computation cost. Project page:
https://github.com/SUTDCV/SUTD-TrafficQA.
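The abstract describes Eclipse only at a high level: a glimpse network that dynamically decides how much computation to spend on each video. As a rough illustration of that idea, the sketch below shows a recurrent policy that attends to a few selected frames and exits early once confident. All module names, dimensions, and the halting heuristic are assumptions for exposition (question conditioning is omitted for brevity); this is not the authors' actual Eclipse architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlimpseReasoner(nn.Module):
    """Illustrative sketch of dynamic-inference glimpsing for video QA.
    NOT the authors' Eclipse architecture: names, sizes, and the
    halting heuristic below are assumptions for exposition only."""

    def __init__(self, feat_dim=512, hidden_dim=256, num_answers=4):
        super().__init__()
        self.rnn = nn.GRUCell(feat_dim, hidden_dim)
        self.query = nn.Linear(hidden_dim, feat_dim)  # state -> frame-selection query
        self.halt = nn.Linear(hidden_dim, 1)          # early-exit gate
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, frame_feats, max_glimpses=6, halt_threshold=0.9):
        # frame_feats: (num_frames, feat_dim) cheap per-frame features,
        # computed once; costly reasoning only touches glimpsed frames.
        num_frames = frame_feats.size(0)
        h = frame_feats.new_zeros(1, self.rnn.hidden_size)
        visited = torch.zeros(num_frames, dtype=torch.bool,
                              device=frame_feats.device)
        for _ in range(max_glimpses):
            # Score unvisited frames against the current reasoning state.
            scores = frame_feats @ self.query(h).squeeze(0)
            scores = scores.masked_fill(visited, float("-inf"))
            idx = scores.argmax().item()
            visited[idx] = True
            # Update the state with the selected glimpse only.
            h = self.rnn(frame_feats[idx:idx + 1], h)
            # Dynamic inference: stop once the gate is confident enough.
            if torch.sigmoid(self.halt(h)).item() > halt_threshold:
                break
        return F.softmax(self.classifier(h), dim=-1)
```

As written, the discrete frame choice and halting decision are not differentiable; training such a policy would need a relaxation such as Gumbel-Softmax or a policy-gradient method, which this inference-only sketch omits.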
Related papers
- TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning [0.0]
We present TrafficVLM, a novel multi-modal dense video captioning model for the vehicle ego-camera view.
Our solution achieved outstanding results in Track 2 of the AI City Challenge 2024, ranking third in the challenge standings.
arXiv Detail & Related papers (2024-04-14T14:51:44Z)
- DriveLM: Driving with Graph Visual Question Answering [57.51930417790141]
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems.
We propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
arXiv Detail & Related papers (2023-12-21T18:59:12Z)
- Traffic-Domain Video Question Answering with Automatic Captioning [69.98381847388553]
Video Question Answering (VidQA) exhibits remarkable potential in facilitating advanced machine reasoning capabilities.
We present a novel approach termed Traffic-domain Video Question Answering with Automatic Captioning (TRIVIA), which serves as a weak-supervision technique for infusing traffic-domain knowledge into large video-language models.
arXiv Detail & Related papers (2023-07-18T20:56:41Z)
- Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering [14.659023742381777]
Event-Level Video Question Answering (EVQA) requires complex reasoning across video events to provide optimal answers.
We propose a semantic-aware dynamic retrospective-prospective reasoning approach for video-based question answering.
Our proposed approach achieves superior performance compared to previous state-of-the-art models.
arXiv Detail & Related papers (2023-05-14T03:57:11Z)
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z)
- Utilizing Background Knowledge for Robust Reasoning over Traffic Situations [63.45021731775964]
We focus on a complementary research aspect of Intelligent Transportation: traffic understanding.
We scope our study to text-based methods and datasets, given the abundance of existing commonsense knowledge in text form.
We adopt three knowledge-driven approaches for zero-shot QA over traffic situations.
arXiv Detail & Related papers (2022-12-04T09:17:24Z)
- DQ-GAT: Towards Safe and Efficient Autonomous Driving with Deep Q-Learning and Graph Attention Networks [12.714551756377265]
Traditional planning methods are largely rule-based and scale poorly in complex dynamic scenarios.
We propose DQ-GAT to achieve scalable and proactive autonomous driving; a minimal sketch of this Q-learning-with-graph-attention combination appears after this list.
Our method better trades off safety and efficiency in both seen and unseen scenarios.
arXiv Detail & Related papers (2021-08-11T04:55:23Z)
- Multi-intersection Traffic Optimisation: A Benchmark Dataset and a Strong Baseline [85.9210953301628]
Control of traffic signals is fundamental and critical to alleviate traffic congestion in urban areas.
Because modelling the problem is highly complex, the experimental settings of current works are often inconsistent.
We propose a novel and strong baseline model based on deep reinforcement learning with the encoder-decoder structure.
arXiv Detail & Related papers (2021-01-24T03:55:39Z)
- HySTER: A Hybrid Spatio-Temporal Event Reasoner [75.41988728376081]
We present HySTER, a Hybrid Spatio-Temporal Event Reasoner for reasoning over physical events in videos.
We define a method based on general temporal, causal and physics rules which can be transferred across tasks.
This work sets the foundations for the incorporation of inductive logic programming in the field of VideoQA.
arXiv Detail & Related papers (2021-01-17T11:07:17Z)
- Edge Computing for Real-Time Near-Crash Detection for Smart Transportation Applications [29.550609157368466]
Traffic near-crash events serve as critical data sources for various smart transportation applications.
This paper leverages the power of edge computing to address these challenges by processing video streams from existing onboard dashcams in real time.
It is among the first efforts to apply edge computing to real-time traffic video analytics and is expected to benefit multiple sub-fields of smart transportation research and applications.
arXiv Detail & Related papers (2020-08-02T19:39:14Z)
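The DQ-GAT entry above combines deep Q-learning with graph attention over surrounding traffic. As referenced in that item, below is a minimal sketch of how the two pieces can compose: a single attention layer aggregates neighbouring vehicles' states around the ego vehicle, and a Q-head scores discrete driving actions. The graph layout, layer sizes, and action space are illustrative assumptions, not the paper's actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionQNet(nn.Module):
    """Rough sketch of pairing graph attention with deep Q-learning,
    in the spirit of the DQ-GAT entry above. The single attention
    layer, feature sizes, and action set are illustrative assumptions."""

    def __init__(self, state_dim=16, hidden_dim=64, num_actions=5):
        super().__init__()
        self.embed = nn.Linear(state_dim, hidden_dim)
        self.attn = nn.Linear(2 * hidden_dim, 1)  # pairwise attention logits
        self.q_head = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, ego_state, neighbor_states):
        # ego_state: (state_dim,); neighbor_states: (num_neighbors, state_dim)
        ego = self.embed(ego_state)         # (hidden_dim,)
        nbrs = self.embed(neighbor_states)  # (N, hidden_dim)
        # Attend over neighbouring vehicles conditioned on the ego state.
        pairs = torch.cat([ego.expand_as(nbrs), nbrs], dim=-1)
        alpha = F.softmax(self.attn(pairs).squeeze(-1), dim=0)  # (N,)
        context = (alpha.unsqueeze(-1) * nbrs).sum(dim=0)       # (hidden_dim,)
        # One Q-value per discrete driving action (e.g. keep lane, change lane).
        return self.q_head(torch.cat([ego, context], dim=-1))

# Greedy control, as in standard deep Q-learning:
# action = q_net(ego_state, neighbor_states).argmax().item()
```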