Related papers: TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

URL: http://arxiv.org/abs/2103.15538v2
Date: Tue, 30 Mar 2021 15:00:27 GMT
Title: TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Authors: Li Xu, He Huang and Jun Liu
Abstract summary: We create a novel dataset, TrafficQA (Traffic Question Answering), based on the collected 10,080 in-the-wild videos and annotated 62,535 QA pairs. We propose 6 challenging reasoning tasks corresponding to various traffic scenarios, so as to evaluate the reasoning capability over different kinds of complex yet practical traffic events. We also propose Eclipse, a novel Efficient glimpse network via dynamic inference, in order to achieve computation-efficient and reliable video reasoning.
Score: 13.46045177335564
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Traffic event cognition and reasoning in videos is an important task that has a wide range of applications in intelligent transportation, assisted driving, and autonomous vehicles. In this paper, we create a novel dataset, TrafficQA (Traffic Question Answering), which takes the form of video QA based on the collected 10,080 in-the-wild videos and annotated 62,535 QA pairs, for benchmarking the cognitive capability of causal inference and event understanding models in complex traffic scenarios. Specifically, we propose 6 challenging reasoning tasks corresponding to various traffic scenarios, so as to evaluate the reasoning capability over different kinds of complex yet practical traffic events. Moreover, we propose Eclipse, a novel Efficient glimpse network via dynamic inference, in order to achieve computation-efficient and reliable video reasoning. The experiments show that our method achieves superior performance while reducing the computation cost significantly. The project page: https://github.com/SUTDCV/SUTD-TrafficQA.

Related papers

DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding [76.3876070043663]
We propose DriveLMM-o1, a dataset and benchmark designed to advance step-wise visual reasoning for autonomous driving. Our benchmark features over 18k VQA examples in the training set and more than 4k in the test set, covering diverse questions on perception, prediction, and planning. Our model achieves a +7.49% gain in final answer accuracy, along with a 3.62% improvement in reasoning score over the previous best open-source model.
arXiv Detail & Related papers (2025-03-13T17:59:01Z)
NetFlowGen: Leveraging Generative Pre-training for Network Traffic Dynamics [72.95483148058378]
We propose to pre-train a general-purpose machine learning model to capture traffic dynamics with only traffic data from NetFlow records. We address challenges such as unifying network feature representations, learning from large unlabeled traffic data volume, and testing on real downstream tasks in DDoS attack detection.
arXiv Detail & Related papers (2024-12-30T00:47:49Z)
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks [0.0]
This study evaluates state-of-the-art VideoQA models using non-benchmark synthetic and real-world traffic sequences. VideoLLaMA-2 stands out with 57% accuracy, particularly in compositional reasoning and answer consistency. These findings underscore VideoQA's potential in traffic monitoring but also emphasize the need for improvements in multi-object tracking, temporal reasoning, and compositional capabilities.
arXiv Detail & Related papers (2024-12-02T05:15:32Z)
TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning [0.0]
We present TrafficVLM, a novel multi-modal dense video captioning model for vehicle ego camera view. Our solution achieved outstanding results in Track 2 of the AI City Challenge 2024, ranking us third in the challenge standings.
arXiv Detail & Related papers (2024-04-14T14:51:44Z)
DriveLM: Driving with Graph Visual Question Answering [57.51930417790141]
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems. We propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
arXiv Detail & Related papers (2023-12-21T18:59:12Z)
Traffic-Domain Video Question Answering with Automatic Captioning [69.98381847388553]
Video Question Answering (VidQA) exhibits remarkable potential in facilitating advanced machine reasoning capabilities. We present a novel approach termed Traffic-domain Video Question Answering with Automatic Captioning (TRIVIA), which serves as a weak-supervision technique for infusing traffic-domain knowledge into large video-language models.
arXiv Detail & Related papers (2023-07-18T20:56:41Z)
Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering [14.659023742381777]
Event-Level Video Question Answering (EVQA) requires complex reasoning across video events to provide optimal answers. We propose a semantic-aware dynamic retrospective-prospective reasoning approach for video-based question answering. Our proposed approach achieves superior performance compared to previous state-of-the-art models.
arXiv Detail & Related papers (2023-05-14T03:57:11Z)
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting. Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories. We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z)
Utilizing Background Knowledge for Robust Reasoning over Traffic Situations [63.45021731775964]
We focus on a complementary research aspect of Intelligent Transportation: traffic understanding. We scope our study to text-based methods and datasets given the abundant commonsense knowledge. We adopt three knowledge-driven approaches for zero-shot QA over traffic situations.
arXiv Detail & Related papers (2022-12-04T09:17:24Z)
DQ-GAT: Towards Safe and Efficient Autonomous Driving with Deep Q-Learning and Graph Attention Networks [12.714551756377265]
Traditional planning methods are largely rule-based and scale poorly in complex dynamic scenarios. We propose DQ-GAT to achieve scalable and proactive autonomous driving. Our method can better trade off safety and efficiency in both seen and unseen scenarios.
arXiv Detail & Related papers (2021-08-11T04:55:23Z)
Multi-intersection Traffic Optimisation: A Benchmark Dataset and a Strong Baseline [85.9210953301628]
Control of traffic signals is fundamental and critical to alleviating traffic congestion in urban areas. Because of the high complexity of modelling the problem, experimental settings of current works are often inconsistent. We propose a novel and strong baseline model based on deep reinforcement learning with the encoder-decoder structure.
arXiv Detail & Related papers (2021-01-24T03:55:39Z)
HySTER: A Hybrid Spatio-Temporal Event Reasoner [75.41988728376081]
We present HySTER, a Hybrid Spatio-Temporal Event Reasoner, to reason over physical events in videos. We define a method based on general temporal, causal and physics rules which can be transferred across tasks. This work sets the foundations for the incorporation of inductive logic programming in the field of VideoQA.
arXiv Detail & Related papers (2021-01-17T11:07:17Z)
Edge Computing for Real-Time Near-Crash Detection for Smart Transportation Applications [29.550609157368466]
Traffic near-crash events serve as critical data sources for various smart transportation applications. This paper leverages the power of edge computing to address these challenges by processing video streams from existing onboard dashcams in real time. It is among the first efforts to apply edge computing to real-time traffic video analytics and is expected to benefit multiple sub-fields in smart transportation research and applications.
arXiv Detail & Related papers (2020-08-02T19:39:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.