ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset
- URL: http://arxiv.org/abs/2503.18553v1
- Date: Mon, 24 Mar 2025 11:06:04 GMT
- Title: ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset
- Authors: Zihao Chen, Hsuanyu Wu, Chi-Hsi Kung, Yi-Ting Chen, Yan-Tsung Peng,
- Abstract summary: We introduce the Aerial Traffic Atomic Activity Recognition and (ATARS) dataset, the first aerial dataset designed for multi-label atomic activity analysis.<n>We offer atomic activity labels for each frame, which accurately record the intervals for traffic activities.<n>We propose a novel task, Multi-label trimmed Atomic Activity Recognition, enabling the study of accurate temporal localization for atomic activity.
- Score: 11.07193206318681
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traffic Atomic Activity which describes traffic patterns for topological intersection dynamics is a crucial topic for the advancement of intelligent driving systems. However, existing atomic activity datasets are collected from an egocentric view, which cannot support the scenarios where traffic activities in an entire intersection are required. Moreover, existing datasets only provide video-level atomic activity annotations, which require exhausting efforts to manually trim the videos for recognition and limit their applications to untrimmed videos. To bridge this gap, we introduce the Aerial Traffic Atomic Activity Recognition and Segmentation (ATARS) dataset, the first aerial dataset designed for multi-label atomic activity analysis. We offer atomic activity labels for each frame, which accurately record the intervals for traffic activities. Moreover, we propose a novel task, Multi-label Temporal Atomic Activity Recognition, enabling the study of accurate temporal localization for atomic activity and easing the burden of manual video trimming for recognition. We conduct extensive experiments to evaluate existing state-of-the-art models on both atomic activity recognition and temporal atomic activity segmentation. The results highlight the unique challenges of our ATARS dataset, such as recognizing extremely small objects' activities. We further provide comprehensive discussion analyzing these challenges and offer valuable insights for future direction to improve recognizing atomic activity in aerial view. Our source code and dataset are available at https://github.com/magecliff96/ATARS/
Related papers
- VCHAR:Variance-Driven Complex Human Activity Recognition framework with Generative Representation [6.278293754210117]
VCHAR (Variance-Driven Complex Human Activity Recognition) is a novel framework that treats the outputs of atomic activities as a distribution over specified intervals.
We show that VCHAR enhances the accuracy of complex activity recognition without necessitating precise temporal or sequential labeling of atomic activities.
arXiv Detail & Related papers (2024-07-03T17:24:36Z) - Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes [23.284478293459856]
Action-slot is a slot attention-based approach that learns visual action-centric representations.
Our key idea is to design action slots that are capable of paying attention to regions where atomic activities occur.
To address the limitation, we collect a synthetic dataset called TACO, which is four times larger than OATS.
arXiv Detail & Related papers (2023-11-29T05:28:05Z) - Enhancing Next Active Object-based Egocentric Action Anticipation with
Guided Attention [45.60789439017625]
Short-term action anticipation (STA) in first-person videos is a challenging task.
We propose a novel approach that applies a guided attention mechanism between objects.
Our method, GANO, is a multi-modal, end-to-end, single transformer-based network.
arXiv Detail & Related papers (2023-05-22T11:56:10Z) - Event-Free Moving Object Segmentation from Moving Ego Vehicle [88.33470650615162]
Moving object segmentation (MOS) in dynamic scenes is an important, challenging, but under-explored research topic for autonomous driving.
Most segmentation methods leverage motion cues obtained from optical flow maps.
We propose to exploit event cameras for better video understanding, which provide rich motion cues without relying on optical flow.
arXiv Detail & Related papers (2023-04-28T23:43:10Z) - Boundary-Denoising for Video Activity Localization [57.9973253014712]
We study the video activity localization problem from a denoising perspective.
Specifically, we propose an encoder-decoder model named DenoiseLoc.
Experiments show that DenoiseLoc advances %in several video activity understanding tasks.
arXiv Detail & Related papers (2023-04-06T08:48:01Z) - Unifying Tracking and Image-Video Object Detection [54.91658924277527]
TrIVD (Tracking and Image-Video Detection) is the first framework that unifies image OD, video OD, and MOT within one end-to-end model.
To handle the discrepancies and semantic overlaps of category labels, TrIVD formulates detection/tracking as grounding and reasons about object categories.
arXiv Detail & Related papers (2022-11-20T20:30:28Z) - Semi-Supervised Few-Shot Atomic Action Recognition [59.587738451616495]
We propose a novel model for semi-supervised few-shot atomic action recognition.
Our model features unsupervised and contrastive video embedding, loose action alignment, multi-head feature comparison, and attention-based aggregation.
Experiments show that our model can attain high accuracy on representative atomic action datasets outperforming their respective state-of-the-art classification accuracy in full supervision setting.
arXiv Detail & Related papers (2020-11-17T03:59:05Z) - Sequential Weakly Labeled Multi-Activity Localization and Recognition on
Wearable Sensors using Recurrent Attention Networks [13.64024154785943]
We propose a recurrent attention network (RAN) to handle sequential weakly labeled multi-activity recognition and location tasks.
Our RAN model can simultaneously infer multi-activity types from the coarse-grained sequential weak labels.
It will greatly reduce the burden of manual labeling.
arXiv Detail & Related papers (2020-04-13T04:57:09Z) - A Spatial-Temporal Attentive Network with Spatial Continuity for
Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC)
First, spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, we conduct a joint feature sequence based on the sequence and instant state information to make the generative trajectories keep spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z) - ZSTAD: Zero-Shot Temporal Activity Detection [107.63759089583382]
We propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected.
We design an end-to-end deep network based on R-C3D as the architecture for this solution.
Experiments on both the THUMOS14 and the Charades datasets show promising performance in terms of detecting unseen activities.
arXiv Detail & Related papers (2020-03-12T02:40:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.