Technical Report for ActivityNet Challenge 2022 -- Temporal Action Localization
- URL: http://arxiv.org/abs/2411.00883v1
- Date: Thu, 31 Oct 2024 14:16:56 GMT
- Title: Technical Report for ActivityNet Challenge 2022 -- Temporal Action Localization
- Authors: Shimin Chen, Wei Li, Jianyang Gu, Chen Chen, Yandong Guo
- Abstract summary: We propose to locate the temporal boundaries of each action and predict the action class in untrimmed videos.
Faster-TAD simplifies the TAD pipeline and achieves remarkable performance.
- Score: 20.268572246761895
- License:
- Abstract: For the task of temporal action localization on the ActivityNet-1.3 dataset, we propose to locate the temporal boundaries of each action and predict the action class in untrimmed videos. We first apply VideoSwinTransformer as the feature extractor to extract different features. Then we apply a unified network following Faster-TAD to obtain proposals and semantic labels simultaneously. Finally, we ensemble the results of different temporal action detection models, which complement each other. Faster-TAD simplifies the TAD pipeline and achieves remarkable performance, obtaining results comparable to those of multi-step approaches.
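The abstract describes a three-stage pipeline: backbone feature extraction, a unified proposal-plus-classification head, and an ensemble of complementary detectors. Below is a minimal Python sketch of that flow; the function names, shapes, and the 16-frame snippet stride are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def extract_features(frames: np.ndarray) -> np.ndarray:
    """Stand-in for the VideoSwinTransformer backbone: video frames -> snippet features."""
    num_snippets = frames.shape[0] // 16        # assumed 16-frame snippet stride
    return np.random.randn(num_snippets, 1024)  # placeholder (T', D) feature matrix

def faster_tad_head(features: np.ndarray):
    """Stand-in for the unified Faster-TAD-style head that yields proposals
    together with their semantic labels in a single pass."""
    # Each detection: (start_snippet, end_snippet, class_id, confidence).
    return [(0, features.shape[0] - 1, 0, 0.50)]

def ensemble_detections(per_model_detections):
    """Pool detections from several complementary models and re-rank by confidence;
    in practice score fusion and NMS would follow."""
    pooled = [d for dets in per_model_detections for d in dets]
    return sorted(pooled, key=lambda d: d[-1], reverse=True)

if __name__ == "__main__":
    video = np.zeros((256, 224, 224, 3), dtype=np.float32)  # dummy untrimmed video
    feats = extract_features(video)
    print(ensemble_detections([faster_tad_head(feats)]))
```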
Related papers
- Harnessing Temporal Causality for Advanced Temporal Action Detection [53.654457142657236]
We introduce CausalTAD, which combines causal attention and causal Mamba to achieve state-of-the-art performance on benchmarks.
We ranked 1st in the Action Recognition, Action Detection, and Audio-Based Interaction Detection tracks at the EPIC-Kitchens Challenge 2024, and 1st in the Moment Queries track at the Ego4D Challenge 2024.
arXiv Detail & Related papers (2024-07-25T06:03:02Z)
- HTNet: Anchor-free Temporal Action Localization with Hierarchical Transformers [19.48000379201692]
Temporal action localization (TAL) is the task of identifying a set of actions in a video.
We present a novel anchor-free framework, known as HTNet, which predicts a set of <start time, end time, class> triplets from a video.
We demonstrate how our method localizes accurate action instances and achieves state-of-the-art performance on two TAL benchmark datasets.
arXiv Detail & Related papers (2022-07-20T05:40:03Z)
- Context-aware Proposal Network for Temporal Action Detection [47.72048484299649]
This report presents our first-place winning solution for the temporal action detection task in the CVPR-2022 ActivityNet Challenge.
The task aims to localize temporal boundaries of action instances with specific classes in long untrimmed videos.
We argue that the generated proposals contain rich contextual information, which may benefit detection confidence prediction.
arXiv Detail & Related papers (2022-06-18T01:43:43Z)
- Towards High-Quality Temporal Action Detection with Sparse Proposals [14.923321325749196]
Temporal Action Detection aims to localize the temporal segments containing human action instances and predict the action categories.
We introduce Sparse Proposals to interact with the hierarchical features.
Experiments demonstrate the effectiveness of our method, especially under high tIoU thresholds.
arXiv Detail & Related papers (2021-09-18T06:15:19Z)
- End-to-end Temporal Action Detection with Transformer [86.80289146697788]
Temporal action detection (TAD) aims to determine the semantic label and the boundaries of every action instance in an untrimmed video.
Here, we construct an end-to-end framework for TAD upon Transformer, termed TadTR.
Our method achieves state-of-the-art performance on HACS Segments and THUMOS14 and competitive performance on ActivityNet-1.3.
arXiv Detail & Related papers (2021-06-18T17:58:34Z)
- Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation [79.98992138865042]
We present an augmented transformer with adaptive graph network (ATAG) to exploit both long-range and local temporal contexts for TAPG.
Specifically, we enhance the vanilla transformer by equipping it with a snippet actionness loss and a front block, dubbed the augmented transformer.
An adaptive graph convolutional network (GCN) is proposed to build local temporal context by mining the position information and difference between adjacent features.
arXiv Detail & Related papers (2021-03-30T02:01:03Z)
- Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020 [66.4527310659592]
This report presents an overview of our solution used in the submission to ActivityNet Challenge 2020 Task 1.
We decouple the temporal action localization task into two stages (i.e. proposal generation and classification) and enrich the proposal diversity.
Our proposed scheme achieves state-of-the-art performance on the temporal action localization task with 42.26 average mAP on the challenge testing set.
arXiv Detail & Related papers (2020-07-20T04:35:40Z)
- Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning [63.91369308085091]
We propose a novel and simple model for event sequence generation and explore temporal relationships of the event sequence in the video.
The proposed model omits inefficient two-stage proposal generation and directly generates event boundaries conditioned on bi-directional temporal dependency in one pass.
The overall system achieves state-of-the-art performance on the dense-captioning events in video task with 9.894 METEOR score on the challenge testing set.
arXiv Detail & Related papers (2020-06-14T13:21:37Z)
- Temporal Fusion Network for Temporal Action Localization: Submission to ActivityNet Challenge 2020 (Task E) [45.3218136336925]
This report analyzes the temporal action localization method we used in the HACS competition hosted in the ActivityNet Challenge 2020.
The goal of our task is to locate the start time and end time of each action in the untrimmed video and predict its action category.
By fusing the results of multiple models, our method obtains 40.55% on the validation set and 40.53% on the test set in terms of mAP, and achieves Rank 1 in this challenge.
arXiv Detail & Related papers (2020-06-13T00:33:00Z)
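Several entries above report results as mAP under temporal IoU (tIoU) thresholds, and the ActivityNet-style "average mAP" is the mean of per-threshold mAP values over tIoU thresholds 0.5 to 0.95 in steps of 0.05. The sketch below illustrates the tIoU computation and the averaging convention; it is a simplified illustration with made-up numbers, not the official evaluation code.

```python
import numpy as np

def tiou(seg_a, seg_b):
    """Temporal IoU between two (start, end) segments given in seconds."""
    (s1, e1), (s2, e2) = seg_a, seg_b
    inter = max(0.0, min(e1, e2) - max(s1, s2))
    union = (e1 - s1) + (e2 - s2) - inter
    return inter / union if union > 0 else 0.0

# A 5 s overlap over a 15 s union gives tIoU = 1/3.
assert abs(tiou((0.0, 10.0), (5.0, 15.0)) - 1.0 / 3.0) < 1e-9

def average_map(map_per_threshold):
    """Average mAP: the mean of mAP values computed at each tIoU threshold."""
    return float(np.mean(map_per_threshold))

# Hypothetical per-threshold mAPs at the ten ActivityNet thresholds (0.5:0.05:0.95).
print(average_map([0.55, 0.52, 0.50, 0.47, 0.44, 0.40, 0.35, 0.29, 0.21, 0.10]))
```

In the full protocol, per-class average precision is computed from ranked detection lists at each threshold before this averaging step; the sketch only shows the tIoU matching criterion and the final mean.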