A Stronger Baseline for Ego-Centric Action Detection
- URL: http://arxiv.org/abs/2106.06942v1
- Date: Sun, 13 Jun 2021 08:11:31 GMT
- Title: A Stronger Baseline for Ego-Centric Action Detection
- Authors: Zhiwu Qing, Ziyuan Huang, Xiang Wang, Yutong Feng, Shiwei Zhang,
Jianwen Jiang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Nong Sang
- Abstract summary: This report analyzes an egocentric video action detection method we used in the 2021 EPIC-KITCHENS-100 competition hosted at the CVPR 2021 Workshop.
The goal of our task is to locate the start time and the end time of each action in the long untrimmed video, and to predict the action category.
We adopt a sliding-window strategy to generate proposals, which can better adapt to short-duration actions.
- Score: 38.934802199184354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report analyzes an egocentric video action detection method we
used in the 2021 EPIC-KITCHENS-100 competition hosted at the CVPR 2021 Workshop. The
goal of our task is to locate the start time and the end time of each action in
the long untrimmed video, and to predict the action category. We adopt a
sliding-window strategy to generate proposals, which can better adapt to
short-duration actions. In addition, we show that classification and proposal
generation conflict within the same network. Separating the two tasks boosts
detection performance with high efficiency. By simply employing these
strategies, we achieved 16.10% performance on the test set of the
EPIC-KITCHENS-100 Action Detection challenge using a single model, surpassing
the baseline method by 11.7% in terms of average mAP.
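The report describes the strategy but gives no code, so the following is a minimal sketch of multi-scale sliding-window proposal generation; the window lengths, stride ratio, and function name are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of sliding-window proposal generation, assuming
# illustrative window lengths and strides (the report does not state
# the values used in the competition entry).

def sliding_window_proposals(video_duration,
                             window_lengths=(1.0, 2.0, 4.0, 8.0),
                             stride_ratio=0.25):
    """Slide windows of several fixed lengths over the video and return
    (start, end) proposals in seconds. Including short windows is what
    lets the detector adapt to short-duration actions."""
    proposals = []
    for length in window_lengths:
        stride = length * stride_ratio  # overlapping windows
        start = 0.0
        while start < video_duration:
            proposals.append((start, min(start + length, video_duration)))
            start += stride
    return proposals

# Example: dense proposals for a 30-second clip.
props = sliding_window_proposals(30.0)
print(len(props), props[:3])
```

In the report's design, each proposal would then be scored and classified by networks trained separately, reflecting the finding that the two tasks conflict when trained in one model.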
Related papers
- Temporal Action Detection with Global Segmentation Mask Learning [134.26292288193298]
Existing temporal action detection (TAD) methods rely on generating an overwhelmingly large number of proposals per video.
We propose a proposal-free Temporal Action detection model with Global Segmentation mask (TAGS).
Our core idea is to learn a global segmentation mask of each action instance jointly at the full video length (a minimal illustrative sketch of this mask representation appears after the list below).
arXiv Detail & Related papers (2022-07-14T00:46:51Z)
- Context-aware Proposal Network for Temporal Action Detection [47.72048484299649]
This report presents our first-place winning solution for the temporal action detection task in the CVPR 2022 ActivityNet Challenge.
The task aims to localize temporal boundaries of action instances with specific classes in long untrimmed videos.
We argue that the generated proposals contain rich contextual information, which may benefit detection confidence prediction.
arXiv Detail & Related papers (2022-06-18T01:43:43Z)
- End-to-End Semi-Supervised Learning for Video Action Detection [23.042410033982193]
We propose a simple end-to-end approach that effectively utilizes the unlabeled data.
Video action detection requires both action class prediction and spatio-temporal consistency.
We demonstrate the effectiveness of the proposed approach on two different action detection benchmark datasets.
arXiv Detail & Related papers (2022-03-08T18:11:25Z) - Temporal Action Localization Using Gated Recurrent Units [6.091096843566857]
We propose a new network based on Gated Recurrent Unit (GRU) and two novel post-processing ideas for TAL task.
Specifically, we propose a new design for the output layer of the GRU resulting in the so-called GRU-Splitted model.
We evaluate the performance of the proposed method compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-08-07T06:25:29Z) - Two-Stream Consensus Network: Submission to HACS Challenge 2021
Weakly-Supervised Learning Track [78.64815984927425]
The goal of weakly-supervised temporal action localization is to temporally locate and classify action of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2rd in this challenge, and we hope our method can serve as a baseline for future academic research.
arXiv Detail & Related papers (2021-06-21T03:36:36Z) - Finding Action Tubes with a Sparse-to-Dense Framework [62.60742627484788]
We propose a framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner.
We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets.
arXiv Detail & Related papers (2020-08-30T15:38:44Z)
- Temporal Fusion Network for Temporal Action Localization: Submission to ActivityNet Challenge 2020 (Task E) [45.3218136336925]
This report analyzes a temporal action localization method we used in the HACS competition hosted in the ActivityNet Challenge 2020.
The goal of our task is to locate the start time and end time of each action in the untrimmed video, and to predict the action category.
By fusing the results of multiple models, our method obtains 40.55% on the validation set and 40.53% on the test set in terms of mAP, and achieves Rank 1 in this challenge.
arXiv Detail & Related papers (2020-06-13T00:33:00Z)
- Fast Template Matching and Update for Video Object Tracking and Segmentation [56.465510428878]
The main task we aim to tackle is the multi-instance semi-supervised video object segmentation across a sequence of frames.
The challenges lie in selecting a matching method to predict the result and in deciding whether to update the target template.
We propose a novel approach which utilizes reinforcement learning to make these two decisions at the same time.
arXiv Detail & Related papers (2020-04-16T08:58:45Z)
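As noted in the TAGS entry above, the global-mask idea is simple to illustrate. Below is a minimal sketch assuming a fixed frame rate and a (start_sec, end_sec, class_id) annotation format; these assumptions and the helper name are illustrative, not details from the TAGS paper.

```python
import numpy as np

def segments_to_global_masks(segments, num_frames, fps=30.0):
    """Convert per-instance (start_sec, end_sec, class_id) annotations
    into one binary mask per action instance spanning the full video
    length: the kind of target a proposal-free model like TAGS learns
    jointly for all instances."""
    masks = np.zeros((len(segments), num_frames), dtype=np.float32)
    labels = []
    for i, (start, end, cls) in enumerate(segments):
        s = max(0, int(round(start * fps)))
        e = min(num_frames, int(round(end * fps)))
        masks[i, s:e] = 1.0  # frames covered by this action instance
        labels.append(cls)
    return masks, np.asarray(labels)

# Example: two overlapping actions in a 10-second, 300-frame video.
masks, labels = segments_to_global_masks(
    [(1.0, 3.5, 7), (2.0, 8.0, 2)], num_frames=300)
print(masks.shape, labels)  # (2, 300) [7 2]
```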