NVIDIA-UNIBZ Submission for EPIC-KITCHENS-100 Action Anticipation
Challenge 2022
- URL: http://arxiv.org/abs/2206.10869v1
- Date: Wed, 22 Jun 2022 06:34:58 GMT
- Title: NVIDIA-UNIBZ Submission for EPIC-KITCHENS-100 Action Anticipation
Challenge 2022
- Authors: Tsung-Ming Tai, Oswald Lanz, Giuseppe Fiameni, Yi-Kwan Wong, Sze-Sen
Poon, Cheng-Kuang Lee, Ka-Chun Cheung, Simon See
- Score: 13.603712913129506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we describe the technical details of our submission to the
EPIC-KITCHENS-100 action anticipation challenge. Our models, the higher-order
recurrent space-time transformer and the message-passing neural network with
edge learning, are both recurrent-based architectures that observe only 2.5
seconds of inference context to form the action anticipation prediction. By
averaging the prediction scores from a set of models compiled with our proposed
training pipeline, we achieved strong performance on the test set: 19.61%
overall mean top-5 recall, which placed second on the public leaderboard.
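As a rough illustration of the score-averaging ensemble step described in the abstract, the sketch below averages per-class action scores from several models and keeps the top-5 predictions per sample; the array shapes, model count, and class count are hypothetical and not taken from the submission.

```python
import numpy as np

def ensemble_top5(score_list, k=5):
    """Average per-class action scores from several models and return the
    indices of the k highest-scoring classes per sample.

    score_list: list of arrays, each of shape (num_samples, num_classes),
                e.g. softmax outputs of the individual models (hypothetical shapes).
    """
    avg_scores = np.mean(np.stack(score_list, axis=0), axis=0)  # (num_samples, num_classes)
    # argsort is ascending: take the last k columns, then reverse for best-first order
    topk = np.argsort(avg_scores, axis=1)[:, -k:][:, ::-1]
    return avg_scores, topk

# Toy usage: three hypothetical models, 4 samples, 10 action classes
rng = np.random.default_rng(0)
scores = [rng.random((4, 10)) for _ in range(3)]
_, top5 = ensemble_top5(scores)
print(top5.shape)  # (4, 5)
```

Averaging calibrated (e.g. softmax) scores rather than raw logits is one common choice; the report itself does not specify which variant is used.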
Related papers
- VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning [59.68917139718813]
We show that a strong off-the-shelf frozen pretrained visual encoder can achieve state-of-the-art (SoTA) performance in forecasting and procedural planning.
By conditioning on frozen clip-level embeddings from observed steps to predict the actions of unseen steps, our prediction model is able to learn robust representations for forecasting.
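A minimal sketch of this conditioning idea, under broad assumptions and not VEDIT's actual architecture: a frozen pretrained clip encoder embeds the observed steps, and a small trainable head predicts the action of the next, unseen step. All module names, tensor layouts, and sizes below are hypothetical.

```python
import torch
import torch.nn as nn

class LatentActionPredictor(nn.Module):
    """Predict the next step's action class from frozen clip embeddings (sketch)."""
    def __init__(self, frozen_encoder: nn.Module, embed_dim: int, num_actions: int):
        super().__init__()
        self.encoder = frozen_encoder
        for p in self.encoder.parameters():      # keep the pretrained encoder frozen
            p.requires_grad = False
        self.temporal = nn.GRU(embed_dim, embed_dim, batch_first=True)  # aggregate observed steps
        self.head = nn.Linear(embed_dim, num_actions)

    def forward(self, observed_clips):           # (batch, num_steps, C, T, H, W), hypothetical layout
        b, s = observed_clips.shape[:2]
        with torch.no_grad():
            feats = self.encoder(observed_clips.flatten(0, 1))  # (b*s, embed_dim)
        feats = feats.view(b, s, -1)
        _, last = self.temporal(feats)            # summary state of the observed steps
        return self.head(last.squeeze(0))         # logits for the unseen step's action
```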
arXiv Detail & Related papers (2024-10-04T14:52:09Z)
- Early Action Recognition with Action Prototypes [62.826125870298306]
We propose a novel model that learns a prototypical representation of the full action for each class.
We decompose the video into short clips, where a visual encoder extracts features from each clip independently.
Later, a decoder aggregates the features from all the clips in an online fashion for the final class prediction.
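The per-clip encoding plus online aggregation could be sketched roughly as below, using a simple running average over clip features with a prediction after every clip; this is a generic stand-in, not the authors' prototype-based decoder, and all names and shapes are assumptions.

```python
import torch
import torch.nn as nn

class OnlineClipAggregator(nn.Module):
    """Emit a class prediction after every observed clip by keeping a running
    average of clip features (generic sketch)."""
    def __init__(self, clip_encoder: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.clip_encoder = clip_encoder          # maps one clip to a feature vector
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, clips):                     # clips: (batch, num_clips, C, T, H, W)
        running_sum, outputs = None, []
        for t in range(clips.shape[1]):           # process clips in arrival order
            feat = self.clip_encoder(clips[:, t])                 # (batch, feat_dim)
            running_sum = feat if running_sum is None else running_sum + feat
            outputs.append(self.classifier(running_sum / (t + 1)))
        return torch.stack(outputs, dim=1)        # (batch, num_clips, num_classes)
```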
arXiv Detail & Related papers (2023-12-11T18:31:13Z)
- Predicting the Next Action by Modeling the Abstract Goal [18.873728614415946]
We present an action anticipation model that leverages goal information for the purpose of reducing the uncertainty in future predictions.
We derive a novel concept called abstract goal which is conditioned on observed sequences of visual features for action anticipation.
Our method obtains impressive results on the very challenging Epic-Kitchens55 (EK55), EK100, and EGTEA Gaze+ datasets.
arXiv Detail & Related papers (2022-09-12T06:52:42Z)
- SAIC_Cambridge-HuPBA-FBK Submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021 [80.05652375838073]
This report presents the technical details of our submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021.
Our submission, visible on the public leaderboard, achieved a top-1 action recognition accuracy of 44.82%, using only RGB.
arXiv Detail & Related papers (2021-10-06T16:29:47Z)
- Two-Stream Consensus Network: Submission to HACS Challenge 2021 Weakly-Supervised Learning Track [78.64815984927425]
The goal of weakly-supervised temporal action localization is to temporally locate and classify actions of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2nd in this challenge, and we hope our method can serve as a baseline for future academic research.
arXiv Detail & Related papers (2021-06-21T03:36:36Z)
- A Stronger Baseline for Ego-Centric Action Detection [38.934802199184354]
This report analyzes an egocentric video action detection method we used in the 2021 EPIC-KITCHENS-100 competition hosted at the CVPR 2021 Workshop.
The goal of our task is to locate the start time and the end time of each action in a long untrimmed video and to predict its action category.
We adopt a sliding-window strategy to generate proposals, which better adapts to short-duration actions.
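A toy version of such a sliding-window proposal generator is sketched below; the window lengths and stride ratio are illustrative defaults, not the settings used in that report.

```python
def sliding_window_proposals(video_duration, window_lengths=(1.0, 2.0, 4.0), stride_ratio=0.25):
    """Enumerate (start, end) segment proposals in seconds over an untrimmed video.

    Shorter windows give denser coverage, which helps short-duration actions.
    window_lengths and stride_ratio are made-up defaults for illustration.
    """
    proposals = []
    for length in window_lengths:
        stride = length * stride_ratio
        start = 0.0
        while start + length <= video_duration:
            proposals.append((round(start, 3), round(start + length, 3)))
            start += stride
    return proposals

print(len(sliding_window_proposals(30.0)))  # number of candidate segments for a 30 s video
```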
arXiv Detail & Related papers (2021-06-13T08:11:31Z)
- Anticipative Video Transformer [105.20878510342551]
Anticipative Video Transformer (AVT) is an end-to-end attention-based video modeling architecture.
We train the model jointly to predict the next action in a video sequence, while also learning frame feature encoders that are predictive of successive future frames' features.
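Schematically, that joint objective can be written as a classification loss on the next action plus a regression loss pulling predicted future-frame features toward the actual ones; the sketch below is a simplified stand-in, not AVT's exact formulation, and every tensor name here is hypothetical.

```python
import torch
import torch.nn.functional as F

def joint_anticipation_loss(action_logits, next_action, pred_future_feats, true_future_feats,
                            feat_weight=1.0):
    """Combine next-action classification with future feature prediction (sketch).

    action_logits:     (batch, num_actions) predicted scores for the next action
    next_action:       (batch,) ground-truth next-action indices
    pred_future_feats: (batch, seq_len, dim) features predicted for future frames
    true_future_feats: (batch, seq_len, dim) encoder features of the frames that follow
    """
    cls_loss = F.cross_entropy(action_logits, next_action)
    feat_loss = F.mse_loss(pred_future_feats, true_future_feats)
    return cls_loss + feat_weight * feat_loss

# Toy usage with random tensors
logits = torch.randn(2, 10)
labels = torch.randint(0, 10, (2,))
pred_f = torch.randn(2, 5, 64)
true_f = torch.randn(2, 5, 64)
print(joint_anticipation_loss(logits, labels, pred_f, true_f))
```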
arXiv Detail & Related papers (2021-06-03T17:57:55Z)
- FBK-HUPBA Submission to the EPIC-Kitchens Action Recognition 2020 Challenge [43.8525418821458]
We describe the technical details of our submission to the EPIC-Kitchens Action Recognition 2020 Challenge.
Our submission achieved a top-1 action recognition accuracy of 40.0% on the S1 setting and 21% on the S2 setting, using only RGB.
arXiv Detail & Related papers (2020-06-24T13:41:17Z)
- Rescaling Egocentric Vision [48.57283024015145]
This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS.
The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, 90K actions in 700 variable-length videos.
Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotations of fine-grained actions (+128% more action segments).
arXiv Detail & Related papers (2020-06-23T18:28:04Z)