1st Place Solution to the EPIC-Kitchens Action Anticipation Challenge 2022
- URL: http://arxiv.org/abs/2207.05730v1
- Date: Sun, 10 Jul 2022 09:03:01 GMT
- Title: 1st Place Solution to the EPIC-Kitchens Action Anticipation Challenge 2022
- Authors: Zeyu Jiang, Changxing Ding
- Abstract summary: This report describes the technical details of our submission to the EPIC-Kitchens Action Anticipation Challenge 2022.
Our method achieves state-of-the-art results on the testing set of EPIC-Kitchens Action Anticipation Challenge 2022.
- Score: 15.038891477389537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we describe the technical details of our submission to the
EPIC-Kitchens Action Anticipation Challenge 2022. In this competition, we
develop the following two approaches: 1) Anticipation Time Knowledge
Distillation, which uses the soft labels learned by the teacher model as
knowledge to guide the student network in learning anticipation-time
information; and 2) a Verb-Noun Relation Module for modeling the relationship
between verbs and nouns. Our method achieves state-of-the-art results on the
testing set of the EPIC-Kitchens Action Anticipation Challenge 2022.
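The first approach relies on temperature-softened teacher outputs as soft labels. A minimal sketch of a standard soft-label distillation loss, assuming a temperature-scaled KL divergence (the function names and temperature value are illustrative assumptions, not details from the paper):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher temperatures yield softer distributions.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between the teacher's soft labels and the student's
    # predictions, scaled by T^2 as in standard knowledge distillation.
    p = softmax(teacher_logits, temperature)  # teacher soft labels
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student matches the teacher exactly and grows as their distributions diverge, so minimizing it transfers the teacher's anticipation-time knowledge to the student.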
Related papers
- ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions [66.20773952864802]
We develop a dataset consisting of 8.5k images and 59.3k inferences about actions grounded in those images.
We propose ActionCOMET, a framework to discern knowledge present in language models specific to the provided visual input.
arXiv Detail & Related papers (2024-10-17T15:22:57Z)
- NICE: CVPR 2023 Challenge on Zero-shot Image Captioning [149.28330263581012]
The NICE project is designed to challenge the computer vision community to develop robust image captioning models.
The report includes information on the newly proposed NICE dataset, evaluation methods, challenge results, and technical details of top-ranking entries.
arXiv Detail & Related papers (2023-09-05T05:32:19Z)
- Team AcieLee: Technical Report for EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023 [8.699868810184752]
The task is to classify audio caused by interactions between objects or by events of the camera wearer.
We conducted exhaustive experiments and found that learning rate step decay, backbone freezing, label smoothing and focal loss contribute most to the performance improvement.
The proposed method allowed us to achieve 3rd place in the CVPR 2023 workshop of the EPIC-SOUNDS Audio-Based Interaction Recognition Challenge.
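Two of the techniques this report credits are standard; hedged sketches of label smoothing and focal loss follow (parameter values are illustrative, not the team's settings):

```python
import math

def smooth_labels(num_classes, target, eps=0.1):
    # Label smoothing: replace the one-hot target with a distribution that
    # puts 1 - eps on the true class and spreads eps over the others.
    off = eps / (num_classes - 1)
    return [1.0 - eps if c == target else off for c in range(num_classes)]

def focal_loss(probs, target, gamma=2.0):
    # Focal loss: down-weights well-classified examples by (1 - p_t)^gamma,
    # focusing training on hard examples.
    pt = probs[target]
    return -((1.0 - pt) ** gamma) * math.log(pt)
```

Both modify the standard cross-entropy objective: smoothing softens the targets, while the focal term reshapes the loss so confident predictions contribute little.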
arXiv Detail & Related papers (2023-06-15T09:49:07Z)
- The Runner-up Solution for YouTube-VIS Long Video Challenge 2022 [72.13080661144761]
We adopt the previously proposed online video instance segmentation method IDOL for this challenge.
We use pseudo labels to further help contrastive learning, so as to obtain more temporally consistent instance embeddings.
The proposed method obtains 40.2 AP on the YouTube-VIS 2022 long video dataset and was ranked second in this challenge.
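A hypothetical sketch of the pseudo-label idea: embeddings that share a pseudo instance label are pulled together, while embeddings of different instances are pushed apart (this is an illustration of a generic contrastive objective, not IDOL's actual loss):

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def pairwise_contrastive_loss(emb_a, emb_b, same_instance, margin=0.5):
    # Pull embeddings with the same pseudo instance label together;
    # push different instances below a similarity margin.
    s = cosine_sim(emb_a, emb_b)
    if same_instance:
        return 1.0 - s          # zero when the embeddings align perfectly
    return max(0.0, s - margin)  # zero once they are sufficiently dissimilar
```

Applied across frames, such an objective encourages the same instance to keep a consistent embedding over time, which is the stated goal of using pseudo labels here.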
arXiv Detail & Related papers (2022-11-18T01:40:59Z)
- 1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: Cropped Word Recognition [35.2137931915091]
This report presents our winner solution to ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST)
Our solution achieves an overall word accuracy of 69.73% when considering both in-vocabulary and out-of-vocabulary words.
arXiv Detail & Related papers (2022-08-04T16:20:58Z)
- Team PKU-WICT-MIPL PIC Makeup Temporal Video Grounding Challenge 2022 Technical Report [42.49264486550348]
We propose a phrase relationship mining framework to exploit the temporal localization relationship between the fine-grained phrase and the whole sentence.
Besides, we propose constraining the localization results of different step sentence queries so that they do not overlap with each other.
Our final submission ranked 2nd on the leaderboard, with only a 0.55% gap from the first.
arXiv Detail & Related papers (2022-07-06T13:50:34Z)
- Technical Report for CVPR 2022 LOVEU AQTC Challenge [3.614550981030065]
This report presents the 2nd winning model for AQTC, a task newly introduced in CVPR 2022 LOng-form VidEo Understanding (LOVEU) challenges.
This challenge involves difficulties with multi-step answers, multi-modal inputs, and diverse and changing button representations in video.
We propose a new context ground module attention mechanism for more effective feature mapping.
arXiv Detail & Related papers (2022-06-29T12:07:43Z)
- NTIRE 2022 Challenge on Perceptual Image Quality Assessment [90.04931572825859]
This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment (IQA).
The challenge addresses the emerging problem of assessing the quality of images produced by perceptual image processing algorithms.
The winning method demonstrates state-of-the-art performance.
arXiv Detail & Related papers (2022-06-23T13:36:49Z)
- The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020) [186.7816349401443]
We present a new video understanding pentathlon challenge, an open competition held in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020.
The objective of the challenge was to explore and evaluate new methods for text-to-video retrieval.
arXiv Detail & Related papers (2020-08-03T09:55:26Z)
- Egocentric Action Recognition by Video Attention and Temporal Context [83.57475598382146]
We present the submission of Samsung AI Centre Cambridge to the CVPR 2020 EPIC-Kitchens Action Recognition Challenge.
In this challenge, action recognition is posed as the problem of simultaneously predicting a single 'verb' and 'noun' class label given an input trimmed video clip.
Our solution achieves strong performance on the challenge metrics without using object-specific reasoning or extra training data.
arXiv Detail & Related papers (2020-07-03T18:00:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.