ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022
- URL: http://arxiv.org/abs/2211.09558v2
- Date: Mon, 25 Sep 2023 12:14:45 GMT
- Title: ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022
- Authors: Jiayi Shao and Xiaohan Wang and Yi Yang
- Abstract summary: We present the ReLER@ZJU submission to the Ego4D Moment Queries Challenge at ECCV 2022.
The goal is to retrieve and localize all instances of possible activities in egocentric videos.
The final submission achieved a Recall@1, tIoU=0.5 score of 37.24 and an average mAP of 17.67, taking 3rd place on the leaderboard.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we present the ReLER@ZJU submission to the Ego4D
Moment Queries Challenge at ECCV 2022. In this task, the goal is to retrieve
and localize all instances of possible activities in egocentric videos. The
Ego4D dataset is challenging for temporal action localization because the
videos are quite long and each video contains multiple action instances with
fine-grained action classes. To address these problems, we utilize a
multi-scale transformer to classify action categories and predict the
boundaries of each instance. Moreover, to better capture long-term temporal
dependencies in long videos, we propose a segment-level recurrence mechanism.
Compared with directly feeding all video features to the transformer encoder,
the proposed segment-level recurrence mechanism alleviates optimization
difficulties and achieves better performance. The final submission achieved a
Recall@1, tIoU=0.5 score of 37.24 and an average mAP of 17.67, taking 3rd
place on the leaderboard.
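To make the segment-level recurrence idea concrete, below is a minimal PyTorch-style sketch. It is an illustration under stated assumptions, not the authors' released code: the class name `SegmentRecurrentEncoder`, the hyperparameters, and the Transformer-XL-style choice of carrying a detached memory across segments are all hypothetical.

```python
import torch
import torch.nn as nn

class SegmentRecurrentEncoder(nn.Module):
    """Encode a long feature sequence segment by segment, carrying the
    previous segment's (detached) output as extra context.
    Hypothetical sketch; not the authors' implementation."""

    def __init__(self, dim=512, heads=8, layers=2, segment_len=256):
        super().__init__()
        self.segment_len = segment_len
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, feats):
        # feats: (B, T, D) clip features; T can cover a very long video.
        outputs, memory = [], None
        for start in range(0, feats.size(1), self.segment_len):
            seg = feats[:, start:start + self.segment_len]
            # Prepend the detached memory so each segment sees longer-range
            # context without backpropagating through the entire video,
            # which is what eases optimization versus one giant sequence.
            inp = seg if memory is None else torch.cat([memory, seg], dim=1)
            out = self.encoder(inp)[:, -seg.size(1):]  # keep current segment
            memory = out.detach()
            outputs.append(out)
        return torch.cat(outputs, dim=1)  # (B, T, D) encoded features
```

The classification and boundary-regression heads of a multi-scale transformer detector would then consume the returned per-clip features; the sketch only shows how recurrence bounds the sequence length each encoder pass must handle.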
Related papers
- Technical Report for ActivityNet Challenge 2022 -- Temporal Action Localization
We propose to locate the temporal boundaries of each action and predict the action class in untrimmed videos.
Faster-TAD simplifies the pipeline of TAD and achieves remarkable performance.
arXiv Detail & Related papers (2024-10-31T14:16:56Z)
- Technical Report for Ego4D Long Term Action Anticipation Challenge 2023
We describe the technical details of our approach for the Ego4D Long-Term Action Anticipation Challenge 2023.
The aim of this task is to predict a sequence of future actions that will take place at an arbitrary time or later, given an input video.
Our method outperformed the baseline and was the second-place solution on the public leaderboard.
arXiv Detail & Related papers (2023-07-04T04:12:49Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- Adaptive Perception Transformer for Temporal Action Localization
This paper proposes a novel end-to-end model called the adaptive perception transformer (AdaPerFormer).
One branch takes care of the global perception attention, which can model entire video sequences and aggregate global relevant contexts.
The other branch concentrates on the local convolutional shift to aggregate intra-frame and inter-frame information.
arXiv Detail & Related papers (2022-08-25T07:42:48Z)
- ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022
Given a video clip and a text query, the goal of this challenge is to locate a temporal moment of the video clip where the answer to the query can be obtained.
We propose a multi-scale cross-modal transformer and a video frame-level contrastive loss to fully uncover the correlation between language queries and video clips.
The experimental results demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2022-07-01T12:48:35Z)
- AIM 2020 Challenge on Video Temporal Super-Resolution
This paper reports the second AIM challenge on Video Temporal Super-Resolution (VTSR).
arXiv Detail & Related papers (2020-09-28T00:10:29Z)
- Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020
This report presents an overview of our solution used in the submission to ActivityNet Challenge 2020 Task 1.
We decouple the temporal action localization task into two stages (i.e., proposal generation and classification) and enrich the proposal diversity.
Our proposed scheme achieves state-of-the-art performance on the temporal action localization task with 42.26 average mAP on the challenge testing set.
arXiv Detail & Related papers (2020-07-20T04:35:40Z)
- Temporal Fusion Network for Temporal Action Localization: Submission to ActivityNet Challenge 2020 (Task E)
This report analyzes a temporal action localization method we used in the HACS competition, which was hosted within the ActivityNet Challenge 2020.
The goal of our task is to locate the start and end times of actions in untrimmed videos and predict the action category.
By fusing the results of multiple models, our method obtains 40.55% on the validation set and 40.53% on the test set in terms of mAP, and achieves Rank 1 in this challenge.
arXiv Detail & Related papers (2020-06-13T00:33:00Z)
- Hierarchical Attention Network for Action Segmentation
The temporal segmentation of events is an essential task and a precursor for the automatic recognition of human actions in video.
We propose a complete end-to-end supervised learning approach that can better learn relationships between actions over time.
We evaluate our system on challenging public benchmark datasets, including the MERL Shopping, 50 Salads, and Georgia Tech Egocentric datasets.
arXiv Detail & Related papers (2020-05-07T02:39:18Z)