Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023
- URL: http://arxiv.org/abs/2306.09172v2
- Date: Mon, 25 Sep 2023 12:11:43 GMT
- Title: Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023
- Authors: Jiayi Shao and Xiaohan Wang and Ruijie Quan and Yi Yang
- Abstract summary: This report presents the ReLER submission to two tracks of the Ego4D Episodic Memory Benchmark at CVPR 2023.
The solution builds on our proposed Action Sensitivity Learning (ASL) framework to better capture the discrepant information of frames.
- Score: 41.10032280192564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This report presents the ReLER submission to two tracks of the Ego4D Episodic
Memory Benchmark at CVPR 2023: Natural Language Queries and Moment
Queries. The solution builds on our proposed Action Sensitivity Learning
framework (ASL) to better capture the discrepant information of frames. Further, we
incorporate a series of stronger video features and fusion strategies. Our
method achieves an average mAP of 29.34, ranking 1st in the Moment Queries
Challenge, and a mean R1 of 19.79, ranking 2nd in the Natural Language Queries
Challenge. Our code will be released.
Related papers
- ObjectNLQ @ Ego4D Episodic Memory Challenge 2024 [51.57555556405898]
We present our approach for the Natural Language Query track and Goal Step track of the Ego4D Episodic Memory Benchmark at CVPR 2024.
Both challenges require the localization of actions within long video sequences using textual queries.
We introduce a novel approach, termed ObjectNLQ, which incorporates an object branch to augment the video representation with detailed object information.
arXiv Detail & Related papers (2024-06-22T07:57:58Z)
- 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation [81.50620771207329]
We investigate the effectiveness of static-dominant data and frame sampling on referring video object segmentation (RVOS).
Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeViS track of the PVUW Challenge.
arXiv Detail & Related papers (2024-06-11T08:05:26Z)
- NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results [126.78130602974319]
This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4).
The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs.
The aim of the challenge is to obtain designs/solutions with the most advanced SR performance.
arXiv Detail & Related papers (2024-04-15T13:45:48Z)
- NMS Threshold matters for Ego4D Moment Queries -- 2nd place solution to the Ego4D Moment Queries Challenge 2023 [8.674624972031387]
This report describes our submission to the Ego4D Moment Queries Challenge 2023.
Our submission extends ActionFormer, a recent method for temporal action localization.
Our solution ranks 2nd on the public leaderboard with 26.62% average mAP and 45.69% Recall@1x at tIoU=0.5 on the test set, significantly outperforming the strong baseline from the 2023 challenge.
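To illustrate why the non-maximum suppression (NMS) threshold matters when ranking predicted moments, a minimal 1-D temporal NMS over (start, end, score) segments can be sketched as follows. This is a hedged illustration only: the function names, tuple layout, and threshold values are hypothetical and not taken from the paper.

```python
def temporal_iou(a, b):
    """IoU between two 1-D temporal segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def temporal_nms(segments, iou_threshold=0.5):
    """Greedy NMS over (start, end, score) tuples, highest score first.
    A lower threshold suppresses more overlapping moments; a higher one
    keeps more near-duplicate predictions, which shifts average mAP."""
    kept = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        if all(temporal_iou(seg, k) < iou_threshold for k in kept):
            kept.append(seg)
    return kept
```

For example, with segments (0, 10, 0.9) and (1, 9, 0.8), whose IoU is 0.8, a threshold of 0.5 suppresses the second segment while a threshold of 0.9 keeps both, which is the kind of sensitivity the paper's title refers to.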
arXiv Detail & Related papers (2023-07-05T05:23:49Z)
- 1st Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation [25.235404527487784]
Video panoptic segmentation is a challenging task that serves as the cornerstone of numerous downstream applications.
We believe that the decoupling strategy proposed by DVIS enables more effective utilization of temporal information for both "thing" and "stuff" objects.
Our method achieved a VPQ score of 51.4 and 53.7 in the development and test phases, respectively, and ranked 1st in the VPS track of the 2nd PVUW Challenge.
arXiv Detail & Related papers (2023-06-07T01:24:48Z)
- APRIL-GAN: A Zero-/Few-Shot Anomaly Classification and Segmentation Method for CVPR 2023 VAND Workshop Challenge Tracks 1&2: 1st Place on Zero-shot AD and 4th Place on Few-shot AD [21.493718012180643]
We present our solution for the Zero/Few-shot Track of the Visual Anomaly and Novelty Detection (VAND) 2023 Challenge.
Our method achieved first place in the zero-shot track, especially excelling in segmentation.
In the few-shot track, we secured the fourth position overall, with our classification F1 score of 0.8687 ranking first among all participating teams.
arXiv Detail & Related papers (2023-05-27T06:24:43Z)
- ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022 [61.81899056005645]
Given a video clip and a text query, the goal of this challenge is to locate a temporal moment of the video clip where the answer to the query can be obtained.
We propose a multi-scale cross-modal transformer and a video frame-level contrastive loss to fully uncover the correlation between language queries and video clips.
The experimental results demonstrate the effectiveness of our method.
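The frame-level contrastive loss mentioned above can be sketched, under assumptions, as an InfoNCE-style objective that pulls the query embedding toward frames inside the ground-truth moment and pushes it away from frames outside it. The function name, signature, and temperature value below are hypothetical, not from the paper.

```python
import numpy as np

def frame_contrastive_loss(frame_feats, query_feat, inside_mask, tau=0.07):
    """InfoNCE-style frame-level contrastive loss (illustrative sketch).
    frame_feats: (T, D) L2-normalized per-frame video features
    query_feat:  (D,)   L2-normalized sentence-query feature
    inside_mask: (T,)   bool array, True for frames inside the target moment"""
    sims = frame_feats @ query_feat / tau          # (T,) scaled similarities
    log_probs = sims - np.log(np.exp(sims).sum())  # log-softmax over all frames
    return -log_probs[inside_mask].mean()          # maximize positive-frame prob
```

The loss is small when the query embedding is most similar to the frames marked as positives, and large when the positives are misaligned, which is the behavior a frame-level video-language contrastive objective is meant to enforce.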
arXiv Detail & Related papers (2022-07-01T12:48:35Z)
- Exploiting Semantic Role Contextualized Video Features for Multi-Instance Text-Video Retrieval EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022 [72.12974259966592]
We present our approach for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.
We first parse sentences into semantic roles corresponding to verbs and nouns, then use self-attention to exploit semantic-role-contextualized video features.
arXiv Detail & Related papers (2022-06-29T03:24:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.