OSGNet @ Ego4D Episodic Memory Challenge 2025
- URL: http://arxiv.org/abs/2506.03710v1
- Date: Wed, 04 Jun 2025 08:41:42 GMT
- Title: OSGNet @ Ego4D Episodic Memory Challenge 2025
- Authors: Yisen Feng, Haoyu Zhang, Qiaohui Chu, Meng Liu, Weili Guan, Yaowei Wang, Liqiang Nie
- Abstract summary: We present our champion solutions for the three egocentric video localization tracks of the Ego4D Episodic Memory Challenge at CVPR 2025. We adopt an early fusion-based video localization model to tackle all three tasks, aiming to enhance localization accuracy.
- Score: 77.414837862995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we present our champion solutions for the three egocentric video localization tracks of the Ego4D Episodic Memory Challenge at CVPR 2025. All tracks require precise localization of the interval within an untrimmed egocentric video. Previous unified video localization approaches often rely on late fusion strategies, which tend to yield suboptimal results. To address this, we adopt an early fusion-based video localization model to tackle all three tasks, aiming to enhance localization accuracy. Ultimately, our method achieved first place in the Natural Language Queries, Goal Step, and Moment Queries tracks, demonstrating its effectiveness. Our code can be found at https://github.com/Yisen-Feng/OSGNet.
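The report itself contains no code, but the early- versus late-fusion contrast it draws can be illustrated with a minimal PyTorch sketch. The module names, shapes, and the cross-attention fusion choice below are illustrative assumptions, not OSGNet's actual architecture:

```python
# Minimal sketch of early vs. late fusion for moment localization.
# Module names and shapes are illustrative, not OSGNet's actual design.
import torch
import torch.nn as nn

class EarlyFusionLocalizer(nn.Module):
    """Fuses query tokens into frame features *before* the localization head."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 2)  # per-frame start/end logits

    def forward(self, video, query):
        # video: (B, T, dim) frame features; query: (B, L, dim) token features
        fused, _ = self.cross_attn(video, query, query)  # frames attend to words
        return self.head(fused)  # (B, T, 2)

class LateFusionLocalizer(nn.Module):
    """Encodes modalities separately; they meet only at the score level."""
    def __init__(self, dim=256):
        super().__init__()
        self.head = nn.Linear(dim, 2)

    def forward(self, video, query):
        pooled = query.mean(dim=1, keepdim=True)  # (B, 1, dim) query summary
        return self.head(video + pooled)          # shallow score-level mix

video, query = torch.randn(2, 128, 256), torch.randn(2, 12, 256)
print(EarlyFusionLocalizer()(video, query).shape)  # torch.Size([2, 128, 2])
print(LateFusionLocalizer()(video, query).shape)   # torch.Size([2, 128, 2])
```

The practical difference the abstract points at: early fusion lets every frame attend to the query words before any localization decision is made, while late fusion mixes the modalities only after each has been encoded on its own.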
Related papers
- Object-Shot Enhanced Grounding Network for Egocentric Video [60.97916755629796]
We propose OSGNet, an Object-Shot enhanced Grounding Network for egocentric video. Specifically, we extract object information from videos to enrich video representation. We analyze the frequent shot movements inherent to egocentric videos, leveraging these features to extract the wearer's attention information.
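As a rough illustration of the shot-movement idea, and only that, one could treat inter-frame feature change as a proxy for head/shot motion and reweight frames by it; this is a hypothetical reading, not the paper's formulation:

```python
# Hypothetical proxy (not the paper's formulation): treat inter-frame feature
# change as a stand-in for shot/head motion and reweight frames by it.
import torch

def motion_weighted_features(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, D) per-frame features; returns motion-reweighted features."""
    diffs = (frames[1:] - frames[:-1]).norm(dim=-1)       # (T-1,) motion proxy
    motion = torch.cat([diffs[:1], diffs])                # pad back to length T
    weights = torch.softmax(motion, dim=0).unsqueeze(-1)  # (T, 1)
    return frames * weights                               # emphasize active spans

print(motion_weighted_features(torch.randn(64, 256)).shape)  # torch.Size([64, 256])
```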
arXiv Detail & Related papers (2025-05-07T09:20:12Z)
- EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation [54.32133648259802]
We present our solutions to the EgoVis Challenges at CVPR 2024, covering five tracks in the Ego4D challenge and three tracks in the EPIC-Kitchens challenge.
Building upon the video-language two-tower model and leveraging our meticulously organized egocentric video data, we introduce a novel foundation model called EgoVideo.
This model is specifically designed to cater to the unique characteristics of egocentric videos and provides strong support for our competition submissions.
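A two-tower video-language model, in the generic CLIP-style sense this summary invokes, keeps the two encoders independent until a similarity matrix; the linear projections below are placeholders for EgoVideo's actual encoders, which the summary does not specify:

```python
# Generic two-tower (CLIP-style) sketch; the linear projections are
# placeholders for EgoVideo's actual video and text encoders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTower(nn.Module):
    def __init__(self, vid_dim=768, txt_dim=512, embed_dim=256):
        super().__init__()
        self.video_proj = nn.Linear(vid_dim, embed_dim)
        self.text_proj = nn.Linear(txt_dim, embed_dim)

    def forward(self, video, text):
        v = F.normalize(self.video_proj(video), dim=-1)
        t = F.normalize(self.text_proj(text), dim=-1)
        return v @ t.T  # cosine-similarity logits between clips and captions

sims = TwoTower()(torch.randn(4, 768), torch.randn(4, 512))
print(sims.shape)  # torch.Size([4, 4]); diagonal entries are matched pairs
```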
arXiv Detail & Related papers (2024-06-26T05:01:37Z)
- ObjectNLQ @ Ego4D Episodic Memory Challenge 2024 [51.57555556405898]
We present our approach for the Natural Language Query track and Goal Step track of the Ego4D Episodic Memory Benchmark at CVPR 2024.
Both challenges require the localization of actions within long video sequences using textual queries.
We introduce a novel approach, termed ObjectNLQ, which incorporates an object branch to augment the video representation with detailed object information.
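A minimal sketch of what such an object branch could look like, assuming pooled per-frame detector features fused by concatenation (our assumption, not necessarily ObjectNLQ's design):

```python
# Assumed design (not necessarily ObjectNLQ's): pool per-frame detector
# features and fuse them into the frame representation by concatenation.
import torch
import torch.nn as nn

class ObjectBranchFusion(nn.Module):
    def __init__(self, frame_dim=256, obj_dim=128):
        super().__init__()
        self.fuse = nn.Linear(frame_dim + obj_dim, frame_dim)

    def forward(self, frames, objects):
        # frames: (T, frame_dim); objects: (T, N, obj_dim), N detections/frame
        obj_pooled = objects.mean(dim=1)  # (T, obj_dim) average over detections
        return self.fuse(torch.cat([frames, obj_pooled], dim=-1))

fused = ObjectBranchFusion()(torch.randn(64, 256), torch.randn(64, 5, 128))
print(fused.shape)  # torch.Size([64, 256])
```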
arXiv Detail & Related papers (2024-06-22T07:57:58Z)
- Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization [85.85582751254785]
We present a novel approach to natural language video localization (NLVL) that addresses the limitations of prior methods.
Our method involves the direct generation of a global 2D temporal map via a conditional denoising diffusion process.
Our approach effectively encapsulates the interaction between the query and video data across various time scales.
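The 2D temporal map itself is a simple data structure: cell (i, j) scores the segment starting at clip i and ending at clip j. The sketch below only shows how such a map decodes to an interval; the conditional denoising diffusion process that generates the map is omitted:

```python
# Cell (i, j) of a 2D temporal map scores the segment from clip i to clip j.
# The diffusion model that generates the map is omitted; this only decodes
# a (possibly generated) map into its best valid interval.
import torch

def decode_temporal_map(score_map: torch.Tensor):
    """score_map: (T, T); returns the highest-scoring (start, end), start <= end."""
    T = score_map.size(0)
    valid = torch.ones(T, T).triu().bool()              # upper triangle is valid
    masked = score_map.masked_fill(~valid, float("-inf"))
    idx = masked.flatten().argmax().item()
    return divmod(idx, T)                               # (start_clip, end_clip)

print(decode_temporal_map(torch.randn(16, 16)))  # e.g. (3, 11)
```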
arXiv Detail & Related papers (2024-01-16T09:33:29Z)
- SpotEM: Efficient Video Search for Episodic Memory [92.98552727430483]
The episodic memory (EM) task aims to search a long egocentric video to answer a natural language query.
Existing methods exhaustively extract expensive fixed-length clip features to look everywhere in the video for the answer.
We propose SpotEM, an approach to achieve efficiency for a given EM method while maintaining good accuracy.
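The efficiency idea, as the summary states it, is to avoid exhaustive feature extraction. A hedged sketch of select-then-encode, with placeholder cheap and expensive encoders standing in for whatever SpotEM actually uses:

```python
# Placeholder encoders illustrating the select-then-encode idea: a cheap
# scorer picks promising clips, and only those reach the expensive encoder.
import torch
import torch.nn as nn

cheap = nn.Linear(64, 1)        # low-cost relevance scorer
expensive = nn.Linear(64, 256)  # stand-in for a heavy clip encoder

def selective_encode(clips: torch.Tensor, keep_ratio: float = 0.25):
    """clips: (T, 64). Encode only the top keep_ratio fraction by cheap score."""
    scores = cheap(clips).squeeze(-1)          # (T,) cheap relevance estimates
    k = max(1, int(keep_ratio * clips.size(0)))
    top = scores.topk(k).indices               # indices worth the full encoder
    return top, expensive(clips[top])

idx, feats = selective_encode(torch.randn(100, 64))
print(idx.shape, feats.shape)  # torch.Size([25]) torch.Size([25, 256])
```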
arXiv Detail & Related papers (2023-06-28T00:52:49Z)
- Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023 [41.10032280192564]
This report presents the ReLER submission to two tracks of the Ego4D Episodic Memory Benchmark at CVPR 2023.
The solution builds on our proposed Action Sensitivity Learning (ASL) framework to better capture the unequal importance of individual frames.
arXiv Detail & Related papers (2023-06-15T14:50:17Z)
- ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022 [42.02602065259257]
We present the ReLER@ZJU submission to the Ego4D Moment Queries Challenge at ECCV 2022.
The goal is to retrieve and localize all instances of possible activities in egocentric videos.
The final submission achieved a Recall@1 (tIoU=0.5) score of 37.24 and an average mAP of 17.67, taking third place on the leaderboard; the tIoU criterion is sketched below.
arXiv Detail & Related papers (2022-11-17T14:28:31Z)
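For reference, the tIoU in the Recall@1 (tIoU=0.5) metric above is the standard temporal intersection-over-union between a predicted and a ground-truth interval:

```python
# Standard temporal intersection-over-union between (start, end) intervals;
# a prediction counts under Recall@1 (tIoU=0.5) when this reaches 0.5.
def temporal_iou(pred: tuple, gt: tuple) -> float:
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((2.0, 8.0), (5.0, 12.0)))  # 0.3 -> a miss at tIoU=0.5
```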
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.