Egocentric Video Task Translation @ Ego4D Challenge 2022
- URL: http://arxiv.org/abs/2302.01891v1
- Date: Fri, 3 Feb 2023 18:05:49 GMT
- Title: Egocentric Video Task Translation @ Ego4D Challenge 2022
- Authors: Zihui Xue, Yale Song, Kristen Grauman, Lorenzo Torresani
- Abstract summary: The EgoTask Translation approach explores relations among a set of egocentric video tasks in the Ego4D challenge.
We propose to leverage existing models developed for other related tasks and design a task translator that learns to "translate" auxiliary task features to the primary task.
Our proposed approach achieves competitive performance on two Ego4D challenges, ranking 1st in the Talking to Me challenge and 3rd in the PNR keyframe localization challenge.
- Score: 109.30649877677257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report describes the EgoTask Translation approach that
explores relations among a set of egocentric video tasks in the Ego4D
challenge. To improve the primary task of interest, we propose to leverage
existing models developed for other related tasks and design a task translator
that learns to "translate" auxiliary task features to the primary task. With
no modification to the baseline architectures, our proposed approach achieves
competitive performance on two Ego4D challenges, ranking 1st in the Talking
to Me challenge and 3rd in the PNR keyframe localization challenge.
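As a rough illustration of the task-translation idea described above (a minimal sketch, not the authors' released code), the snippet below assumes PyTorch; the module names, feature dimensions, and fusion scheme are hypothetical. Frozen task-specific backbones are assumed to produce per-task feature sequences, and a small shared transformer "translator" fuses them into a prediction for the primary task.

# Minimal sketch of a task-translator head (illustrative assumptions only;
# dimensions, names, and fusion scheme are NOT taken from the paper's code).
import torch
import torch.nn as nn

class TaskTranslator(nn.Module):
    """Fuses frozen auxiliary-task features into a primary-task prediction."""

    def __init__(self, feat_dims, d_model=256, num_layers=2, num_classes=2):
        super().__init__()
        # Project each auxiliary task's features into a shared space.
        self.proj = nn.ModuleList(nn.Linear(d, d_model) for d in feat_dims)
        # Learnable embedding marking which task each token came from.
        self.task_embed = nn.Parameter(torch.zeros(len(feat_dims), d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        self.head = nn.Linear(d_model, num_classes)  # primary-task classifier

    def forward(self, task_feats):
        # task_feats: list of (B, T, D_i) features from frozen task backbones.
        tokens = [p(f) + self.task_embed[i]
                  for i, (p, f) in enumerate(zip(self.proj, task_feats))]
        x = torch.cat(tokens, dim=1)       # (B, sum(T_i), d_model)
        x = self.encoder(x)                # cross-task fusion
        return self.head(x.mean(dim=1))    # pooled primary-task logits

# Hypothetical usage: features from, e.g., an audio-visual backbone and a
# pose backbone (shapes are made up for illustration).
if __name__ == "__main__":
    translator = TaskTranslator(feat_dims=[512, 128])
    feats = [torch.randn(4, 16, 512), torch.randn(4, 16, 128)]
    print(translator(feats).shape)  # torch.Size([4, 2])

In the full EgoT2 formulation cited below, the translator is shared across tasks while each backbone remains task-specific; the sketch only shows the single-primary-task case that this report addresses.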
Related papers
- QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View [2.3982875575861677]
We present our solutions for a spectrum of automation tasks in life-saving intervention procedures within the Trauma THOMPSON (T3) Challenge.
For action recognition and anticipation, we propose a pre-processing strategy that samples and stitches multiple inputs into a single image.
For training, we present an action dictionary-guided design, which consistently yields the most favorable results.
arXiv Detail & Related papers (2024-07-18T06:55:26Z)
- EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation [54.32133648259802]
We present our solutions to the EgoVis Challenges in CVPR 2024, including five tracks in the Ego4D challenge and three tracks in the EPIC-Kitchens challenge.
Building upon the video-language two-tower model and leveraging our meticulously organized egocentric video data, we introduce a novel foundation model called EgoVideo.
This model is specifically designed to cater to the unique characteristics of egocentric videos and provides strong support for our competition submissions.
arXiv Detail & Related papers (2024-06-26T05:01:37Z)
- Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos [66.46812056962567]
Exocentric-to-egocentric cross-view translation aims to generate a first-person (egocentric) view of an actor based on a video recording that captures the actor from a third-person (exocentric) perspective.
We propose a generative framework called Exo2Ego that decouples the translation process into two stages: high-level structure transformation and a pixel-level hallucination.
arXiv Detail & Related papers (2024-03-11T01:00:00Z)
- EgoTV: Egocentric Task Verification from Natural Language Task Descriptions [9.503477434050858]
We propose a benchmark and a synthetic dataset called Egocentric Task Verification (EgoTV).
The goal in EgoTV is to verify the execution of tasks from egocentric videos based on the natural language description of these tasks.
We propose a novel Neuro-Symbolic Grounding (NSG) approach that leverages symbolic representations to capture the compositional and temporal structure of tasks.
arXiv Detail & Related papers (2023-03-29T19:16:49Z)
- Egocentric Video Task Translation [109.30649877677257]
We propose EgoTask Translation (EgoT2), which takes a collection of models optimized on separate tasks and learns to translate their outputs for improved performance on any or all of them at once.
Unlike traditional transfer or multi-task learning, EgoT2's flipped design entails separate task-specific backbones and a task translator shared across all tasks, which captures synergies between even heterogeneous tasks and mitigates task competition.
arXiv Detail & Related papers (2022-12-13T00:47:13Z)
- Exploring Anchor-based Detection for Ego4D Natural Language Query [74.87656676444163]
This paper presents a technical report on the Ego4D natural language query challenge at CVPR 2022.
We propose an anchor-based detection solution to address this challenge.
arXiv Detail & Related papers (2022-08-10T14:43:37Z)
- Egocentric Video-Language Pretraining [74.04740069230692]
Video-Language Pretraining aims to learn transferable representations to advance a wide range of video-text downstream tasks.
We exploit the recently released Ego4D dataset to pioneer egocentric video-language pretraining along three directions.
We demonstrate strong performance on five egocentric downstream tasks across three datasets.
arXiv Detail & Related papers (2022-06-03T16:28:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.