Ego-Only: Egocentric Action Detection without Exocentric Transferring
- URL: http://arxiv.org/abs/2301.01380v2
- Date: Fri, 19 May 2023 22:23:48 GMT
- Title: Ego-Only: Egocentric Action Detection without Exocentric Transferring
- Authors: Huiyu Wang, Mitesh Kumar Singh, Lorenzo Torresani
- Abstract summary: We present Ego-Only, the first approach that enables state-of-the-art action detection on egocentric (first-person) videos.
- Score: 37.89647493482049
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Ego-Only, the first approach that enables state-of-the-art action detection on egocentric (first-person) videos without any form of exocentric (third-person) transferring. Despite the content and appearance gap separating the two domains, large-scale exocentric transferring has been the default choice for egocentric action detection, because prior works found that egocentric models are difficult to train from scratch and that transferring from exocentric representations leads to improved accuracy. In this paper, we revisit this common belief. Motivated by the large gap separating the two domains, we propose a strategy that enables effective training of egocentric models without exocentric transferring. Our Ego-Only approach is simple: it trains the video representation with a masked autoencoder finetuned for temporal segmentation, then feeds the learned features to an off-the-shelf temporal action localization method to detect actions. This simple approach achieves remarkably strong results on three established egocentric video datasets (Ego4D, EPIC-Kitchens-100, and Charades-Ego), showing that exocentric transferring is unnecessary. On both action detection and action recognition, Ego-Only outperforms the previous best exocentric transferring methods, which use orders of magnitude more labels, and sets new state-of-the-art results on these datasets and benchmarks without any exocentric data.
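The abstract outlines a three-stage recipe: masked-autoencoder pretraining on egocentric video only, finetuning the same encoder for temporal segmentation, and feeding the resulting features to an off-the-shelf temporal action localization (TAL) method. The sketch below illustrates that flow in PyTorch under simplifying assumptions: the toy token encoder, the in-place masking (rather than the asymmetric MAE encoder/decoder), and the linear heads are placeholders for illustration, not the authors' implementation.

```python
# Hedged sketch of the Ego-Only recipe from the abstract: (1) masked-autoencoder
# pretraining on egocentric clips, (2) finetuning the encoder with a per-token
# temporal-segmentation head, (3) handing frozen features to an off-the-shelf
# TAL detector. All module and loss choices below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenEncoder(nn.Module):
    """Toy stand-in for a video transformer operating on pre-tokenized clips."""
    def __init__(self, dim=256, depth=2, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, tokens):                   # tokens: (B, N, dim)
        return self.blocks(tokens)

def mae_step(encoder, decoder, tokens, mask_ratio=0.9):
    """Stage 1: masked autoencoding -- hide most tokens, reconstruct them."""
    B, N, D = tokens.shape
    mask = torch.rand(B, N, device=tokens.device) < mask_ratio   # True = hidden
    corrupted = tokens.masked_fill(mask.unsqueeze(-1), 0.0)
    pred = decoder(encoder(corrupted))                            # (B, N, D)
    return F.mse_loss(pred[mask], tokens[mask])                   # loss on masked tokens only

def segmentation_finetune_step(encoder, seg_head, tokens, frame_labels):
    """Stage 2: finetune with a per-token temporal segmentation head."""
    feats = encoder(tokens)                                       # (B, N, D)
    logits = seg_head(feats)                                      # (B, N, num_classes)
    return F.cross_entropy(logits.flatten(0, 1), frame_labels.flatten())

@torch.no_grad()
def extract_features(encoder, tokens):
    """Stage 3 input: frozen clip features for an off-the-shelf TAL detector."""
    encoder.eval()
    return encoder(tokens)

if __name__ == "__main__":
    dim, classes = 256, 11
    enc = TokenEncoder(dim)
    dec = nn.Linear(dim, dim)                     # toy MAE decoder
    seg = nn.Linear(dim, classes)                 # toy segmentation head
    tokens = torch.randn(2, 32, dim)              # 2 clips x 32 tokens
    labels = torch.randint(0, classes, (2, 32))
    print(mae_step(enc, dec, tokens).item())
    print(segmentation_finetune_step(enc, seg, tokens, labels).item())
    print(extract_features(enc, tokens).shape)
```

The only point of the sketch is the ordering of the three stages and the fact that no exocentric data or labels enter at any step; a real system would use a video transformer backbone and a dedicated TAL detector on the frozen features.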
Related papers
- Exocentric To Egocentric Transfer For Action Recognition: A Short Survey [25.41820386246096]
Egocentric vision captures the scene from the point of view of the camera wearer.
Exocentric vision captures the overall scene context.
Jointly modeling ego and exo views is crucial to developing next-generation AI agents.
arXiv Detail & Related papers (2024-10-27T22:38:51Z)
- Ego3DT: Tracking Every 3D Object in Ego-centric Videos [20.96550148331019]
This paper introduces a novel zero-shot approach for the 3D reconstruction and tracking of all objects in ego-centric videos.
We present Ego3DT, a novel framework that initially identifies and extracts detection and segmentation information of objects within the ego environment.
We also introduce a dynamic hierarchical association mechanism that creates stable 3D tracking trajectories for objects in ego-centric videos.
arXiv Detail & Related papers (2024-10-11T05:02:31Z)
- Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning [80.37314291927889]
We present EMBED, a method designed to transform exocentric video-language data for egocentric video representation learning.
Egocentric videos predominantly feature close-up hand-object interactions, whereas exocentric videos offer a broader perspective on human activities.
By applying both vision and language style transfer, our framework creates a new egocentric dataset.
arXiv Detail & Related papers (2024-08-07T06:10:45Z)
- EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding [27.881857222850083]
EgoExo-Fitness is a new full-body action understanding dataset.
It features fitness sequence videos recorded from synchronized egocentric and fixed exocentric cameras.
EgoExo-Fitness provides new resources to study egocentric and exocentric full-body action understanding.
arXiv Detail & Related papers (2024-06-13T07:28:45Z)
- Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos [66.46812056962567]
Exocentric-to-egocentric cross-view translation aims to generate a first-person (egocentric) view of an actor based on a video recording that captures the actor from a third-person (exocentric) perspective.
We propose a generative framework called Exo2Ego that decouples the translation process into two stages: high-level structure transformation and a pixel-level hallucination.
arXiv Detail & Related papers (2024-03-11T01:00:00Z)
- Retrieval-Augmented Egocentric Video Captioning [53.2951243928289]
EgoInstructor is a retrieval-augmented multimodal captioning model that automatically retrieves semantically relevant third-person instructional videos.
We train the cross-view retrieval module with a novel EgoExoNCE loss that pulls egocentric and exocentric video features closer by aligning them to shared text features that describe similar actions (a generic sketch of this idea appears after the list below).
arXiv Detail & Related papers (2024-01-01T15:31:06Z)
- EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding [99.904140768186]
This paper proposes a new framework as an infrastructure to advance Ego-HOI recognition by Probing, Curation and Adaption (EgoPCA).
We contribute comprehensive pre-train sets, balanced test sets and a new baseline, which are complete with a training-finetuning strategy.
We believe our data and the findings will pave a new way for Ego-HOI understanding.
arXiv Detail & Related papers (2023-09-05T17:51:16Z)
- Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos [92.38049744463149]
We introduce an approach for pre-training egocentric video models using large-scale third-person video datasets.
Our idea is to discover latent signals in third-person video that are predictive of key egocentric-specific properties.
Our experiments show that our Ego-Exo framework can be seamlessly integrated into standard video models.
arXiv Detail & Related papers (2021-04-16T06:10:10Z)
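The EgoExoNCE loss mentioned in the Retrieval-Augmented Egocentric Video Captioning entry above is only described at a high level; the following is a generic InfoNCE-style sketch of that idea, aligning ego and exo video features to shared text features. It is an assumption for illustration, not the paper's exact formulation (which, per the summary, forms positives from text describing similar actions).

```python
# Hedged sketch: an InfoNCE-style objective that pulls egocentric and exocentric
# video features toward shared text features describing the same action. The
# exact EgoExoNCE formulation (e.g. how positives are grouped) may differ.
import torch
import torch.nn.functional as F

def info_nce(video_feats, text_feats, temperature=0.07):
    """Symmetric contrastive loss between L2-normalized video and text features."""
    v = F.normalize(video_feats, dim=-1)          # (B, D)
    t = F.normalize(text_feats, dim=-1)           # (B, D)
    logits = v @ t.T / temperature                # (B, B) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

def ego_exo_text_alignment_loss(ego_feats, exo_feats, text_feats):
    """Align both views to the shared text anchor."""
    return info_nce(ego_feats, text_feats) + info_nce(exo_feats, text_feats)

if __name__ == "__main__":
    B, D = 8, 512
    ego, exo, txt = torch.randn(B, D), torch.randn(B, D), torch.randn(B, D)
    print(ego_exo_text_alignment_loss(ego, exo, txt).item())
```

Because both views are pulled toward the same text anchor, ego and exo features of the same action also end up close to each other, without requiring paired ego and exo recordings.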
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.