EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding
- URL: http://arxiv.org/abs/2406.08877v2
- Date: Tue, 16 Jul 2024 09:35:49 GMT
- Title: EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding
- Authors: Yuan-Ming Li, Wei-Jin Huang, An-Lan Wang, Ling-An Zeng, Jing-Ke Meng, Wei-Shi Zheng
- Abstract summary: EgoExo-Fitness is a new full-body action understanding dataset.
It features fitness sequence videos recorded from synchronized egocentric and fixed exocentric cameras.
EgoExo-Fitness provides new resources to study egocentric and exocentric full-body action understanding.
- Score: 27.881857222850083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present EgoExo-Fitness, a new full-body action understanding dataset, featuring fitness sequence videos recorded from synchronized egocentric and fixed exocentric (third-person) cameras. Compared with existing full-body action understanding datasets, EgoExo-Fitness not only contains videos from first-person perspectives, but also provides rich annotations. Specifically, two-level temporal boundaries are provided to localize single action videos along with sub-steps of each action. More importantly, EgoExo-Fitness introduces innovative annotations for interpretable action judgement, including technical keypoint verification, natural language comments on action execution, and action quality scores. Combining all of these, EgoExo-Fitness provides new resources to study egocentric and exocentric full-body action understanding across dimensions of "what", "when", and "how well". To facilitate research on egocentric and exocentric full-body action understanding, we construct benchmarks on a suite of tasks (i.e., action classification, action localization, cross-view sequence verification, cross-view skill determination, and a newly proposed task of guidance-based execution verification), together with detailed analysis. Code and data will be available at https://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/main.
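To make the annotation dimensions above concrete, here is a minimal, hypothetical sketch of how a single annotated action instance could be represented: the "what" (action label), "when" (two-level temporal boundaries), and "how well" (keypoint verification, comments, quality score) fields described in the abstract. All class, field, and path names are assumptions for illustration; the released annotation format in the GitHub repository may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical sketch of one annotated fitness sequence in EgoExo-Fitness.
# Field names and types are illustrative assumptions, not the released schema.

@dataclass
class SubStep:
    name: str        # sub-step (second-level) label within an action
    start: float     # sub-step start time in seconds
    end: float       # sub-step end time in seconds

@dataclass
class ActionInstance:
    action_label: str                      # "what": the action class
    start: float                           # "when": first-level temporal boundary
    end: float
    substeps: List[SubStep] = field(default_factory=list)  # second-level boundaries
    # "how well": interpretable action-judgement annotations
    keypoint_checks: List[Tuple[str, bool]] = field(default_factory=list)  # technical keypoint -> pass/fail
    comment: str = ""                      # natural-language comment on execution
    quality_score: float = 0.0             # action quality score

@dataclass
class FitnessSequence:
    ego_videos: List[str]                  # paths to synchronized egocentric views
    exo_videos: List[str]                  # paths to fixed exocentric (third-person) views
    actions: List[ActionInstance] = field(default_factory=list)
```

A record along these lines would be enough to support the listed benchmark tasks: labels and boundaries for classification and localization, the synchronized view lists for cross-view sequence verification and skill determination, and the keypoint, comment, and score fields for guidance-based execution verification.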
Related papers
- Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning [80.37314291927889]
We present EMBED, a method designed to transform exocentric video-language data for egocentric video representation learning.
Egocentric videos predominantly feature close-up hand-object interactions, whereas exocentric videos offer a broader perspective on human activities.
By applying both vision and language style transfer, our framework creates a new egocentric dataset.
arXiv Detail & Related papers (2024-08-07T06:10:45Z)
- Object Aware Egocentric Online Action Detection [23.504280692701272]
We introduce an Object-Aware Module that integrates egocentric-specific priors into existing Online Action Detection frameworks.
Our work integrates seamlessly into existing models with minimal overhead and brings consistent performance enhancements.
arXiv Detail & Related papers (2024-06-03T07:58:40Z)
- EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions? [48.702973928321946]
We introduce a novel asymmetric contrastive objective for EgoHOI named EgoNCE++.
Our experiments demonstrate that EgoNCE++ significantly boosts open-vocabulary HOI recognition, multi-instance retrieval, and action recognition tasks.
arXiv Detail & Related papers (2024-05-28T00:27:29Z)
- EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World [44.34800426136217]
We introduce EgoExoLearn, a dataset that emulates the human demonstration-following process.
EgoExoLearn contains egocentric and demonstration video data spanning 120 hours.
We present benchmarks such as cross-view association, cross-view action planning, and cross-view referenced skill assessment.
arXiv Detail & Related papers (2024-03-24T15:00:44Z)
- Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos [66.46812056962567]
Exocentric-to-egocentric cross-view translation aims to generate a first-person (egocentric) view of an actor based on a video recording that captures the actor from a third-person (exocentric) perspective.
We propose a generative framework called Exo2Ego that decouples the translation process into two stages: high-level structure transformation and pixel-level hallucination.
arXiv Detail & Related papers (2024-03-11T01:00:00Z)
- Retrieval-Augmented Egocentric Video Captioning [53.2951243928289]
EgoInstructor is a retrieval-augmented multimodal captioning model that automatically retrieves semantically relevant third-person instructional videos.
We train the cross-view retrieval module with a novel EgoExoNCE loss that pulls egocentric and exocentric video features closer by aligning them to shared text features that describe similar actions (an illustrative sketch of this kind of objective appears after this related-papers list).
arXiv Detail & Related papers (2024-01-01T15:31:06Z)
- Action Scene Graphs for Long-Form Understanding of Egocentric Videos [23.058999979457546]
We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form understanding of egocentric videos.
EASGs provide a temporally evolving graph-based description of the actions performed by the camera wearer.
We will release the dataset and the code to replicate experiments and annotations.
arXiv Detail & Related papers (2023-12-06T10:01:43Z)
- Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment [71.16699226211504]
We propose to learn fine-grained action features that are invariant to the viewpoints by aligning egocentric and exocentric videos in time.
To this end, we propose AE2, a self-supervised embedding approach with two key designs.
For evaluation, we establish a benchmark for fine-grained video understanding in the ego-exo context.
arXiv Detail & Related papers (2023-06-08T19:54:08Z)
- Egocentric Video-Language Pretraining [74.04740069230692]
Video-Language Pretraining aims to learn transferable representations to advance a wide range of video-text downstream tasks.
We exploit the recently released Ego4D dataset to pioneer Egocentric training along three directions.
We demonstrate strong performance on five egocentric downstream tasks across three datasets.
arXiv Detail & Related papers (2022-06-03T16:28:58Z)
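The Retrieval-Augmented Egocentric Video Captioning entry above describes an EgoExoNCE loss that pulls egocentric and exocentric video features closer by aligning both to shared text features. Below is a minimal, hypothetical PyTorch sketch of that kind of shared-text contrastive objective; the function name, batching assumptions, and the symmetric InfoNCE formulation are illustrative choices, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def shared_text_contrastive_loss(ego_feats, exo_feats, text_feats, temperature=0.07):
    """Illustrative sketch of a shared-text contrastive objective.

    Both egocentric and exocentric clip features are aligned to the text
    feature describing the same action, which indirectly pulls the two views
    together. This is an assumption-based sketch in the spirit of EgoExoNCE,
    not the authors' implementation.

    ego_feats, exo_feats, text_feats: (N, D) tensors where row i of each
    tensor corresponds to the same action description.
    """
    ego = F.normalize(ego_feats, dim=-1)
    exo = F.normalize(exo_feats, dim=-1)
    txt = F.normalize(text_feats, dim=-1)

    targets = torch.arange(ego.size(0), device=ego.device)

    def nce(video, text):
        logits = video @ text.t() / temperature  # (N, N) cosine-similarity logits
        # symmetric video-to-text and text-to-video cross-entropy
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    # align both views to the shared text anchors
    return 0.5 * (nce(ego, txt) + nce(exo, txt))
```

In practice such a loss would also treat text describing similar actions as shared positives for the corresponding ego and exo clips; the diagonal-positive batch above is the most reduced version of that idea.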