Egocentric and Exocentric Methods: A Short Survey
- URL: http://arxiv.org/abs/2410.20621v2
- Date: Fri, 09 May 2025 02:25:50 GMT
- Title: Egocentric and Exocentric Methods: A Short Survey
- Authors: Anirudh Thatipelli, Shao-Yuan Lo, Amit K. Roy-Chowdhury
- Abstract summary: Egocentric vision captures the scene from the point of view of the camera wearer. Exocentric vision captures the overall scene context. Jointly modeling ego and exo views is crucial to developing next-generation AI agents.
- Score: 25.41820386246096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Egocentric vision captures the scene from the point of view of the camera wearer, while exocentric vision captures the overall scene context. Jointly modeling ego and exo views is crucial to developing next-generation AI agents. The community has renewed its interest in egocentric vision. While the third-person and first-person views have each been thoroughly investigated on their own, very few works study both jointly. Exocentric videos contain many relevant signals that are transferable to egocentric videos. This paper provides a timely overview of works combining egocentric and exocentric vision, a new but promising research topic. We describe the datasets in detail and survey the key applications of ego-exo joint learning, identifying the most recent advances. By presenting the current state of progress, we believe this short but timely survey will be valuable to the broad video-understanding community, particularly where multi-view modeling is critical.
Related papers
- EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations [4.252119151012245]
We introduce EgoWorld, a novel framework that reconstructs an egocentric view from rich exocentric observations. Our approach reconstructs a point cloud from estimated exocentric depth maps, reprojects it into the egocentric perspective, and then applies diffusion-based inpainting to produce dense, semantically coherent egocentric images. EgoWorld achieves state-of-the-art performance and demonstrates robust generalization to new objects, actions, scenes, and subjects.
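The geometric core of that pipeline is easy to make concrete: lift the exocentric depth map to a point cloud, then reproject it through the egocentric camera. A minimal numpy sketch follows; the intrinsics, relative pose, and output size are hypothetical inputs, and the diffusion inpainting stage is omitted.

```python
import numpy as np

def reproject_exo_to_ego(depth_exo, K_exo, K_ego, T_exo2ego, hw_ego):
    """Lift an exocentric depth map to 3D and splat it into the ego view.

    depth_exo : (H, W) metric depth from the exocentric camera.
    K_exo, K_ego : (3, 3) camera intrinsics (hypothetical values below).
    T_exo2ego : (4, 4) rigid transform from exo to ego camera frame.
    Returns a sparse ego-view depth image; holes would be filled by
    diffusion-based inpainting in the full EgoWorld pipeline.
    """
    H, W = depth_exo.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Back-project every pixel to a 3D point in the exo camera frame.
    pts_exo = (np.linalg.inv(K_exo) @ pix.T) * depth_exo.reshape(1, -1)
    pts_exo = np.vstack([pts_exo, np.ones((1, pts_exo.shape[1]))])
    # Move the point cloud into the ego camera frame and project it.
    pts_ego = (T_exo2ego @ pts_exo)[:3]
    valid = pts_ego[2] > 1e-6  # keep points in front of the ego camera
    proj = K_ego @ pts_ego[:, valid]
    uv = (proj[:2] / proj[2]).round().astype(int)
    ego_depth = np.full(hw_ego, np.inf)
    inb = (uv[0] >= 0) & (uv[0] < hw_ego[1]) & (uv[1] >= 0) & (uv[1] < hw_ego[0])
    # z-buffer: keep the nearest point that lands on each ego pixel.
    np.minimum.at(ego_depth, (uv[1][inb], uv[0][inb]), pts_ego[2, valid][inb])
    return ego_depth

# Toy usage with made-up cameras: identical intrinsics, identity pose.
K = np.array([[500., 0., 32.], [0., 500., 32.], [0., 0., 1.]])
ego_d = reproject_exo_to_ego(np.ones((64, 64)), K, K, np.eye(4), (64, 64))
```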
arXiv Detail & Related papers (2025-06-22T04:21:48Z)
- EgoM2P: Egocentric Multimodal Multitask Pretraining [55.259234688003545]
Building large-scale egocentric multimodal and multitask models presents unique challenges. EgoM2P is a masked modeling framework that learns from temporally-aware multimodal tokens to train a large, general-purpose model for egocentric 4D understanding. We will fully open-source EgoM2P to support the community and advance egocentric vision research.
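As a rough illustration of masked multimodal token modeling (not EgoM2P's actual architecture), the sketch below masks a fraction of a discrete token sequence and trains a transformer to reconstruct the masked positions; the vocabulary size, dimensions, and omitted temporal encodings are all placeholder choices.

```python
import torch
import torch.nn as nn

class MaskedMultimodalModel(nn.Module):
    """Generic masked-token pretraining over a multimodal token sequence
    (e.g., RGB, depth, and gaze tokens concatenated along time).
    Positional/temporal encodings are omitted for brevity."""
    def __init__(self, vocab_size=1024, dim=256, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size + 1, dim)  # last id = [MASK]
        enc = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, n_layers)
        self.head = nn.Linear(dim, vocab_size)
        self.mask_id = vocab_size

    def forward(self, tokens, mask_ratio=0.5):
        masked = tokens.clone()
        mask = torch.rand_like(tokens, dtype=torch.float) < mask_ratio
        masked[mask] = self.mask_id
        logits = self.head(self.encoder(self.embed(masked)))
        # Supervise only the masked positions, as in masked modeling.
        return nn.functional.cross_entropy(logits[mask], tokens[mask])

model = MaskedMultimodalModel()
tokens = torch.randint(0, 1024, (2, 128))  # (batch, multimodal token sequence)
loss = model(tokens)
loss.backward()
```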
arXiv Detail & Related papers (2025-06-09T15:59:25Z)
- Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision [35.766320269860245]
Perceiving the world from both egocentric (first-person) and exocentric (third-person) perspectives is fundamental to human cognition. In this survey, we provide a review of video understanding from both exocentric and egocentric viewpoints.
arXiv Detail & Related papers (2025-06-06T17:25:48Z)
- EgoLife: Towards Egocentric Life Assistant [60.51196061794498]
We introduce EgoLife, a project to develop an egocentric life assistant that accompanies and enhances personal efficiency through AI-powered wearable glasses. We conduct a comprehensive data collection study where six participants lived together for one week, continuously recording their daily activities using AI glasses for multimodal egocentric video capture, along with synchronized third-person-view video references. This effort resulted in the EgoLife dataset, a comprehensive 300-hour egocentric, interpersonal, multiview, and multimodal daily life dataset with intensive annotation. We also introduce EgoLifeQA, a suite of long-context, life-oriented question-answering tasks.
arXiv Detail & Related papers (2025-03-05T18:54:16Z)
- EgoMe: A New Dataset and Challenge for Following Me via Egocentric View in Real World [12.699670048897085]
In human imitation learning, the imitator typically takes the egocentric view as a benchmark, naturally transferring behaviors observed from an exocentric view to their own. We introduce EgoMe, which follows the process of human imitation learning through the imitator's egocentric view in the real world. Our dataset includes 7902 paired exo-ego videos spanning diverse daily behaviors in various real-world scenarios.
arXiv Detail & Related papers (2025-01-31T11:48:22Z)
- From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities [7.952665773362793]
We aim to leverage the complementary nature of egocentric views to enhance LVLMs' understanding of exocentric ADL videos.
While effective, such ego augmentation requires paired ego-exo videos, which are impractical to collect at scale.
To enhance the ego representation of LVLMs trained on synthetic data, we develop a domain-agnostic, bootstrapped ego2exo strategy.
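The summary is terse, so here is one plausible reading in code: a feature-level ego-to-exo distillation term in which a frozen ego-view representation supervises the exo-view student. This is a generic alignment loss, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ego2exo_distill_loss(exo_feats, ego_feats):
    """Align exo-view student features to ego-view teacher features.

    exo_feats, ego_feats : (B, D) clip embeddings for the same moments.
    A generic cosine-alignment distillation loss; the paper's actual
    bootstrapped ego2exo strategy may differ in detail.
    """
    exo = F.normalize(exo_feats, dim=-1)
    ego = F.normalize(ego_feats.detach(), dim=-1)  # teacher gets no gradient
    return (1 - (exo * ego).sum(dim=-1)).mean()

loss = ego2exo_distill_loss(torch.randn(8, 256), torch.randn(8, 256))
```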
arXiv Detail & Related papers (2025-01-10T05:01:58Z)
- Ego3DT: Tracking Every 3D Object in Ego-centric Videos [20.96550148331019]
This paper introduces a novel zero-shot approach for the 3D reconstruction and tracking of all objects in ego-centric videos.
We present Ego3DT, a novel framework that initially identifies and extracts detection and segmentation information of objects within the ego environment.
We also introduce a dynamic hierarchical association mechanism for creating stable 3D tracking trajectories of objects in ego-centric videos.
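The dynamic hierarchical association itself is not spelled out in the summary; as a deliberately simplified stand-in, the sketch below links per-frame 3D detections into trajectories by greedy nearest-centroid matching, which conveys the basic shape of building stable 3D tracks.

```python
import numpy as np

def associate_tracks(frames, max_dist=0.5):
    """Greedily link per-frame 3D object centroids into trajectories.

    frames : list of (N_t, 3) arrays of object centroids per frame.
    A simplified stand-in for Ego3DT's dynamic hierarchical association.
    """
    tracks = [[c] for c in frames[0]]  # start one track per first-frame object
    for dets in frames[1:]:
        used = set()
        for tr in tracks:
            if len(dets) == 0:
                continue
            d = np.linalg.norm(dets - tr[-1], axis=1)
            d[list(used)] = np.inf  # each detection joins at most one track
            j = int(np.argmin(d))
            if d[j] < max_dist:
                tr.append(dets[j])
                used.add(j)
        # Unmatched detections spawn new tracks.
        tracks += [[dets[j]] for j in range(len(dets)) if j not in used]
    return tracks

tracks = associate_tracks([np.random.rand(5, 3) for _ in range(10)])
```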
arXiv Detail & Related papers (2024-10-11T05:02:31Z)
- Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning [80.37314291927889]
We present EMBED, a method designed to transform exocentric video-language data for egocentric video representation learning.
Egocentric videos predominantly feature close-up hand-object interactions, whereas exocentric videos offer a broader perspective on human activities.
By applying both vision and language style transfer, our framework creates a new egocentric dataset.
arXiv Detail & Related papers (2024-08-07T06:10:45Z)
- EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views [51.53089073920215]
Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception.
Existing methods primarily leverage observations of HOI to capture interaction regions from an exocentric view.
We present EgoChoir, which links object structures with interaction contexts inherent in appearance and head motion to reveal object affordance.
arXiv Detail & Related papers (2024-05-22T14:03:48Z)
- Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos [66.46812056962567]
Exocentric-to-egocentric cross-view translation aims to generate a first-person (egocentric) view of an actor based on a video recording that captures the actor from a third-person (exocentric) perspective.
We propose a generative framework called Exo2Ego that decouples the translation process into two stages: high-level structure transformation and pixel-level hallucination.
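The decoupling can be shown as an interface: a structure module predicts a coarse egocentric layout from the exo frame, and a conditional module hallucinates ego pixels on top of it. The placeholder convolutions below illustrate the two-stage contract only, not Exo2Ego's actual networks.

```python
import torch
import torch.nn as nn

class Exo2EgoSketch(nn.Module):
    """Two-stage exo-to-ego translation: structure first, pixels second.

    Placeholder convolutional modules; the real Exo2Ego uses dedicated
    structure-transformation and hallucination networks.
    """
    def __init__(self):
        super().__init__()
        # Stage 1: predict a coarse egocentric structure map (1 channel).
        self.structure = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1)
        )
        # Stage 2: hallucinate ego pixels conditioned on exo frame + structure.
        self.hallucinate = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 3, 3, padding=1)
        )

    def forward(self, exo_frame):
        layout = self.structure(exo_frame)  # high-level structure stage
        ego = self.hallucinate(torch.cat([exo_frame, layout], dim=1))
        return ego, layout

ego, layout = Exo2EgoSketch()(torch.randn(1, 3, 64, 64))
```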
arXiv Detail & Related papers (2024-03-11T01:00:00Z)
- EgoGen: An Egocentric Synthetic Data Generator [53.32942235801499]
EgoGen is a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks.
At the heart of EgoGen is a novel human motion synthesis model that directly leverages egocentric visual inputs of a virtual human to sense the 3D environment.
We demonstrate EgoGen's efficacy in three tasks: mapping and localization for head-mounted cameras, egocentric camera tracking, and human mesh recovery from egocentric views.
arXiv Detail & Related papers (2024-01-16T18:55:22Z)
- Retrieval-Augmented Egocentric Video Captioning [53.2951243928289]
EgoInstructor is a retrieval-augmented multimodal captioning model that automatically retrieves semantically relevant third-person instructional videos.
We train the cross-view retrieval module with a novel EgoExoNCE loss that pulls egocentric and exocentric video features closer by aligning them to shared text features that describe similar actions.
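That description is concrete enough to sketch: ego and exo clip embeddings are both pulled toward the embedding of shared action text in an InfoNCE form over the batch. The rendering below is a minimal interpretation of EgoExoNCE, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def ego_exo_nce(ego, exo, txt, tau=0.07):
    """Pull ego and exo clip features toward shared text features.

    ego, exo, txt : (B, D) embeddings; row i of each describes the same
    action. A minimal InfoNCE rendering of the EgoExoNCE idea.
    """
    ego, exo, txt = (F.normalize(x, dim=-1) for x in (ego, exo, txt))
    labels = torch.arange(ego.size(0))
    loss = 0.0
    for vid in (ego, exo):  # align each view to the shared text
        logits = vid @ txt.T / tau
        # Symmetric video->text and text->video InfoNCE terms.
        loss += F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)
    return loss / 4

loss = ego_exo_nce(torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256))
```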
arXiv Detail & Related papers (2024-01-01T15:31:06Z)
- The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective [36.09288501153965]
We introduce the Ego-Exocentric Conversational Graph Prediction problem.
We propose a unified multi-modal framework -- Audio-Visual Conversational Attention (AV-CONV).
Specifically, we adopt the self-attention mechanism to model representations across time, across subjects, and across modalities.
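One way to read "across time, across subjects, and across modalities" is joint self-attention over a token grid with those three axes flattened into a single sequence. The sketch below does exactly that with a standard transformer layer; the tensor layout is an assumption, not AV-CONV's actual design.

```python
import torch
import torch.nn as nn

# Joint self-attention over (time, subject, modality) tokens: a rough
# rendering of across-time/subject/modality attention.
B, T, S, M, D = 2, 8, 4, 2, 64   # batch, time, subjects, modalities, dim
tokens = torch.randn(B, T, S, M, D)

layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
seq = tokens.reshape(B, T * S * M, D)    # flatten the three axes into one sequence
out = layer(seq).reshape(B, T, S, M, D)  # every token attends across all three axes
```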
arXiv Detail & Related papers (2023-12-20T09:34:22Z)
- Ego-Only: Egocentric Action Detection without Exocentric Transferring [37.89647493482049]
We present Ego-Only, the first approach that enables state-of-the-art action detection on egocentric (first-person) videos without any exocentric (third-person) transferring.
arXiv Detail & Related papers (2023-01-03T22:22:34Z)
- Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos [92.38049744463149]
We introduce an approach for pre-training egocentric video models using large-scale third-person video datasets.
Our idea is to discover latent signals in third-person video that are predictive of key egocentric-specific properties.
Our experiments show that our Ego-Exo framework can be seamlessly integrated into standard video models.
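The "latent signals" idea corresponds to a common pattern: attach auxiliary heads that regress egocentric-style pseudo-labels (for example, a hand-object interaction score) alongside the usual classification loss during third-person pretraining. The head names and targets below are illustrative, not the paper's exact tasks.

```python
import torch
import torch.nn as nn

class EgoExoPretrainHead(nn.Module):
    """Auxiliary ego-property heads on a third-person video backbone.

    Illustrative pattern only: during third-person pretraining, an extra
    head regresses an egocentric-style pseudo-label (here a hypothetical
    hand-object interaction score) alongside action classification.
    """
    def __init__(self, feat_dim=512, n_actions=400):
        super().__init__()
        self.action = nn.Linear(feat_dim, n_actions)  # standard task head
        self.ego_score = nn.Linear(feat_dim, 1)       # auxiliary ego signal

    def forward(self, feats, action_labels, ego_targets):
        cls_loss = nn.functional.cross_entropy(self.action(feats), action_labels)
        aux_loss = nn.functional.mse_loss(
            self.ego_score(feats).squeeze(-1), ego_targets)
        return cls_loss + aux_loss  # joint pretraining objective

head = EgoExoPretrainHead()
feats = torch.randn(8, 512)  # pooled clip features from the video backbone
loss = head(feats, torch.randint(0, 400, (8,)), torch.rand(8))
```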
arXiv Detail & Related papers (2021-04-16T06:10:10Z)