EgoX: Egocentric Video Generation from a Single Exocentric Video
- URL: http://arxiv.org/abs/2512.08269v1
- Date: Tue, 09 Dec 2025 05:53:39 GMT
- Title: EgoX: Egocentric Video Generation from a Single Exocentric Video
- Authors: Taewoong Kang, Kinam Kim, Dohyeon Kim, Minho Park, Junha Hyung, Jaegul Choo
- Abstract summary: We present EgoX, a novel framework for generating egocentric videos from a single exocentric input. Our approach achieves coherent and realistic egocentric video generation while demonstrating strong scalability and robustness across unseen and in-the-wild videos.
- Score: 46.41583107241048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Egocentric perception enables humans to experience and understand the world directly from their own point of view. Translating exocentric (third-person) videos into egocentric (first-person) videos opens up new possibilities for immersive understanding but remains highly challenging due to extreme camera pose variations and minimal view overlap. This task requires faithfully preserving visible content while synthesizing unseen regions in a geometrically consistent manner. To achieve this, we present EgoX, a novel framework for generating egocentric videos from a single exocentric input. EgoX leverages the pretrained spatio-temporal knowledge of large-scale video diffusion models through lightweight LoRA adaptation and introduces a unified conditioning strategy that combines exocentric and egocentric priors via width-wise and channel-wise concatenation. Additionally, a geometry-guided self-attention mechanism selectively attends to spatially relevant regions, ensuring geometric coherence and high visual fidelity. Our approach achieves coherent and realistic egocentric video generation while demonstrating strong scalability and robustness across unseen and in-the-wild videos.
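The width-wise and channel-wise concatenation named in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the latent layout `(C, T, H, W)`, the choice to attach the egocentric prior along channels and the exocentric latent along width, the zero-padding used to equalize channel counts, and the hypothetical function name `build_conditioned_input`.

```python
import numpy as np

def build_conditioned_input(noisy_ego, ego_prior, exo_latent):
    """Combine latents laid out as (C, T, H, W) into one model input.

    Assumed layout and axis assignment; the abstract only states that
    width-wise and channel-wise concatenation are both used.
    """
    # Channel-wise: stack the egocentric prior onto the noisy egocentric latent.
    ego_branch = np.concatenate([noisy_ego, ego_prior], axis=0)
    # Zero-pad the exocentric latent so both branches share a channel count.
    exo_branch = np.concatenate([exo_latent, np.zeros_like(exo_latent)], axis=0)
    # Width-wise: place the exocentric branch beside the egocentric branch.
    return np.concatenate([exo_branch, ego_branch], axis=-1)

rng = np.random.default_rng(0)
noisy_ego = rng.normal(size=(4, 3, 8, 8))    # hypothetical latent sizes
ego_prior = rng.normal(size=(4, 3, 8, 8))
exo_latent = rng.normal(size=(4, 3, 8, 12))
model_input = build_conditioned_input(noisy_ego, ego_prior, exo_latent)
print(model_input.shape)  # (8, 3, 8, 20)
```

In this sketch the channel count doubles and the width is the sum of the two views' widths, so a single sequence of tokens carries both the exocentric context and the egocentric prior.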
Related papers
- WorldWander: Bridging Egocentric and Exocentric Worlds in Video Generation [51.1909041777449]
We present WorldWander, an in-context learning framework tailored for translating between egocentric and exocentric worlds in video generation. Experiments demonstrate that WorldWander achieves superior perspective synchronization, character consistency, and generalization.
arXiv Detail & Related papers (2025-11-27T04:40:37Z) - Fine-grained Spatiotemporal Grounding on Egocentric Videos [13.319346673043286]
We introduce EgoMask, the first pixel-level benchmark for fine-grained spatiotemporal grounding in egocentric videos. EgoMask is constructed by our proposed automatic annotation pipeline, which annotates referring expressions and object masks. We also create EgoMask-Train, a large-scale training dataset to facilitate model development.
arXiv Detail & Related papers (2025-08-01T10:53:27Z) - EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations [4.252119151012245]
We introduce EgoWorld, a novel framework that reconstructs an egocentric view from rich exocentric observations. Our approach reconstructs a point cloud from estimated exocentric depth maps, reprojects it into the egocentric perspective, and then applies diffusion-based inpainting to produce dense, semantically coherent egocentric images. EgoWorld achieves state-of-the-art performance and demonstrates robust generalization to new objects, actions, scenes, and subjects.
arXiv Detail & Related papers (2025-06-22T04:21:48Z) - PlayerOne: Egocentric World Simulator [73.88786358213694]
PlayerOne is the first egocentric realistic world simulator. It generates egocentric videos that are strictly aligned with the real-scene human motion of the user captured by an exocentric camera.
arXiv Detail & Related papers (2025-06-11T17:59:53Z) - EgoM2P: Egocentric Multimodal Multitask Pretraining [55.259234688003545]
Building large-scale egocentric multimodal and multitask models presents unique challenges. EgoM2P is a masked modeling framework that learns from temporally-aware multimodal tokens to train a large, general-purpose model for egocentric 4D understanding. We will fully open-source EgoM2P to support the community and advance egocentric vision research.
arXiv Detail & Related papers (2025-06-09T15:59:25Z) - Egocentric and Exocentric Methods: A Short Survey [25.41820386246096]
Egocentric vision captures the scene from the point of view of the camera wearer. Exocentric vision captures the overall scene context. Jointly modeling ego and exo views is crucial to developing next-generation AI agents.
arXiv Detail & Related papers (2024-10-27T22:38:51Z) - Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning [80.37314291927889]
We present EMBED, a method designed to transform exocentric video-language data for egocentric video representation learning.
Egocentric videos predominantly feature close-up hand-object interactions, whereas exocentric videos offer a broader perspective on human activities.
By applying both vision and language style transfer, our framework creates a new egocentric dataset.
arXiv Detail & Related papers (2024-08-07T06:10:45Z) - Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos [66.46812056962567]
Exocentric-to-egocentric cross-view translation aims to generate a first-person (egocentric) view of an actor based on a video recording that captures the actor from a third-person (exocentric) perspective.
We propose a generative framework called Exo2Ego that decouples the translation process into two stages: high-level structure transformation and pixel-level hallucination.
arXiv Detail & Related papers (2024-03-11T01:00:00Z)
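The EgoWorld entry above mentions reconstructing a point cloud from exocentric depth maps and reprojecting it into the egocentric view. A minimal sketch of that geometric step follows, assuming a pinhole camera with intrinsics `K` shared by both views and a known rigid transform `(R, t)` from the exocentric to the egocentric camera frame. The function names `unproject` and `reproject` are hypothetical, and this is not code from any of the listed papers.

```python
import numpy as np

def unproject(depth, K):
    """Lift a depth map (H, W) to exo-camera-frame 3D points (H*W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates (u, v, 1), one row per pixel.
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Back-project each pixel to a ray with z = 1, then scale by depth.
    rays = pixels @ np.linalg.inv(K).T
    return rays * depth.reshape(-1, 1)

def reproject(points_exo, R, t, K):
    """Move points into the ego camera frame and project them to pixels."""
    points_ego = points_exo @ R.T + t
    # Keep only points in front of the ego camera.
    in_front = points_ego[:, 2] > 1e-6
    proj = points_ego[in_front] @ K.T
    # Perspective divide gives (u, v); also return the per-point ego depth.
    return proj[:, :2] / proj[:, 2:3], points_ego[in_front, 2]

K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])            # assumed intrinsics
depth = np.full((3, 4), 2.0)               # toy constant-depth map
points = unproject(depth, K)
uv, z = reproject(points, np.eye(3), np.zeros(3), K)
```

With the identity transform each point lands back on its source pixel, which is a convenient sanity check. With a real exo-to-ego transform the projected points are sparse and leave holes, which is the gap the entry's diffusion-based inpainting is described as filling.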
This list is automatically generated from the titles and abstracts of the papers on this site.