EgoEnv: Human-centric environment representations from egocentric video
- URL: http://arxiv.org/abs/2207.11365v3
- Date: Thu, 9 Nov 2023 19:13:18 GMT
- Title: EgoEnv: Human-centric environment representations from egocentric video
- Authors: Tushar Nagarajan, Santhosh Kumar Ramakrishnan, Ruta Desai, James
Hillis, Kristen Grauman
- Abstract summary: First-person video highlights a camera-wearer's activities in the context of their persistent environment.
Current video understanding approaches reason over visual features from short video clips that are detached from the underlying physical space.
We present an approach that links egocentric video and the environment by learning representations that are predictive of the camera-wearer's (potentially unseen) local surroundings.
- Score: 60.34649902578047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: First-person video highlights a camera-wearer's activities in the context of
their persistent environment. However, current video understanding approaches
reason over visual features from short video clips that are detached from the
underlying physical space and capture only what is immediately visible. To
facilitate human-centric environment understanding, we present an approach that
links egocentric video and the environment by learning representations that are
predictive of the camera-wearer's (potentially unseen) local surroundings. We
train such models using videos from agents in simulated 3D environments where
the environment is fully observable, and test them on human-captured real-world
videos from unseen environments. On two human-centric video tasks, we show that
models equipped with our environment-aware features consistently outperform
their counterparts with traditional clip features. Moreover, despite being
trained exclusively on simulated videos, our approach successfully handles
real-world videos from HouseTours and Ego4D, and achieves state-of-the-art
results on the Ego4D NLQ challenge. Project page:
https://vision.cs.utexas.edu/projects/ego-env/
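
The abstract above describes learning an embedding that is predictive of the camera-wearer's local surroundings and pairing it with standard clip features for downstream video tasks. As a rough illustration only (not the authors' released code), the sketch below shows one way such a fusion could look in PyTorch; the module names, feature dimensions, and the simple concatenation scheme are all assumptions made for clarity.

```python
# Conceptual sketch only (not the EgoEnv implementation): a downstream head that
# consumes clip features concatenated with a predicted environment embedding.
# All class names, dimensions, and the fusion strategy are illustrative assumptions.
import torch
import torch.nn as nn

class EnvAwareClipModel(nn.Module):
    def __init__(self, clip_dim=2048, env_dim=512, num_classes=10):
        super().__init__()
        # Hypothetical predictor of the camera-wearer's (possibly unseen) local
        # surroundings from a clip feature; per the abstract, such a predictor is
        # trained on simulated walkthroughs where the environment is fully observable.
        self.env_predictor = nn.Sequential(
            nn.Linear(clip_dim, env_dim), nn.ReLU(), nn.Linear(env_dim, env_dim)
        )
        # Task head sees both the clip feature and the environment embedding.
        self.head = nn.Linear(clip_dim + env_dim, num_classes)

    def forward(self, clip_feat):  # clip_feat: (batch, clip_dim)
        env_feat = self.env_predictor(clip_feat)
        return self.head(torch.cat([clip_feat, env_feat], dim=-1))

# Usage example on a batch of precomputed clip features.
logits = EnvAwareClipModel()(torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 10])
```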
Related papers
- EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars [56.56236652774294]
We propose a person-specific egocentric telepresence approach, which jointly models the photoreal digital avatar while also driving it from a single egocentric video.
Our experiments demonstrate a clear step towards egocentric and photoreal telepresence as our method outperforms baselines as well as competing methods.
arXiv Detail & Related papers (2024-09-22T22:50:27Z)
- Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos [87.32349247938136]
Existing approaches implicitly assume total correspondence between the video and audio during training.
We propose a novel ambient-aware audio generation model, AV-LDM.
Our approach is the first to ground video-to-audio generation faithfully in the observed visual content.
arXiv Detail & Related papers (2024-06-13T16:10:19Z)
- EgoGen: An Egocentric Synthetic Data Generator [53.32942235801499]
EgoGen is a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks.
At the heart of EgoGen is a novel human motion synthesis model that directly leverages egocentric visual inputs of a virtual human to sense the 3D environment.
We demonstrate EgoGen's efficacy in three tasks: mapping and localization for head-mounted cameras, egocentric camera tracking, and human mesh recovery from egocentric views.
arXiv Detail & Related papers (2024-01-16T18:55:22Z)
- Self-supervised video pretraining yields human-aligned visual representations [10.406358397515838]
The resulting general-purpose VITO representations far outperform those of prior video pretraining methods on image understanding tasks.
VITO representations are significantly more robust to natural and synthetic deformations than image-, video-, and adversarially-trained ones.
These results suggest that video pretraining could be a simple way of learning unified, robust, and human-aligned representations of the visual world.
arXiv Detail & Related papers (2022-10-12T17:30:12Z)
- Ego4D: Around the World in 3,000 Hours of Egocentric Video [276.1326075259486]
Ego4D is a massive-scale egocentric video dataset and benchmark suite.
It offers 3,025 hours of daily-life activity video spanning hundreds of scenarios captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries.
Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event.
arXiv Detail & Related papers (2021-10-13T22:19:32Z)
- EGO-TOPO: Environment Affordances from Egocentric Video [104.77363598496133]
We introduce a model for environment affordances that is learned directly from egocentric video.
Our approach decomposes a space into a topological map derived from first-person activity.
On EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene affordances and anticipating future actions in long-form video.
arXiv Detail & Related papers (2020-01-14T01:20:39Z)
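
The EGO-TOPO summary above mentions decomposing a space into a topological map derived from first-person activity. The sketch below is a heavily simplified, hypothetical illustration of that idea: frames are grouped into visited zones by feature similarity, and zones visited back-to-back are linked, yielding a graph over the space. The similarity threshold, feature choice, and zone-assignment rule are all assumptions, not the paper's method.

```python
# Illustrative sketch only (not the EGO-TOPO code): build a topological graph of
# visited zones from per-frame visual features. Thresholds and features are assumed.
import numpy as np
import networkx as nx

def build_topological_map(frame_feats, sim_threshold=0.8):
    """frame_feats: (num_frames, dim) L2-normalized visual features."""
    graph = nx.Graph()
    zone_centroids = []   # one representative feature per zone (first frame assigned)
    prev_zone = None
    for feat in frame_feats:
        # Assign the frame to the most similar existing zone, or start a new one.
        sims = [float(feat @ c) for c in zone_centroids]
        if sims and max(sims) > sim_threshold:
            zone = int(np.argmax(sims))
        else:
            zone = len(zone_centroids)
            zone_centroids.append(feat.copy())
            graph.add_node(zone)
        # Consecutive visits to different zones imply physical connectivity.
        if prev_zone is not None and prev_zone != zone:
            graph.add_edge(prev_zone, zone)
        prev_zone = zone
    return graph

# Usage example with random stand-in features.
feats = np.random.randn(100, 128)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
print(build_topological_map(feats))  # e.g. "Graph with N nodes and M edges"
```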