EGO-TOPO: Environment Affordances from Egocentric Video
- URL: http://arxiv.org/abs/2001.04583v2
- Date: Fri, 27 Mar 2020 20:30:19 GMT
- Title: EGO-TOPO: Environment Affordances from Egocentric Video
- Authors: Tushar Nagarajan, Yanghao Li, Christoph Feichtenhofer, Kristen Grauman
- Abstract summary: We introduce a model for environment affordances that is learned directly from egocentric video.
Our approach decomposes a space into a topological map derived from first-person activity.
On EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene affordances and anticipating future actions in long-form video.
- Score: 104.77363598496133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: First-person video naturally brings the use of a physical environment to the forefront, since it shows the camera wearer interacting fluidly in a space based on his intentions. However, current methods largely separate the observed actions from the persistent space itself. We introduce a model for environment affordances that is learned directly from egocentric video. The main idea is to gain a human-centric model of a physical space (such as a kitchen) that captures (1) the primary spatial zones of interaction and (2) the likely activities they support. Our approach decomposes a space into a topological map derived from first-person activity, organizing an ego-video into a series of visits to the different zones. Further, we show how to link zones across multiple related environments (e.g., from videos of multiple kitchens) to obtain a consolidated representation of environment functionality. On EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene affordances and anticipating future actions in long-form video.
Related papers
- Grounding 3D Scene Affordance From Egocentric Interactions [52.5827242925951] (arXiv, 2024-09-29T10:46:19Z)
  Grounding 3D scene affordance aims to locate interactive regions in 3D environments.
  We introduce a novel task: grounding 3D scene affordance from egocentric interactions.
- Egocentric zone-aware action recognition across environments [17.67702928208351] (arXiv, 2024-09-21T17:40:48Z)
  Activity-centric zones can serve as a prior that helps vision models recognize human activities.
  The appearance of these zones is scene-specific, limiting the transferability of this prior information to unfamiliar areas and domains.
  We show how these zones can improve the cross-domain transferability of Egocentric Action Recognition (EAR) models.
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473] (arXiv, 2024-06-28T10:39:36Z)
  EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
  We show significant improvements in both dynamic object and background reconstruction quality compared to the state of the art.
- Object Aware Egocentric Online Action Detection [23.504280692701272] (arXiv, 2024-06-03T07:58:40Z)
  We introduce an Object-Aware Module that integrates egocentric-specific priors into existing Online Action Detection frameworks.
  Our module can be seamlessly integrated into existing models with minimal overhead and brings consistent performance improvements.
- EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views [51.53089073920215] (arXiv, 2024-05-22T14:03:48Z)
  Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception.
  Existing methods primarily leverage observations of HOI to capture interaction regions from an exocentric view.
  We present EgoChoir, which links object structures with the interaction contexts inherent in appearance and head motion to reveal object affordance.
- EgoEnv: Human-centric environment representations from egocentric video [60.34649902578047] (arXiv, 2022-07-22T22:39:57Z)
  First-person video highlights a camera wearer's activities in the context of their persistent environment.
  Current video understanding approaches reason over visual features from short video clips that are detached from the underlying physical space.
  We present an approach that links egocentric video and the environment by learning representations that are predictive of the camera wearer's (potentially unseen) local surroundings.
- Video2Skill: Adapting Events in Demonstration Videos to Skills in an Environment using Cyclic MDP Homomorphisms [16.939129935919325] (arXiv, 2021-09-08T17:59:01Z)
  Video2Skill (V2S) extends the ability to learn from demonstrations to artificial agents by allowing a robot arm to learn from human cooking videos.
  We first use sequence-to-sequence autoencoder-style architectures to learn a temporal latent space for events in long-horizon demonstrations (a minimal sketch of such an autoencoder follows this list).
  We then transfer these representations to the robotic target domain, using a small amount of offline and unrelated interaction data.
- Egocentric Activity Recognition and Localization on a 3D Map [94.30708825896727] (arXiv, 2021-05-20T06:58:15Z)
  We address the problem of jointly recognizing and localizing actions of a mobile user on a known 3D map from egocentric videos.
  Our model takes as input a Hierarchical Volumetric Representation (HVR) of the environment and an egocentric video, infers the 3D action location as a latent variable, and recognizes the action based on the video and contextual cues surrounding its potential locations.
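As referenced in the Video2Skill entry above, the following is a minimal sketch of a sequence-to-sequence autoencoder that compresses a sequence of per-event features from a long demonstration into a temporal latent code and reconstructs the sequence from it. It is a generic illustration, not the Video2Skill (V2S) architecture; the GRU encoder/decoder, feature dimension, and latent size are assumptions.

```python
# Sketch of a sequence-to-sequence autoencoder for demonstration events.
# NOT the Video2Skill architecture: tensor shapes and the GRU choice are
# illustrative assumptions.
import torch
import torch.nn as nn


class SeqAutoEncoder(nn.Module):
    def __init__(self, feat_dim: int = 128, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, latent_dim, batch_first=True)
        self.decoder = nn.GRU(latent_dim, latent_dim, batch_first=True)
        self.readout = nn.Linear(latent_dim, feat_dim)

    def forward(self, x: torch.Tensor):
        # x: (batch, time, feat_dim) sequence of event features from one demonstration.
        _, h = self.encoder(x)                              # h: (1, batch, latent_dim)
        z = h.squeeze(0)                                    # temporal latent code
        dec_in = z.unsqueeze(1).repeat(1, x.size(1), 1)     # repeat latent at every step
        out, _ = self.decoder(dec_in, h)
        return self.readout(out), z                         # reconstruction and latent


model = SeqAutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.randn(8, 50, 128)                             # dummy event-feature sequences
recon, latent = model(batch)
loss = nn.functional.mse_loss(recon, batch)                 # reconstruction objective
loss.backward()
optimizer.step()
```

In a transfer setting like the one the entry describes, the frozen latent codes would then be adapted to the target (robot) domain with a small amount of interaction data; that adaptation step is outside the scope of this sketch.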
This list is automatically generated from the titles and abstracts of the papers on this site.