Egocentric Activity Recognition and Localization on a 3D Map
- URL: http://arxiv.org/abs/2105.09544v1
- Date: Thu, 20 May 2021 06:58:15 GMT
- Title: Egocentric Activity Recognition and Localization on a 3D Map
- Authors: Miao Liu, Lingni Ma, Kiran Somasundaram, Yin Li, Kristen Grauman,
James M. Rehg and Chao Li
- Abstract summary: We address the problem of jointly recognizing and localizing actions of a mobile user on a known 3D map from egocentric videos.
Our model takes as input a Hierarchical Volumetric Representation (HVR) of the environment and an egocentric video, infers the 3D action location as a latent variable, and recognizes the action based on the video and contextual cues surrounding its potential locations.
- Score: 94.30708825896727
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Given a video captured from a first person perspective and recorded in a
familiar environment, can we recognize what the person is doing and identify
where the action occurs in the 3D space? We address this challenging problem of
jointly recognizing and localizing actions of a mobile user on a known 3D map
from egocentric videos. To this end, we propose a novel deep probabilistic
model. Our model takes as input a Hierarchical Volumetric Representation
(HVR) of the environment and an egocentric video, infers the 3D action location
as a latent variable, and recognizes the action based on the video and
contextual cues surrounding its potential locations. To evaluate our model, we
conduct extensive experiments on a newly collected egocentric video dataset, in
which both naturalistic human actions and photo-realistic 3D environment
reconstructions are captured. Our method demonstrates strong results on both
action recognition and 3D action localization across seen and unseen
environments. We believe our work points to an exciting research direction at
the intersection of egocentric vision and 3D scene understanding.
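The abstract describes the model only at a high level. Below is a minimal, hypothetical sketch (not the authors' released code) of the latent-location idea it outlines: candidate 3D locations from the map are scored against the video, and the action posterior marginalizes over them, i.e. p(action | video, map) = sum over x of p(action | video, context(x)) * p(x | video, map). The module names, feature dimensions, and simple linear encoders below are illustrative assumptions standing in for the paper's HVR encoder and video backbone.
```python
# Sketch of recognizing an action while treating its 3D location as a latent
# variable, as described in the abstract. All shapes and encoders are assumptions.
import torch
import torch.nn as nn


class LatentLocationActionModel(nn.Module):
    def __init__(self, num_actions: int, feat_dim: int = 256):
        super().__init__()
        # Hypothetical stand-ins for the video backbone and the per-voxel
        # Hierarchical Volumetric Representation (HVR) encoder.
        self.video_enc = nn.Linear(2048, feat_dim)            # pooled clip feature
        self.voxel_enc = nn.Linear(64, feat_dim)              # per-voxel map feature
        self.loc_score = nn.Bilinear(feat_dim, feat_dim, 1)   # logits for p(x | video, map)
        self.action_head = nn.Linear(2 * feat_dim, num_actions)

    def forward(self, video_feat, voxel_feats):
        # video_feat: (B, 2048); voxel_feats: (B, V, 64) for V candidate locations.
        v = self.video_enc(video_feat)                        # (B, D)
        c = self.voxel_enc(voxel_feats)                       # (B, V, D)
        v_exp = v.unsqueeze(1).expand_as(c).contiguous()      # (B, V, D)
        # Posterior over the latent 3D action location.
        p_loc = self.loc_score(v_exp, c).squeeze(-1).softmax(dim=-1)          # (B, V)
        # Action distribution conditioned on each location's contextual cues.
        p_act = self.action_head(torch.cat([v_exp, c], dim=-1)).softmax(dim=-1)  # (B, V, A)
        # Marginalize the latent location: p(action | video, map).
        return (p_loc.unsqueeze(-1) * p_act).sum(dim=1)       # (B, A)


# Example usage with random stand-in features:
model = LatentLocationActionModel(num_actions=20)
video = torch.randn(2, 2048)       # assumed backbone output
voxels = torch.randn(2, 512, 64)   # 512 candidate voxel features from the map
probs = model(video, voxels)       # (2, 20); each row sums to 1
```
Training such a model end-to-end with cross-entropy on the marginal action distribution supervises the location posterior only indirectly, which matches the abstract's framing of the 3D action location as a latent variable.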
Related papers
- Ego3DT: Tracking Every 3D Object in Ego-centric Videos [20.96550148331019]
This paper introduces a novel zero-shot approach for the 3D reconstruction and tracking of all objects in egocentric videos.
We present Ego3DT, a novel framework that first extracts detection and segmentation information for objects within the egocentric environment.
We also introduce a dynamic hierarchical association mechanism that creates stable 3D tracking trajectories of objects in egocentric videos.
arXiv Detail & Related papers (2024-10-11T05:02:31Z)
- Grounding 3D Scene Affordance From Egocentric Interactions [52.5827242925951]
Grounding 3D scene affordance aims to locate interactive regions in 3D environments.
We introduce a novel task: grounding 3D scene affordance from egocentric interactions.
arXiv Detail & Related papers (2024-09-29T10:46:19Z)
- 3D Human Pose Perception from Egocentric Stereo Videos [67.9563319914377]
We propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation.
Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting.
We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
arXiv Detail & Related papers (2023-12-30T21:21:54Z)
- EgoHumans: An Egocentric 3D Multi-Human Benchmark [37.375846688453514]
We present EgoHumans, a new multi-view multi-human video benchmark to advance the state-of-the-art of egocentric human 3D pose estimation and tracking.
We propose a novel 3D capture setup to construct a comprehensive egocentric multi-human benchmark in the wild.
We leverage consumer-grade wearable camera-equipped glasses for the egocentric view, which enables us to capture dynamic activities like playing tennis, fencing, volleyball, etc.
arXiv Detail & Related papers (2023-05-25T21:37:36Z)
- Scene-aware Egocentric 3D Human Pose Estimation [72.57527706631964]
Egocentric 3D human pose estimation with a single head-mounted fisheye camera has recently attracted attention due to its numerous applications in virtual and augmented reality.
Existing methods still struggle in challenging poses where the human body is highly occluded or is closely interacting with the scene.
We propose a scene-aware egocentric pose estimation method that guides the prediction of the egocentric pose with scene constraints.
arXiv Detail & Related papers (2022-12-20T21:35:39Z)
- UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture [70.59984501516084]
UnrealEgo is a new large-scale naturalistic dataset for egocentric 3D human pose estimation.
It is based on an advanced concept of eyeglasses equipped with two fisheye cameras that can be used in unconstrained environments.
We propose a new benchmark method built on a simple but effective idea: a 2D keypoint estimation module for stereo inputs that improves 3D human pose estimation.
arXiv Detail & Related papers (2022-08-02T17:59:54Z)
- The One Where They Reconstructed 3D Humans and Environments in TV Shows [33.533207518342465]
TV shows depict a wide variety of human behaviors and have been studied extensively for their potential to be a rich source of data.
We propose an automatic approach that operates on an entire season of a TV show and aggregates information in 3D.
We show that reasoning about humans and their environment in 3D enables a broad range of downstream applications.
arXiv Detail & Related papers (2022-07-28T17:57:30Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences arising from its use.