Egocentric Object Manipulation Graphs
- URL: http://arxiv.org/abs/2006.03201v1
- Date: Fri, 5 Jun 2020 02:03:25 GMT
- Title: Egocentric Object Manipulation Graphs
- Authors: Eadom Dessalene, Michael Maynord, Chinmaya Devaraj, Cornelia Fermuller
and Yiannis Aloimonos
- Abstract summary: Ego-OMG is a novel representation for activity modeling and anticipation of near future actions.
It integrates semantic temporal structure, short-term dynamics, and representations for appearance.
Code will be released upon acceptance of Ego-OMG.
- Score: 8.759425622561334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Egocentric Object Manipulation Graphs (Ego-OMG) - a novel
representation for activity modeling and anticipation of near future actions
integrating three components: 1) semantic temporal structure of activities, 2)
short-term dynamics, and 3) representations for appearance. Semantic temporal
structure is modeled through a graph, embedded through a Graph Convolutional
Network, whose states model characteristics of and relations between hands and
objects. These state representations derive from all three levels of
abstraction, and span segments delimited by the making and breaking of
hand-object contact. Short-term dynamics are modeled in two ways: A) through 3D
convolutions, and B) through anticipating the spatiotemporal end points of hand
trajectories, where hands come into contact with objects. Appearance is modeled
through deep spatiotemporal features produced through existing methods. We note
that in Ego-OMG it is simple to swap these appearance features, and thus
Ego-OMG is complementary to most existing action anticipation methods. We
evaluate Ego-OMG on the EPIC Kitchens Action Anticipation Challenge. The
consistency of the egocentric perspective of EPIC Kitchens allows for the
utilization of the hand-centric cues upon which Ego-OMG relies. We demonstrate
state-of-the-art performance, outperforming all previously published methods
by large margins and ranking first on the unseen test set and second on the
seen test set of the EPIC Kitchens Action Anticipation Challenge. We attribute
the success of Ego-OMG to the modeling of semantic structure captured over long
timespans. We evaluate the design choices made through several ablation
studies. Code will be released upon acceptance.
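To make the three components concrete, the sketch below is a minimal, non-authoritative illustration of the idea described in the abstract: a small graph convolutional network embeds a hand-object state graph (one node per segment delimited by the making/breaking of hand-object contact), and the pooled graph embedding is fused with a precomputed appearance feature (e.g. from a 3D CNN) to score anticipated actions. All module names, dimensions, the mean-pool fusion, and the class count are placeholder assumptions, not the authors' released Ego-OMG code.
```python
# Hypothetical sketch of Ego-OMG-style anticipation: GCN over a hand-object
# state graph + fusion with clip-level appearance features. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj):
        a_hat = adj + torch.eye(adj.size(0))      # add self-loops
        deg = a_hat.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.pow(-0.5))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
        return F.relu(self.linear(a_norm @ node_feats))

class EgoOMGSketch(nn.Module):
    """Embed the hand-object state graph and fuse with appearance features."""
    def __init__(self, node_dim=64, appearance_dim=512, hidden=128, num_actions=100):
        super().__init__()                        # num_actions is a placeholder
        self.gcn1 = GCNLayer(node_dim, hidden)
        self.gcn2 = GCNLayer(hidden, hidden)
        self.classifier = nn.Linear(hidden + appearance_dim, num_actions)

    def forward(self, node_feats, adj, appearance_feats):
        h = self.gcn2(self.gcn1(node_feats, adj), adj)  # per-node state embeddings
        graph_emb = h.mean(dim=0)                       # pool over contact segments
        fused = torch.cat([graph_emb, appearance_feats], dim=-1)
        return self.classifier(fused)                   # scores for anticipated actions

# Toy usage: 10 state nodes, a random undirected adjacency, and a 512-d
# appearance vector standing in for a 3D-CNN clip descriptor.
nodes = torch.randn(10, 64)
adj = (torch.rand(10, 10) > 0.7).float()
adj = torch.max(adj, adj.t())                           # make the graph undirected
appearance = torch.randn(512)
logits = EgoOMGSketch()(nodes, adj, appearance)
print(logits.shape)  # torch.Size([100])
```
In this sketch the appearance vector is simply concatenated before classification, which mirrors the abstract's point that the appearance features are easy to swap for those produced by other anticipation methods.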
Related papers
- HOIMotion: Forecasting Human Motion During Human-Object Interactions Using Egocentric 3D Object Bounding Boxes [10.237077867790612]
We present HOIMotion, a novel approach for human motion forecasting during human-object interactions.
Our method integrates information about past body poses and egocentric 3D object bounding boxes.
We show that HOIMotion consistently outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2024-07-02T19:58:35Z)
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos [9.340890244344497]
Existing methods for forecasting 2D hand positions rely on visual representations and mainly focus on hand-object interactions.
We propose EMAG, an ego-motion-aware and generalizable 2D hand forecasting method.
Our model outperforms prior methods by 1.7% and 7.0% on intra- and cross-dataset evaluations.
arXiv Detail & Related papers (2024-05-30T13:15:18Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects [89.95728475983263]
Holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition, and motion generation.
We design the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits.
Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks.
arXiv Detail & Related papers (2024-03-25T05:12:21Z)
- Action Scene Graphs for Long-Form Understanding of Egocentric Videos [23.058999979457546]
We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form understanding of egocentric videos.
EASGs provide a temporally evolving graph-based description of the actions performed by the camera wearer.
We will release the dataset and the code to replicate experiments and annotations.
arXiv Detail & Related papers (2023-12-06T10:01:43Z)
- GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency [57.9920824261925]
Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment.
Modeling realistic hand-object interactions is critical for applications in computer graphics, computer vision, and mixed reality.
GRIP is a learning-based method that takes as input the 3D motion of the body and the object, and synthesizes realistic motion for both hands before, during, and after object interaction.
arXiv Detail & Related papers (2023-08-22T17:59:51Z)
- Language-free Compositional Action Generation via Decoupling Refinement [67.50452446686725]
We introduce a novel framework to generate compositional actions without reliance on language auxiliaries.
Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement.
arXiv Detail & Related papers (2023-07-07T12:00:38Z)
- StillFast: An End-to-End Approach for Short-Term Object Interaction Anticipation [14.188006024550257]
We study the short-term object interaction anticipation problem from the egocentric point of view.
Our approach simultaneously processes a still image and a video detecting and localizing next-active objects.
Our method is ranked first in the public leaderboard of the EGO4D short term object interaction anticipation challenge 2022.
arXiv Detail & Related papers (2023-04-08T09:01:37Z)
- Unified Graph Structured Models for Video Understanding [93.72081456202672]
We propose a message passing graph neural network that explicitly models relational-temporal relations.
We show how our method is able to more effectively model relationships between relevant entities in the scene.
arXiv Detail & Related papers (2021-03-29T14:37:35Z)
- Forecasting Action through Contact Representations from First Person Video [7.10140895422075]
We introduce representations and models centered on contact, which we then use in action prediction and anticipation.
Using these annotations we train a module producing novel low-level representations of anticipated near future action.
On top of the Anticipation Module we apply Ego-OMG, a framework for action anticipation and prediction.
arXiv Detail & Related papers (2021-02-01T05:52:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.