Human and Machine Action Prediction Independent of Object Information
- URL: http://arxiv.org/abs/2004.10518v1
- Date: Wed, 22 Apr 2020 12:13:25 GMT
- Title: Human and Machine Action Prediction Independent of Object Information
- Authors: Fatemeh Ziaeetabar, Jennifer Pomp, Stefan Pfeiffer, Nadiya El-Sourani,
Ricarda I. Schubotz, Minija Tamosiunaite and Florentin Wörgötter
- Abstract summary: We study the role of inter-object relations that change during an action.
Human subjects were able to predict actions within, on average, less than 64% of the action's duration.
- Score: 1.0806206850043696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting other people's actions is key to successful social interactions,
enabling us to adjust our own behavior to the consequences of others' future
actions. Studies on action recognition have focused on the importance of
individual visual features of objects involved in an action and its context.
Humans, however, recognize actions on unknown objects or even when objects are
imagined (pantomime). Other cues must thus compensate for the lack of recognizable
visual object features. Here, we focus on the role of inter-object relations
that change during an action. We designed a virtual reality setup and tested
recognition speed for 10 different manipulation actions on 50 subjects. All
objects were abstracted by emulated cubes so the actions could not be inferred
using object information. Instead, subjects had to rely only on the information
that comes from the changes in the spatial relations that occur between those
cubes. In spite of these constraints, our results show that the subjects were able
to predict actions within, on average, less than 64% of the action's duration. We
employed a computational model, an enriched Semantic Event Chain (eSEC),
incorporating information about spatial relations, specifically (a) objects'
touching/untouching, (b) static spatial relations between objects and (c)
dynamic spatial relations between objects. Trained on the same actions as those
observed by subjects, the model successfully predicted actions even better than
humans. Information-theoretic analysis shows that eSECs optimally use
individual cues, whereas humans presumably rely mostly on a mixed-cue strategy,
which delays recognition. Providing a better cognitive basis of
action recognition may, on the one hand, improve our understanding of related human
pathologies and, on the other hand, help to build robots for conflict-free
human-robot cooperation. Our results open new avenues here.
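To make the relation-based encoding more concrete, the sketch below illustrates the general idea of an eSEC-style representation: each key event stores, per object pair, a touching/untouching flag, a static spatial relation, and a dynamic spatial relation, and an ongoing action is predicted by matching the observed event prefix against trained chains. This is a minimal sketch under simplifying assumptions; the relation labels, data layout, and prefix-matching rule are placeholders chosen for illustration, not the authors' exact formulation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# One entry per tracked object pair at one key event:
# (a) touching/untouching, (b) static spatial relation, (c) dynamic spatial relation.
# The relation vocabulary used here is a placeholder, not the paper's exact label set.
Relation = Tuple[str, str, str]
EventColumn = Tuple[Relation, ...]   # relations of all object pairs at one event


@dataclass
class ActionModel:
    name: str
    chain: List[EventColumn]         # event columns learned from training demonstrations


def prefix_match(observed: List[EventColumn], model: ActionModel) -> float:
    """Fraction of the model's chain that agrees with the observed event prefix."""
    matched = 0
    for obs_col, mod_col in zip(observed, model.chain):
        if obs_col != mod_col:
            break
        matched += 1
    return matched / len(model.chain)


def predict_action(observed: List[EventColumn], models: List[ActionModel]) -> str:
    """Pick the action whose chain best explains the events seen so far.
    (The paper decides as soon as one action becomes unambiguous; a plain
    best-prefix match is used here only to keep the example short.)"""
    return max(models, key=lambda m: prefix_match(observed, m)).name


if __name__ == "__main__":
    push = ActionModel("push", [
        (("untouching", "left_of", "approaching"),),
        (("touching", "left_of", "moving_together"),),
        (("untouching", "left_of", "stable"),),
    ])
    pick_up = ActionModel("pick up", [
        (("untouching", "above", "approaching"),),
        (("touching", "above", "moving_together"),),
        (("untouching", "above", "receding"),),
    ])
    # Only the first two events have been observed, yet the action is already decided.
    observed = [
        (("untouching", "left_of", "approaching"),),
        (("touching", "left_of", "moving_together"),),
    ]
    print(predict_action(observed, [push, pick_up]))   # -> push
```

In the study itself the eSEC additionally exploits individual cues in an information-theoretically optimal way, whereas this sketch treats all three relation types equally and scores them jointly.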
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- LEMON: Learning 3D Human-Object Interaction Relation from 2D Images [56.6123961391372]
Learning 3D human-object interaction relation is pivotal to embodied AI and interaction modeling.
Most existing methods approach the goal by learning to predict isolated interaction elements.
We present LEMON, a unified model that mines interaction intentions of the counterparts and employs curvatures to guide the extraction of geometric correlations.
arXiv Detail & Related papers (2023-12-14T14:10:57Z)
- Modelling Spatio-Temporal Interactions for Compositional Action Recognition [21.8767024220287]
Humans have the natural ability to recognize actions even if the objects involved in the action or the background are changed.
We show the effectiveness of our interaction-centric approach on the compositional Something-Else dataset.
Our approach of explicit human-object-stuff interaction modeling is effective even for standard action recognition datasets.
arXiv Detail & Related papers (2023-05-04T09:37:45Z)
- Object-agnostic Affordance Categorization via Unsupervised Learning of Graph Embeddings [6.371828910727037]
Acquiring knowledge about object interactions and affordances can facilitate scene understanding and human-robot collaboration tasks.
We address the problem of affordance categorization for class-agnostic objects with an open set of interactions.
A novel depth-informed qualitative spatial representation is proposed for the construction of Activity Graphs.
arXiv Detail & Related papers (2023-03-30T15:04:04Z)
- Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show the proposed network that consumes dynamic descriptors can achieve state-of-the-art prediction results and help the network better generalize to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- Object and Relation Centric Representations for Push Effect Prediction [18.990827725752496]
Pushing is an essential non-prehensile manipulation skill used for tasks ranging from pre-grasp manipulation to scene rearrangement.
We propose a graph neural network based framework for effect prediction and parameter estimation of pushing actions.
Our framework is validated both in real and simulated environments containing different shaped multi-part objects connected via different types of joints and objects with different masses.
arXiv Detail & Related papers (2021-02-03T15:09:12Z)
- Object Properties Inferring from and Transfer for Human Interaction Motions [51.896592493436984]
In this paper, we present a fine-grained action recognition method that learns to infer object properties from human interaction motion alone.
We collect a large number of videos and 3D skeletal motions of the performing actors using an inertial motion capture device.
In particular, we learn to identify the interacting object by estimating its weight, fragility, or delicacy.
arXiv Detail & Related papers (2020-08-20T14:36:34Z)