Hand-Object Interaction Reasoning
- URL: http://arxiv.org/abs/2201.04906v1
- Date: Thu, 13 Jan 2022 11:53:12 GMT
- Title: Hand-Object Interaction Reasoning
- Authors: Jian Ma and Dima Damen
- Abstract summary: We show that modelling two-handed interactions is critical for action recognition in egocentric video.
We propose an interaction reasoning network for modelling spatio-temporal relationships between hands and objects in video.
- Score: 33.612083150296364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes an interaction reasoning network for modelling
spatio-temporal relationships between hands and objects in video. The proposed
interaction unit utilises a Transformer module to reason about each acting
hand, and its spatio-temporal relation to the other hand as well as objects
being interacted with. We show that modelling two-handed interactions is
critical for action recognition in egocentric video, and demonstrate that by
using positionally-encoded trajectories, the network can better recognise
observed interactions. We evaluate our proposal on EPIC-KITCHENS and
Something-Else datasets, with an ablation study.
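The page carries no code; the following is a minimal, hypothetical PyTorch sketch of the core idea described above: a Transformer encoder reasoning jointly over positionally-encoded hand and object trajectory tokens. All module names, the box-to-token scheme, the role vocabulary, and the dimensions (including the class count) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a Transformer-based interaction
# unit reasoning over positionally-encoded hand/object trajectories.
# All names, roles, and dimensions are assumptions for illustration.
import torch
import torch.nn as nn


class InteractionUnit(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, n_classes=97):
        super().__init__()
        # Embed per-frame bounding boxes (x, y, w, h) into token space.
        self.box_embed = nn.Linear(4, d_model)
        # Learned positional encoding over time steps, so the encoder can
        # exploit trajectory order (one simple way to realise the paper's
        # positionally-encoded trajectories).
        self.pos_embed = nn.Embedding(64, d_model)
        # Role embeddings distinguish acting hand, other hand, and object.
        self.role_embed = nn.Embedding(3, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, boxes, roles):
        # boxes: (B, T, 4) hand/object trajectories over T token steps
        # roles: (B, T) integer role id per token (0 = acting hand, ...)
        B, T, _ = boxes.shape
        t = torch.arange(T, device=boxes.device).expand(B, T)
        tokens = self.box_embed(boxes) + self.pos_embed(t) + self.role_embed(roles)
        feats = self.encoder(tokens)               # joint hand-object reasoning
        return self.classifier(feats.mean(dim=1))  # pooled action logits


# Usage: 8 clips, 16 tokens each (e.g., two hands plus objects over frames).
logits = InteractionUnit()(torch.rand(8, 16, 4), torch.zeros(8, 16, dtype=torch.long))
```

Adding time-step and role embeddings to each box token before self-attention is what lets the encoder relate the acting hand to the other hand and to the interacted objects across time.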
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models in both objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z) - PEAR: Phrase-Based Hand-Object Interaction Anticipation [20.53329698350243]
First-person hand-object interaction anticipation aims to predict the interaction process based on current scenes and prompts.
Existing research typically anticipates only interaction intention while neglecting manipulation.
We propose a novel model, PEAR, which jointly anticipates interaction intention and manipulation.
arXiv Detail & Related papers (2024-07-31T10:28:49Z) - ORMNet: Object-centric Relationship Modeling for Egocentric Hand-object Segmentation [14.765419467710812]
Egocentric hand-object segmentation (EgoHOS) is a promising new task aiming at segmenting hands and interacting objects in egocentric images.
This paper proposes a novel Object-centric Relationship Modeling Network (ORMNet) for effective, end-to-end EgoHOS.
arXiv Detail & Related papers (2024-07-08T03:17:10Z) - LEMON: Learning 3D Human-Object Interaction Relation from 2D Images [56.6123961391372]
Learning 3D human-object interaction relation is pivotal to embodied AI and interaction modeling.
Most existing methods approach the goal by learning to predict isolated interaction elements.
We present LEMON, a unified model that mines interaction intentions of the counterparts and employs curvatures to guide the extraction of geometric correlations.
arXiv Detail & Related papers (2023-12-14T14:10:57Z) - Novel-view Synthesis and Pose Estimation for Hand-Object Interaction
from Sparse Views [41.50710846018882]
We propose a neural rendering and pose estimation system for hand-object interaction from sparse views.
We first learn shape and appearance priors for hands and objects separately with a neural representation.
During the online stage, we design a rendering-based joint model fitting framework to understand the dynamic hand-object interaction.
arXiv Detail & Related papers (2023-08-22T05:17:41Z) - Automatic Interaction and Activity Recognition from Videos of Human
Manual Demonstrations with Application to Anomaly Detection [0.0]
This paper exploits Scene Graphs to extract key interaction features from image sequences while simultaneously capturing motion patterns and context.
The method introduces event-based automatic video segmentation and clustering, which allow for grouping similar events and detecting whether a monitored activity is executed correctly.
arXiv Detail & Related papers (2023-04-19T16:15:23Z) - Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step toward dynamics modeling of hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z) - Spatio-Temporal Interaction Graph Parsing Networks for Human-Object
Interaction Recognition [55.7731053128204]
In a given video-based Human-Object Interaction scene, modeling the spatio-temporal relationship between humans and objects is an important cue for understanding the contextual information presented in the video.
With effective spatio-temporal relationship modeling, it is possible not only to uncover contextual information in each frame but also to directly capture inter-time dependencies.
Making full use of appearance features, spatial locations, and semantic information is also key to improving video-based Human-Object Interaction recognition performance.
arXiv Detail & Related papers (2021-08-19T11:57:27Z) - RR-Net: Injecting Interactive Semantics in Human-Object Interaction
Detection [40.65483058890176]
The latest end-to-end HOI detectors lack relation reasoning, leaving them unable to learn HOI-specific interactive semantics for predictions.
We first present a progressive Relation-aware Frame, which brings a new structure and parameter sharing pattern for interaction inference.
Based on the modules above, we construct an end-to-end trainable framework named Relation Reasoning Network (RR-Net).
arXiv Detail & Related papers (2021-04-30T14:03:10Z) - Modeling long-term interactions to enhance action recognition [81.09859029964323]
We propose a new approach to understand actions in egocentric videos that exploits the semantics of object interactions at both frame and temporal levels.
We use a region-based approach that takes as input a primary region roughly corresponding to the user hands and a set of secondary regions potentially corresponding to the interacting objects.
The proposed approach outperforms the state-of-the-art in terms of action recognition on standard benchmarks.
arXiv Detail & Related papers (2021-04-23T10:08:15Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding (a minimal sketch of this cascaded design follows the list).
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.