THORN: Temporal Human-Object Relation Network for Action Recognition
- URL: http://arxiv.org/abs/2204.09468v1
- Date: Wed, 20 Apr 2022 14:00:24 GMT
- Title: THORN: Temporal Human-Object Relation Network for Action Recognition
- Authors: Mohammed Guermal, Rui Dai, and Francois Bremond
- Abstract summary: Most action recognition models treat human activities as unitary events.
In this paper, we propose to recognize human actions by leveraging the set of interactions that define an action.
We present an end-to-end network, THORN, that leverages important human-object and object-object interactions to predict actions.
- Score: 3.6704226968275258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most action recognition models treat human activities as unitary events.
However, human activities often follow a certain hierarchy. In fact, many human
activities are compositional, and most of these actions involve human-object
interactions. In this paper, we propose to recognize human actions by leveraging
the set of interactions that define an action. We present an end-to-end
network, THORN, that leverages important human-object and object-object
interactions to predict actions. The model is built on top of a 3D backbone
network. Its key components are: 1) an object representation filter for
modeling objects; 2) an object relation reasoning module to capture object
relations; 3) a classification layer to predict the action labels. To show the
robustness of THORN, we evaluate it on EPIC-Kitchens 55 and EGTEA Gaze+, two of
the largest and most challenging first-person and human-object interaction
datasets. THORN achieves state-of-the-art performance on both datasets.
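To make the three-component design concrete, here is a minimal PyTorch sketch of a THORN-like pipeline. Everything below is an illustrative assumption rather than the authors' implementation: the module names, the feature dimensions, and the use of self-attention for relation reasoning are stand-ins for the components named in the abstract.

```python
import torch
import torch.nn as nn

class ObjectRepresentationFilter(nn.Module):
    # Hypothetical stand-in: projects backbone clip features into
    # per-object representations (THORN's actual filter is not shown here).
    def __init__(self, feat_dim, obj_dim, num_objects):
        super().__init__()
        self.num_objects = num_objects
        self.proj = nn.Linear(feat_dim, obj_dim)

    def forward(self, feats):
        # feats: (B, T, feat_dim) features from a 3D backbone
        obj = self.proj(feats).unsqueeze(2)              # (B, T, 1, obj_dim)
        return obj.expand(-1, -1, self.num_objects, -1)  # (B, T, N, obj_dim)

class ObjectRelationReasoning(nn.Module):
    # One common choice for relation reasoning: self-attention over
    # all object tokens across time.
    def __init__(self, obj_dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(obj_dim, num_heads, batch_first=True)

    def forward(self, objs):
        B, T, N, D = objs.shape
        tokens = objs.reshape(B, T * N, D)    # flatten time x objects
        out, _ = self.attn(tokens, tokens, tokens)
        return out.mean(dim=1)                # (B, D) pooled relation feature

class THORNSketch(nn.Module):
    def __init__(self, feat_dim=2048, obj_dim=512, num_objects=8, num_classes=125):
        super().__init__()
        self.obj_filter = ObjectRepresentationFilter(feat_dim, obj_dim, num_objects)
        self.reasoning = ObjectRelationReasoning(obj_dim)
        self.classifier = nn.Linear(obj_dim, num_classes)

    def forward(self, backbone_feats):
        objs = self.obj_filter(backbone_feats)
        rel = self.reasoning(objs)
        return self.classifier(rel)           # action logits

# Example: 2 clips, 8 temporal steps of 2048-d backbone features.
logits = THORNSketch()(torch.randn(2, 8, 2048))
print(logits.shape)  # torch.Size([2, 125])
```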
Related papers
- Interpretable Action Recognition on Hard to Classify Actions [11.641926922266347]
Humans recognise complex activities in video by recognising critical spatio-temporal relations among explicitly recognised objects and parts.
To mimic this, we build on a model which uses the positions of objects and hands, and their motions, to recognise the activity taking place.
To improve this model, we focussed on three of its most confused classes and identified that the lack of 3D information was the major problem.
A state-of-the-art object detection model was fine-tuned to determine the difference between "Container" and "NotContainer" in order to integrate object shape information into the existing object features.
arXiv Detail & Related papers (2024-09-19T21:23:44Z)
- HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects [86.86284624825356]
HIMO is a dataset of full-body humans interacting with multiple objects.
HIMO contains 3.3K 4D HOI sequences and 4.08M 3D HOI frames.
arXiv Detail & Related papers (2024-07-17T07:47:34Z)
- HOI-M3: Capture Multiple Humans and Objects Interaction within Contextual Environment [43.6454394625555]
HOI-M3 is a novel large-scale dataset for modeling the interactions of Multiple huMans and Multiple objects.
It provides accurate 3D tracking for both humans and objects from dense RGB and object-mounted IMU inputs.
arXiv Detail & Related papers (2024-03-30T09:24:25Z)
- Modelling Spatio-Temporal Interactions for Compositional Action Recognition [21.8767024220287]
Humans have the natural ability to recognize actions even if the objects involved in the action or the background are changed.
We show the effectiveness of our interaction-centric approach on the compositional Something-Else dataset.
Our approach of explicit human-object-stuff interaction modeling is effective even for standard action recognition datasets.
arXiv Detail & Related papers (2023-05-04T09:37:45Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
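The TOHO entry above describes motions parameterized only by the temporal coordinate, i.e., an implicit neural representation of motion. A minimal sketch of that idea follows; the MLP shape and the 72-dimensional pose output (an SMPL-like parameterization) are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ImplicitMotionField(nn.Module):
    # Maps a temporal coordinate t in [0, 1] to a pose vector, so the
    # motion is continuous in time and can be queried at any frame rate.
    def __init__(self, pose_dim=72, hidden=256):  # pose_dim: assumed SMPL-like
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, pose_dim),
        )

    def forward(self, t):
        # t: (B, 1) temporal coordinates
        return self.net(t)

field = ImplicitMotionField()
t = torch.linspace(0, 1, 30).unsqueeze(-1)  # sample 30 time points
poses = field(t)                            # (30, 72) continuous motion
print(poses.shape)
```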
- Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes the dynamic descriptors, achieves state-of-the-art prediction results, and that the descriptors help the network generalize better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
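The HO-GCN entry above describes a graph network fusing per-node motion data with dynamic descriptors. A minimal sketch of one such fusion layer follows; the concatenation-based fusion, dimensions, and degree-normalized aggregation are illustrative assumptions, not the HO-GCN architecture.

```python
import torch
import torch.nn as nn

class FusionGCNLayer(nn.Module):
    # One graph-convolution step: fuse motion features with dynamic
    # descriptors per node, then aggregate over graph neighbors.
    def __init__(self, motion_dim, dyn_dim, out_dim):
        super().__init__()
        self.fuse = nn.Linear(motion_dim + dyn_dim, out_dim)

    def forward(self, motion, dyn, adj):
        # motion: (N, motion_dim), dyn: (N, dyn_dim), adj: (N, N)
        x = self.fuse(torch.cat([motion, dyn], dim=-1))   # fuse both inputs
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)  # degree normalization
        return torch.relu(adj @ x / deg)                  # neighbor aggregation

# Example: 5 nodes (e.g., human joints + object keypoints), dense graph.
layer = FusionGCNLayer(motion_dim=64, dyn_dim=16, out_dim=128)
out = layer(torch.randn(5, 64), torch.randn(5, 16), torch.ones(5, 5))
print(out.shape)  # torch.Size([5, 128])
```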
- BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with the annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> triplets in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
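The ConsNet entry above scores candidate human-object pairs against HOI labels by mapping both into a visual-semantic joint embedding space and measuring similarity. A minimal sketch of that scoring scheme follows; the linear projections, dimensions, and cosine similarity are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbeddingScorer(nn.Module):
    # Projects visual pair features and HOI label word embeddings into a
    # shared space and scores them by cosine similarity.
    def __init__(self, vis_dim=1024, word_dim=300, joint_dim=256):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, joint_dim)
        self.lbl_proj = nn.Linear(word_dim, joint_dim)

    def forward(self, pair_feats, label_embs):
        # pair_feats: (P, vis_dim) candidate human-object pair features
        # label_embs: (L, word_dim) word embeddings of HOI labels
        v = F.normalize(self.vis_proj(pair_feats), dim=-1)
        w = F.normalize(self.lbl_proj(label_embs), dim=-1)
        return v @ w.t()  # (P, L) similarities; unseen labels score too

scorer = JointEmbeddingScorer()
scores = scorer(torch.randn(4, 1024), torch.randn(600, 300))
print(scores.argmax(dim=-1))  # best-matching HOI label per candidate pair
```

Because labels enter through word embeddings rather than a fixed classifier head, this kind of scorer can rank HOI labels it never saw at training time, which is what enables the zero-shot setting described in the entry.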
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.