TMHOI: Translational Model for Human-Object Interaction Detection
- URL: http://arxiv.org/abs/2303.04253v3
- Date: Sat, 1 Jul 2023 15:44:42 GMT
- Title: TMHOI: Translational Model for Human-Object Interaction Detection
- Authors: Lijing Zhu, Qizhen Lan, Alvaro Velasquez, Houbing Song, Acharya Kamal,
Qing Tian, Shuteng Niu
- Abstract summary: We propose an innovative graph-based approach to detect human-object interactions (HOIs).
Our method effectively captures the semantic representation of HOIs by integrating both spatial and semantic knowledge.
Our approach outperformed existing state-of-the-art graph-based methods by a significant margin.
- Score: 18.804647133922195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting human-object interactions (HOIs) is an intricate challenge in the
field of computer vision. Existing methods for HOI detection rely heavily on
appearance-based features, which may not fully capture all the characteristics
essential for accurate detection. To overcome these challenges, we propose an
innovative graph-based approach called TMGHOI (Translational Model for
Human-Object Interaction Detection). Our method effectively captures the
semantic representation of HOIs by integrating both spatial and semantic
knowledge: we represent HOIs as a graph in which the interaction components
serve as nodes and their spatial relationships as edges. To extract crucial
spatial and semantic information, TMGHOI employs separate spatial and semantic
encoders. These encodings are then combined to construct a knowledge graph
that captures the semantic representation of HOIs. Additionally, the ability
to incorporate prior knowledge enhances the understanding of interactions,
further boosting detection accuracy. We conducted extensive evaluations on the
widely used HICO-DET dataset to demonstrate the effectiveness of TMGHOI. Our
approach outperformed existing state-of-the-art graph-based methods by a
significant margin, showcasing its potential as a superior solution for HOI
detection. Its integration of spatial and semantic knowledge, along with its
computational efficiency and practicality, makes TMGHOI a valuable tool for
researchers and practitioners in the computer vision community. As with any
research, we acknowledge the importance of further exploration and evaluation
on various datasets to establish the generalizability and robustness of the
proposed method.
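The abstract's translational framing suggests a TransE-style scoring rule, in which an interaction (verb) embedding translates the human node's embedding onto the object node's embedding. The sketch below is a minimal illustration of that idea, not the authors' implementation: the embedding size, the fuse-by-concatenation of the spatial and semantic encodings, and all function names are assumptions.

```python
import numpy as np

DIM = 64  # joint embedding size (an assumption, not from the paper)

def encode_node(spatial_feat, semantic_feat, w_spatial, w_semantic):
    """Stand-in for the paper's separate spatial and semantic encoders:
    project each feature, concatenate the results, and L2-normalize."""
    fused = np.concatenate([spatial_feat @ w_spatial, semantic_feat @ w_semantic])
    return fused / (np.linalg.norm(fused) + 1e-8)

def translational_score(human_emb, verb_emb, object_emb):
    """TransE-style plausibility for a <human, verb, object> triple: the
    verb embedding should translate the human embedding onto the object
    embedding, so a smaller residual norm means a higher score."""
    return -np.linalg.norm(human_emb + verb_emb - object_emb)

def rank_verbs(human_emb, object_emb, verb_embs):
    """Rank candidate interaction labels for one human-object pair and
    return the index of the most plausible verb."""
    scores = [translational_score(human_emb, v, object_emb) for v in verb_embs]
    return int(np.argmax(scores))
```

Under this scheme, if `object_emb` equals `human_emb + verb_embs[k]` exactly, the residual for verb `k` is zero and `rank_verbs` selects it; training would push observed triples toward that configuration.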
Related papers
- Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models
ConCue is a novel approach for improving visual feature extraction in HOI detection.
We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- Know Thy Neighbors: A Graph Based Approach for Effective Sensor-Based Human Activity Recognition in Smart Homes
We propose a novel graph-guided neural network approach for Human Activity Recognition (HAR) in smart homes.
We accomplish this by learning a more expressive graph structure representing the sensor network in a smart home.
Our approach maps discrete input sensor measurements to a feature space through the application of attention mechanisms.
arXiv Detail & Related papers (2023-11-16T02:43:13Z)
- Graph Convolutional Network with Connectivity Uncertainty for EEG-based Emotion Recognition
This study introduces a distribution-based uncertainty method to represent spatial dependencies and temporal-spectral relativeness in EEG signals.
A graph mixup technique is employed to enhance latent connected edges and mitigate noisy-label issues.
We evaluate our approach on two widely used emotion recognition datasets, SEED and SEED-IV.
arXiv Detail & Related papers (2023-10-22T03:47:11Z)
- HOKEM: Human and Object Keypoint-based Extension Module for Human-Object Interaction Detection
This paper presents the human and object keypoint-based extension module (HOKEM), an easy-to-use extension module that improves the accuracy of conventional detection models.
Experiments on the V-COCO HOI dataset showed that HOKEM boosted the accuracy of an appearance-based model by a large margin.
arXiv Detail & Related papers (2023-06-25T14:40:26Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible or invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art methods designed specifically for the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- DRG: Dual Relation Graph for Human-Object Interaction Detection
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection
We consider the problem of Human-Object Interaction (HOI) detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images.
We argue that multi-level consistencies among objects, actions, and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into a visual-semantic joint embedding space, and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
- Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning
We propose a novel adversarial learning approach that leverages user interaction data for the knowledge graph completion task.
Our generator is isolated from user interaction data and serves to improve the performance of the discriminator.
To discover users' implicit entity preferences, we design an elaborate collaborative learning algorithm based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)
- Graph Representation Learning via Graphical Mutual Information Maximization
We propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations.
We develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder.
arXiv Detail & Related papers (2020-02-04T08:33:49Z)
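The ConsNet entry above scores candidate human-object pairs by similarity to HOI label embeddings in a joint visual-semantic space. A minimal sketch of that matching step, assuming simple linear projections and cosine similarity; the projection matrices and function names are illustrative, not ConsNet's actual architecture:

```python
import numpy as np

def to_joint_space(x, w):
    """Linearly project a feature into the joint embedding space and
    L2-normalize it, so dot products act as cosine similarities."""
    z = x @ w
    return z / (np.linalg.norm(z) + 1e-8)

def match_hoi_label(visual_feat, label_embeddings, w_visual, w_text):
    """Score one candidate pair's visual feature against every HOI label
    embedding; return (best label index, all similarity scores)."""
    v = to_joint_space(visual_feat, w_visual)
    sims = [float(v @ to_joint_space(e, w_text)) for e in label_embeddings]
    return int(np.argmax(sims)), sims
```

Because both sides live in the same space, an unseen HOI label can be scored the same way as a seen one, which is what makes this formulation amenable to zero-shot detection.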
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.