PPDM: Parallel Point Detection and Matching for Real-time Human-Object
Interaction Detection
- URL: http://arxiv.org/abs/1912.12898v3
- Date: Wed, 25 Mar 2020 12:29:07 GMT
- Title: PPDM: Parallel Point Detection and Matching for Real-time Human-Object
Interaction Detection
- Authors: Yue Liao, Si Liu, Fei Wang, Yanjie Chen, Chen Qian, Jiashi Feng
- Abstract summary: We propose a single-stage Human-Object Interaction (HOI) detection method that outperforms all existing methods on the HICO-DET dataset while running at 37 fps.
It is the first real-time HOI detection method.
- Score: 85.75935399090379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a single-stage Human-Object Interaction (HOI) detection method
that outperforms all existing methods on the HICO-DET dataset while running at
37 fps on a single Titan XP GPU. It is the first real-time HOI detection method.
Conventional HOI detection methods are composed of two stages, i.e.,
human-object proposal generation and proposal classification. Their
effectiveness and efficiency are limited by the sequential and separate
architecture. In this paper, we propose a Parallel Point Detection and Matching
(PPDM) HOI detection framework. In PPDM, an HOI is defined as a point triplet
<human point, interaction point, object point>. Human and object points are the
centers of the detection boxes, and the interaction point is the midpoint of the
human and object points. PPDM contains two parallel branches, namely a point
detection branch and a point matching branch. The point detection branch predicts
the three points. Simultaneously, the point matching branch predicts two
displacements from the interaction point to its corresponding human and object
points. A human point and an object point that originate from the same
interaction point are considered a matched pair. In our novel parallel
architecture, the interaction points implicitly provide context and
regularization for human and object detection. Isolated detection boxes that are
unlikely to form meaningful HOI triplets are suppressed, which increases the
precision of HOI detection. Moreover, the matching between human and object
detection boxes is applied only around a limited number of filtered candidate
interaction points, which saves considerable computation. Additionally, we build
a new application-oriented database named HOI-A, which serves as a good
supplement to the existing datasets. The source code and the dataset will be
made publicly available to facilitate the development of HOI detection.
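
The point-triplet matching described in the abstract can be illustrated with a minimal sketch. It assumes that each interaction point comes with two predicted displacements, and that the human and object points closest to the displaced locations are taken as the matched pair. The nearest-neighbour rule, the function names, and the sample coordinates are illustrative assumptions; the abstract does not specify the exact matching criterion or scoring.

```python
import math

def midpoint(p, q):
    """Interaction point: midpoint of a human center and an object center."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

def nearest(target, points):
    """Index of the point closest to target (Euclidean distance)."""
    return min(range(len(points)), key=lambda i: math.dist(target, points[i]))

def match_triplets(interactions, humans, objects):
    """Pair human and object points through interaction points.

    interactions: list of (point, disp_to_human, disp_to_object) tuples.
    Returns a list of (human_index, object_index) pairs.
    Nearest-neighbour assignment is an assumption made for this sketch.
    """
    pairs = []
    for (ix, iy), (dhx, dhy), (dox, doy) in interactions:
        h = nearest((ix + dhx, iy + dhy), humans)
        o = nearest((ix + dox, iy + doy), objects)
        pairs.append((h, o))
    return pairs

# Hypothetical detected centers (pixel coordinates).
humans = [(10.0, 20.0), (80.0, 40.0)]
objects = [(30.0, 20.0), (90.0, 90.0)]

# One interaction point with slightly noisy predicted displacements.
ip = midpoint(humans[0], objects[0])
interactions = [(ip, (-9.0, 1.0), (11.0, -1.0))]

pairs = match_triplets(interactions, humans, objects)
print(pairs)
```

Because matching is driven only by the candidate interaction points, the second human and second object in this example are never paired with anything, which mirrors the abstract's remark that isolated detections are suppressed.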
Related papers
- Disentangled Pre-training for Human-Object Interaction Detection [22.653500926559833]
We propose an efficient disentangled pre-training method for HOI detection (DP-HOI).
DP-HOI utilizes object detection and action recognition datasets to pre-train the detection and interaction decoder layers.
It significantly enhances the performance of existing HOI detection models on a broad range of rare categories.
arXiv Detail & Related papers (2024-04-02T08:21:16Z)
- HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly.
Considering that human features contribute more to interaction, we propose a Human-Guide Linking method to make sure the interaction decoder focuses on the human-centric regions.
Our proposed method achieves competitive performance on both the V-COCO and the HICO-Det datasets.
arXiv Detail & Related papers (2023-08-20T04:12:50Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling.
In practice, SASA proves effective in identifying valuable points related to foreground objects and improving feature learning for point-based 3D detection.
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
- 6D Object Pose Estimation using Keypoints and Part Affinity Fields [24.126513851779936]
The task of 6D object pose estimation from RGB images is an important requirement for autonomous service robots to be able to interact with the real world.
We present a two-step pipeline for estimating the 6 DoF translation and orientation of known objects.
arXiv Detail & Related papers (2021-07-05T14:41:19Z)
- HOTR: End-to-End Human-Object Interaction Detection with Transformers [26.664864824357164]
We present a novel framework, referred to as HOTR, which directly predicts a set of <human, object, interaction> triplets from an image.
Our proposed algorithm achieves state-of-the-art performance on two HOI detection benchmarks with an inference time under 1 ms after object detection.
arXiv Detail & Related papers (2021-04-28T10:10:29Z)
- Reformulating HOI Detection as Adaptive Set Prediction [25.44630995307787]
We reformulate HOI detection as an adaptive set prediction problem.
We propose an Adaptive Set-based one-stage framework (AS-Net) with parallel instance and interaction branches.
Our method outperforms previous state-of-the-art methods without any extra human pose and language features.
arXiv Detail & Related papers (2021-03-10T10:40:33Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)