Detecting Human-Object Contact in Images
- URL: http://arxiv.org/abs/2303.03373v2
- Date: Tue, 4 Apr 2023 13:48:30 GMT
- Title: Detecting Human-Object Contact in Images
- Authors: Yixin Chen, Sai Kumar Dwivedi, Michael J. Black, Dimitrios Tzionas
- Abstract summary: Humans constantly contact objects to move and perform tasks.
There exists no robust method to detect contact between the body and the scene from an image.
We build a new dataset of human-object contacts for images.
- Score: 75.35017308643471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans constantly contact objects to move and perform tasks. Thus, detecting
human-object contact is important for building human-centered artificial
intelligence. However, there exists no robust method to detect contact between
the body and the scene from an image, and there exists no dataset to learn such
a detector. We fill this gap with HOT ("Human-Object conTact"), a new dataset
of human-object contacts for images. To build HOT, we use two data sources: (1)
We use the PROX dataset of 3D human meshes moving in 3D scenes, and
automatically annotate 2D image areas for contact via 3D mesh proximity and
projection. (2) We use the V-COCO, HAKE and Watch-n-Patch datasets, and ask
trained annotators to draw polygons for the 2D image areas where contact takes
place. We also annotate the body parts involved in each contact. We use our
HOT dataset to train a new contact detector, which takes a single color image
as input, and outputs 2D contact heatmaps as well as the body-part labels that
are in contact. This is a new and challenging task that extends current
foot-ground or hand-object contact detectors to the full generality of the
whole body. The detector uses a part-attention branch to guide contact
estimation through the context of the surrounding body parts and scene. We
evaluate our detector extensively, and quantitative results show that our model
outperforms baselines, and that all components contribute to better
performance. Results on images from an online repository show reasonable
detections and generalizability.
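The PROX-based annotation described above — labeling 2D contact regions via 3D mesh proximity followed by projection into the image — can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name, the brute-force nearest-neighbor search, and the 2 cm contact threshold are all assumptions.

```python
import numpy as np

def annotate_contact_2d(body_verts, scene_verts, K, image_shape, thresh=0.02):
    """Sketch: label 2D contact pixels by 3D proximity, then projection.

    body_verts:  (N, 3) body mesh vertices in camera coordinates
    scene_verts: (M, 3) scene mesh vertices in camera coordinates
    K:           (3, 3) camera intrinsics
    thresh:      contact distance threshold in meters (assumed value)
    """
    # Distance from each body vertex to its nearest scene vertex.
    # (Brute force for clarity; a KD-tree would be used for real meshes.)
    d = np.linalg.norm(body_verts[:, None, :] - scene_verts[None, :, :], axis=-1)
    in_contact = d.min(axis=1) < thresh

    # Project the contacting body vertices into the image (pinhole model).
    contact_pts = body_verts[in_contact]
    proj = (K @ contact_pts.T).T
    uv = (proj[:, :2] / proj[:, 2:3]).round().astype(int)

    # Rasterize the projected points into a binary contact mask.
    H, W = image_shape
    mask = np.zeros((H, W), dtype=bool)
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    mask[uv[valid, 1], uv[valid, 0]] = True
    return mask
```

In practice the dense mask would be dilated or blurred into the 2D contact heatmaps the detector is trained on.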
Related papers
- Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer [58.98785899556135]
We present a novel joint 3D human-object reconstruction method (CONTHO) that effectively exploits contact information between humans and objects.
There are two core designs in our system: 1) 3D-guided contact estimation and 2) contact-based 3D human and object refinement.
arXiv Detail & Related papers (2024-04-07T06:01:49Z)
- DECO: Dense Estimation of 3D Human-Scene Contact In The Wild [54.44345845842109]
We train a novel 3D contact detector that uses both body-part-driven and scene-context-driven attention to estimate contact on the SMPL body.
We significantly outperform existing SOTA methods across all benchmarks.
We also show qualitatively that DECO generalizes well to diverse and challenging real-world human interactions in natural images.
arXiv Detail & Related papers (2023-09-26T21:21:07Z)
- Human keypoint detection for close proximity human-robot interaction [29.99153271571971]
We study the performance of state-of-the-art human keypoint detectors in the context of close proximity human-robot interaction.
The best performing whole-body keypoint detectors in close proximity were MMPose and AlphaPose, but both had difficulty with finger detection.
We propose a combination of MMPose or AlphaPose for the body and MediaPipe for the hands in a single framework providing the most accurate and robust detection.
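The combination above — a whole-body detector for the body, a dedicated hand detector for the fingers — amounts to merging two keypoint sets anchored at the wrist. A minimal, detector-agnostic sketch (the function name, the 21-landmark hand convention, and the confidence threshold are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def merge_keypoints(body_kp, hand_kp, wrist_idx, conf_thresh=0.3):
    """Sketch: fuse whole-body keypoints with dedicated hand keypoints.

    body_kp:   (N, 3) array of (x, y, confidence) from a body detector
    hand_kp:   (21, 3) array from a hand detector (landmark 0 = wrist)
    wrist_idx: index of the wrist keypoint within body_kp
    """
    merged = body_kp.copy()
    # Use the hand detector only when both wrists are detected confidently.
    if body_kp[wrist_idx, 2] > conf_thresh and hand_kp[0, 2] > conf_thresh:
        # Anchor the hand landmarks at the body detector's wrist position.
        offset = body_kp[wrist_idx, :2] - hand_kp[0, :2]
        aligned = hand_kp.copy()
        aligned[:, :2] += offset
        merged = np.vstack([merged, aligned])
    return merged
```

A real pipeline would also reconcile coordinate frames (the hand detector typically runs on a crop around the predicted wrist).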
arXiv Detail & Related papers (2022-07-15T20:33:29Z)
- Capturing and Inferring Dense Full-Body Human-Scene Contact [40.29636308110822]
We train a network that predicts dense body-scene contacts from a single RGB image.
We use a transformer to learn such non-local relationships and propose a new Body-Scene contact TRansfOrmer (BSTRO)
To our knowledge, BSTRO is the first method to directly estimate 3D body-scene contact from a single image.
arXiv Detail & Related papers (2022-06-20T03:31:00Z)
- Human-Aware Object Placement for Visual Environment Reconstruction [63.14733166375534]
We show that human-scene interactions (HSIs) can be leveraged to improve the 3D reconstruction of a scene from a monocular RGB video.
Our key idea is that, as a person moves through a scene and interacts with it, we accumulate HSIs across multiple input images.
We show that our scene reconstruction can be used to refine the initial 3D human pose and shape estimation.
arXiv Detail & Related papers (2022-03-07T18:59:02Z)
- On Self-Contact and Human Pose [50.96752167102025]
We develop new datasets and methods that significantly improve human pose estimation with self-contact.
We show that the new self-contact training data significantly improves 3D human pose estimates on withheld test data and existing datasets like 3DPW.
arXiv Detail & Related papers (2021-04-07T15:10:38Z)
- PLACE: Proximity Learning of Articulation and Contact in 3D Environments [70.50782687884839]
We propose a novel interaction generation method, named PLACE, which explicitly models the proximity between the human body and the 3D scene around it.
Our perceptual study shows that PLACE significantly improves upon the state-of-the-art method, approaching the realism of real human-scene interaction.
arXiv Detail & Related papers (2020-08-12T21:00:10Z)
- Detailed 2D-3D Joint Representation for Human-Object Interaction [45.71407935014447]
We propose a detailed 2D-3D joint representation learning method for HOI learning.
First, we utilize the single-view human body capture method to obtain detailed 3D body, face and hand shapes.
Next, we estimate the 3D object location and size with reference to the 2D human-object spatial configuration and object category priors.
arXiv Detail & Related papers (2020-04-17T10:22:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.