Related papers: DECO: Dense Estimation of 3D Human-Scene Contact In The Wild

URL: http://arxiv.org/abs/2309.15273v1
Date: Tue, 26 Sep 2023 21:21:07 GMT
Title: DECO: Dense Estimation of 3D Human-Scene Contact In The Wild
Authors: Shashank Tripathi, Agniv Chatterjee, Jean-Claude Passy, Hongwei Yi, Dimitrios Tzionas, Michael J. Black
Abstract summary: We train a novel 3D contact detector that uses both body-part-driven and scene-context-driven attention to estimate contact on the SMPL body. We significantly outperform existing SOTA methods across all benchmarks. We also show qualitatively that DECO generalizes well to diverse and challenging real-world human interactions in natural images.
Score: 54.44345845842109
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Understanding how humans use physical contact to interact with the world is key to enabling human-centric artificial intelligence. While inferring 3D contact is crucial for modeling realistic and physically-plausible human-object interactions, existing methods either focus on 2D, consider body joints rather than the surface, use coarse 3D body regions, or do not generalize to in-the-wild images. In contrast, we focus on inferring dense, 3D contact between the full body surface and objects in arbitrary images. To achieve this, we first collect DAMON, a new dataset containing dense vertex-level contact annotations paired with RGB images containing complex human-object and human-scene contact. Second, we train DECO, a novel 3D contact detector that uses both body-part-driven and scene-context-driven attention to estimate vertex-level contact on the SMPL body. DECO builds on the insight that human observers recognize contact by reasoning about the contacting body parts, their proximity to scene objects, and the surrounding scene context. We perform extensive evaluations of our detector on DAMON as well as on the RICH and BEHAVE datasets. We significantly outperform existing SOTA methods across all benchmarks. We also show qualitatively that DECO generalizes well to diverse and challenging real-world human interactions in natural images. The code, data, and models are available at https://deco.is.tue.mpg.de.
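The abstract frames contact as a per-vertex quantity on the SMPL body. Here is a minimal sketch of how dense contact predictions can be compared against ground truth using vertex-level precision, recall, and F1. It assumes one binary label per vertex of the standard 6890-vertex SMPL mesh and a hypothetical 0.5 threshold on predicted probabilities. The function names are illustrative and are not taken from the DECO code release.

```python
import numpy as np

# The standard SMPL body mesh has 6890 vertices; dense contact stores one label per vertex.
NUM_SMPL_VERTICES = 6890


def to_contact_labels(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize per-vertex contact probabilities (the threshold is an assumption)."""
    return np.asarray(probs, dtype=np.float64) >= threshold


def contact_prf(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float, float]:
    """Vertex-level precision, recall and F1 between two binary contact vectors."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    tp = int(np.count_nonzero(pred & gt))   # predicted contact, truly in contact
    fp = int(np.count_nonzero(pred & ~gt))  # predicted contact, not in contact
    fn = int(np.count_nonzero(~pred & gt))  # missed contact
    # Empty-denominator conventions vary; this sketch returns 0.0 in that case.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: vertices 0-99 are in contact; the prediction covers 50-149.
gt = np.zeros(NUM_SMPL_VERTICES, dtype=bool)
gt[:100] = True
probs = np.zeros(NUM_SMPL_VERTICES)
probs[50:150] = 0.9
print(contact_prf(to_contact_labels(probs), gt))  # (0.5, 0.5, 0.5)
```

Because every vertex is scored independently, a prediction that is shifted to a neighbouring region of the body is penalized as heavily as one on the wrong limb. Distance-aware (e.g. geodesic) errors are often reported alongside these counts for that reason.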

Related papers

InteractVLM: 3D Interaction Reasoning from 2D Foundational Models [85.76211596755151]
We introduce InteractVLM, a novel method to estimate 3D contact points on human bodies and objects from single in-the-wild images. Existing methods rely on 3D contact annotations collected via expensive motion-capture systems or tedious manual labeling. We propose a new task called Semantic Human Contact estimation, where human contact predictions are conditioned explicitly on object semantics.
arXiv (2025-04-07T17:59:33Z)
Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer [58.98785899556135]
We present a novel joint 3D human-object reconstruction method (CONTHO) that effectively exploits contact information between humans and objects. There are two core designs in our system: 1) 3D-guided contact estimation and 2) contact-based 3D human and object refinement.
arXiv (2024-04-07T06:01:49Z)
Detecting Human-Object Contact in Images [75.35017308643471]
Humans constantly contact objects to move and perform tasks. There exists no robust method to detect contact between the body and the scene from an image. We build a new dataset of human-object contacts for images.
arXiv (2023-03-06T18:56:26Z)
Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI (full-body articulated human-object interaction) dataset consisting of 16.2 hours of versatile interactions. CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process. By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation.
arXiv (2022-12-20T19:50:54Z)
Capturing and Inferring Dense Full-Body Human-Scene Contact [40.29636308110822]
We train a network that predicts dense body-scene contacts from a single RGB image. We use a transformer to learn such non-local relationships and propose a new Body-Scene contact TRansfOrmer (BSTRO). To our knowledge, BSTRO is the first method to directly estimate 3D body-scene contact from a single image.
arXiv (2022-06-20T03:31:00Z)
Human-Aware Object Placement for Visual Environment Reconstruction [63.14733166375534]
We show that human-scene interactions (HSIs) can be leveraged to improve the 3D reconstruction of a scene from a monocular RGB video. Our key idea is that, as a person moves through a scene and interacts with it, we accumulate HSIs across multiple input images. We show that our scene reconstruction can be used to refine the initial 3D human pose and shape estimation.
arXiv (2022-03-07T18:59:02Z)
PLACE: Proximity Learning of Articulation and Contact in 3D Environments [70.50782687884839]
We propose a novel interaction generation method, named PLACE, which explicitly models the proximity between the human body and the 3D scene around it. Our perceptual study shows that PLACE significantly improves on the state-of-the-art method, approaching the realism of real human-scene interaction.
arXiv (2020-08-12T21:00:10Z)
Detailed 2D-3D Joint Representation for Human-Object Interaction [45.71407935014447]
We propose a detailed 2D-3D joint representation learning method for HOI learning. First, we utilize a single-view human body capture method to obtain detailed 3D body, face and hand shapes. Next, we estimate the 3D object location and size with reference to the 2D human-object spatial configuration and object category priors.
arXiv (2020-04-17T10:22:12Z)
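Several of the works listed above, PLACE in particular, reason explicitly about the proximity between the body surface and the surrounding 3D scene. When both the body mesh and the scene geometry are available in 3D, a distance threshold is one simple, purely geometric way to turn proximity into per-vertex contact labels. The sketch below assumes both are given as point arrays in the same coordinate frame. The function names and the threshold value are hypothetical and are not taken from any of the cited papers.

```python
import numpy as np


def nearest_distances(body_vertices, scene_points, chunk: int = 256) -> np.ndarray:
    """Euclidean distance from each body vertex to its closest scene point.

    Brute force, processed in chunks of body vertices to bound memory use.
    """
    body = np.asarray(body_vertices, dtype=np.float64)   # (V, 3)
    scene = np.asarray(scene_points, dtype=np.float64)   # (S, 3)
    out = np.empty(len(body))
    for start in range(0, len(body), chunk):
        block = body[start:start + chunk]                 # (b, 3)
        diff = block[:, None, :] - scene[None, :, :]      # (b, S, 3) via broadcasting
        sq_dist = (diff ** 2).sum(axis=-1)                # (b, S) squared distances
        out[start:start + chunk] = np.sqrt(sq_dist.min(axis=1))
    return out


def proximity_contact(body_vertices, scene_points, threshold: float = 0.05) -> np.ndarray:
    """Mark a vertex as in contact if it lies within `threshold` of the scene.

    The 0.05 default (5 cm if coordinates are in metres) is a hypothetical choice.
    """
    return nearest_distances(body_vertices, scene_points) <= threshold


# Hypothetical example: a 5x5 grid of floor points at z = 0 and two body vertices.
floor = np.array([[x, y, 0.0] for x in np.linspace(-1, 1, 5) for y in np.linspace(-1, 1, 5)])
body = np.array([[0.0, 0.0, 0.01], [0.0, 0.0, 0.5]])
print(proximity_contact(body, floor))  # [ True False]
```

Brute-force search costs on the order of V × S distance evaluations. For dense scene scans, a spatial index such as `scipy.spatial.cKDTree`, whose `query` method returns nearest-neighbour distances, is the more practical choice.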

This list is automatically generated from the titles and abstracts of the papers on this site.

This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.