PICO: Reconstructing 3D People In Contact with Objects
- URL: http://arxiv.org/abs/2504.17695v1
- Date: Thu, 24 Apr 2025 16:03:11 GMT
- Title: PICO: Reconstructing 3D People In Contact with Objects
- Authors: Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi, Arjun Lakshmipathy, Agniv Chatterjee, Michael J. Black, Dimitrios Tzionas,
- Abstract summary: We tackle 3D Human-Object Interaction (HOI) from single color images.<n>We collect PICO-db, a new dataset of natural images uniquely paired with dense 3D contact on both body and object meshes.<n>We use images from the recent DAMON dataset that are paired with contacts, but these contacts are only annotated on a canonical 3D body.<n>We exploit our new dataset of contact correspondences in a novel render-and-compare fitting method, called PICO-fit, to recover 3D body and object meshes in interaction.
- Score: 47.26810798870849
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recovering 3D Human-Object Interaction (HOI) from single color images is challenging due to depth ambiguities, occlusions, and the huge variation in object shape and appearance. Thus, past work requires controlled settings such as known object shapes and contacts, and tackles only limited object classes. Instead, we need methods that generalize to natural images and novel object classes. We tackle this in two main ways: (1) We collect PICO-db, a new dataset of natural images uniquely paired with dense 3D contact on both body and object meshes. To this end, we use images from the recent DAMON dataset that are paired with contacts, but these contacts are only annotated on a canonical 3D body. In contrast, we seek contact labels on both the body and the object. To infer these given an image, we retrieve an appropriate 3D object mesh from a database by leveraging vision foundation models. Then, we project DAMON's body contact patches onto the object via a novel method needing only 2 clicks per patch. This minimal human input establishes rich contact correspondences between bodies and objects. (2) We exploit our new dataset of contact correspondences in a novel render-and-compare fitting method, called PICO-fit, to recover 3D body and object meshes in interaction. PICO-fit infers contact for the SMPL-X body, retrieves a likely 3D object mesh and contact from PICO-db for that object, and uses the contact to iteratively fit the 3D body and object meshes to image evidence via optimization. Uniquely, PICO-fit works well for many object categories that no existing method can tackle. This is crucial to enable HOI understanding to scale in the wild. Our data and code are available at https://pico.is.tue.mpg.de.
Related papers
- InteractVLM: 3D Interaction Reasoning from 2D Foundational Models [85.76211596755151]
We introduce InteractVLM, a novel method to estimate 3D contact points on human bodies and objects from single in-the-wild images.<n>Existing methods rely on 3D contact annotations collected via expensive motion-capture systems or tedious manual labeling.<n>We propose a new task called Semantic Human Contact estimation, where human contact predictions are conditioned explicitly on object semantics.
arXiv Detail & Related papers (2025-04-07T17:59:33Z) - DECO: Dense Estimation of 3D Human-Scene Contact In The Wild [54.44345845842109]
We train a novel 3D contact detector that uses both body-part-driven and scene-context-driven attention to estimate contact on the SMPL body.
We significantly outperform existing SOTA methods across all benchmarks.
We also show qualitatively that DECO generalizes well to diverse and challenging real-world human interactions in natural images.
arXiv Detail & Related papers (2023-09-26T21:21:07Z) - Learning Explicit Contact for Implicit Reconstruction of Hand-held
Objects from Monocular Images [59.49985837246644]
We show how to model contacts in an explicit way to benefit the implicit reconstruction of hand-held objects.
In the first part, we propose a new subtask of directly estimating 3D hand-object contacts from a single image.
In the second part, we introduce a novel method to diffuse estimated contact states from the hand mesh surface to nearby 3D space.
arXiv Detail & Related papers (2023-05-31T17:59:26Z) - Detecting Human-Object Contact in Images [75.35017308643471]
Humans constantly contact objects to move and perform tasks.
There exists no robust method to detect contact between the body and the scene from an image.
We build a new dataset of human-object contacts for images.
arXiv Detail & Related papers (2023-03-06T18:56:26Z) - Capturing and Inferring Dense Full-Body Human-Scene Contact [40.29636308110822]
We train a network that predicts dense body-scene contacts from a single RGB image.
We use a transformer to learn such non-local relationships and propose a new Body-Scene contact TRansfOrmer (BSTRO)
To our knowledge, BSTRO is the first method to directly estimate 3D body-scene contact from a single image.
arXiv Detail & Related papers (2022-06-20T03:31:00Z) - GRAB: A Dataset of Whole-Body Human Grasping of Objects [53.00728704389501]
Training computers to understand human grasping requires a rich dataset containing complex 3D object shapes, detailed contact information, hand pose and shape, and the 3D body motion over time.
We collect a new dataset, called GRAB, of whole-body grasps, containing full 3D shape and pose sequences of 10 subjects interacting with 51 everyday objects of varying shape and size.
This is a unique dataset, that goes well beyond existing ones for modeling and understanding how humans grasp and manipulate objects, how their full body is involved, and how interaction varies with the task.
arXiv Detail & Related papers (2020-08-25T17:57:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.