Object pop-up: Can we infer 3D objects and their poses from human
interactions alone?
- URL: http://arxiv.org/abs/2306.00777v2
- Date: Fri, 27 Oct 2023 09:59:45 GMT
- Title: Object pop-up: Can we infer 3D objects and their poses from human
interactions alone?
- Authors: Ilya A. Petrov, Riccardo Marin, Julian Chibane, Gerard Pons-Moll
- Abstract summary: We show that a generic 3D human point cloud is enough to pop up an unobserved object, even when the user is just imitating a functionality.
We validate our method qualitatively and quantitatively, with synthetic data and sequences acquired for the task, showing applicability for XR/VR.
- Score: 36.68984504569907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The intimate entanglement between object affordances and human poses is of
large interest, among others, for behavioural sciences, cognitive psychology,
and Computer Vision communities. In recent years, the latter has developed
several object-centric approaches: starting from items, learning pipelines
synthesize human poses and dynamics in a realistic way, satisfying both
geometrical and functional expectations. However, the inverse perspective is
significantly less explored: Can we infer 3D objects and their poses from human
interactions alone? Our investigation follows this direction, showing that a
generic 3D human point cloud is enough to pop up an unobserved object, even
when the user is just imitating a functionality (e.g., looking through
binoculars) without involving a tangible counterpart. We validate our method
qualitatively and quantitatively, with synthetic data and sequences acquired
for the task, showing applicability for XR/VR. The code is available at
https://github.com/ptrvilya/object-popup.
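A minimal sketch of the inference interface this abstract describes, assuming a point-cloud model with separate class and pose heads; the function name, output heads, and tensor shapes below are illustrative placeholders, not the actual API of the linked repository.

```python
# Sketch: infer the interacting object and its rigid pose from a 3D human
# point cloud alone, in the spirit of the "object pop-up" task.
# NOTE: the model outputs and shapes are assumptions for illustration only,
# not the API of https://github.com/ptrvilya/object-popup.
import numpy as np
import torch


def predict_object_from_human(model: torch.nn.Module, human_points: np.ndarray):
    """Predict (object class, rotation, translation) from an (N, 3) human point cloud."""
    pts = torch.from_numpy(human_points).float().unsqueeze(0)  # add batch dim -> (1, N, 3)
    with torch.no_grad():
        # Assumed heads: class logits over an object vocabulary, plus a rigid pose
        # (3x3 rotation, 3D translation) placing a canonical template in the scene.
        class_logits, rotation, translation = model(pts)
    obj_class = int(class_logits.argmax(dim=-1))
    return obj_class, rotation.squeeze(0).numpy(), translation.squeeze(0).numpy()
```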
Related papers
- Neural feels with neural fields: Visuo-tactile perception for in-hand
manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
- Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors [42.17542596399014]
We present a method for inferring diverse 3D models of human-object interactions from images.
Our method extracts high-level commonsense knowledge from large language models.
We quantitatively evaluate the inferred 3D models on a large human-object interaction dataset.
arXiv Detail & Related papers (2022-09-06T13:32:55Z)
- BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z)
- CHORE: Contact, Human and Object REconstruction from a single RGB image [40.817960406002506]
CHORE is a novel method that learns to jointly reconstruct the human and the object from a single RGB image.
We compute a neural reconstruction of human and object represented implicitly with two unsigned distance fields.
Experiments show that our joint reconstruction learned with the proposed strategy significantly outperforms the SOTA.
arXiv Detail & Related papers (2022-04-05T18:38:06Z)
- Pose2Room: Understanding 3D Scenes from Human Activities [35.702234343672565]
With wearable IMU sensors, one can estimate human poses without requiring visual input.
We show that P2R-Net can effectively learn multi-modal distributions of likely objects for human motions.
arXiv Detail & Related papers (2021-12-01T20:54:36Z)
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-08-19T00:49:01Z)
- Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild [96.08358373137438]
We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene.
Our method runs on datasets without any scene- or object-level 3D supervision.
arXiv Detail & Related papers (2020-07-30T17:59:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.