Articulated 3D Human-Object Interactions from RGB Videos: An Empirical
Analysis of Approaches and Challenges
- URL: http://arxiv.org/abs/2209.05612v1
- Date: Mon, 12 Sep 2022 21:03:25 GMT
- Title: Articulated 3D Human-Object Interactions from RGB Videos: An Empirical
Analysis of Approaches and Challenges
- Authors: Sanjay Haresh, Xiaohao Sun, Hanxiao Jiang, Angel X. Chang, Manolis
Savva
- Abstract summary: We canonicalize the task of articulated 3D human-object interaction reconstruction from RGB video.
We use five families of methods for this task: 3D plane estimation, 3D cuboid estimation, CAD model fitting, implicit field fitting, and free-form mesh fitting.
Our experiments show that all methods struggle to obtain high-accuracy results even when provided with ground truth information.
- Score: 19.21834600205309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-object interactions with articulated objects are common in everyday
life. Despite much progress in single-view 3D reconstruction, it is still
challenging to infer an articulated 3D object model from an RGB video showing a
person manipulating the object. We canonicalize the task of articulated 3D
human-object interaction reconstruction from RGB video, and carry out a
systematic benchmark of five families of methods for this task: 3D plane
estimation, 3D cuboid estimation, CAD model fitting, implicit field fitting,
and free-form mesh fitting. Our experiments show that all methods struggle to
obtain high-accuracy results even when provided with ground truth information
about the observed objects. We identify key factors that make the task difficult
and suggest directions for future work on this challenging 3D computer vision
task. A short video summary is available at https://www.youtube.com/watch?v=5tAlKBojZwc
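To make the reconstruction target concrete, below is a minimal sketch of the simplest articulated object model the task involves: a rigid part rotating about a revolute joint, e.g. a cabinet door on a hinge. The single-joint setup, function names, and dimensions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def revolute_transform(axis, pivot, angle):
    """4x4 transform rotating by `angle` (radians) about a joint
    axis through `pivot`, built with Rodrigues' rotation formula."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = pivot - R @ pivot  # rotate about the pivot, not the origin
    return T

# Hypothetical example: a 0.5 m x 1.0 m door opened 45 degrees
# about a vertical hinge along its left edge.
door = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0],
                 [0.5, 1.0, 0.0], [0.0, 1.0, 0.0]])
T = revolute_transform(axis=[0.0, 1.0, 0.0],
                       pivot=np.array([0.0, 0.0, 0.0]),
                       angle=np.pi / 4)
door_h = np.c_[door, np.ones(len(door))]  # homogeneous coordinates
opened = (door_h @ T.T)[:, :3]            # vertices of the opened door
```

Under this view, reconstruction amounts to estimating the part geometry, the joint axis and pivot, and the joint state per frame; the five method families above differ mainly in how they represent the part geometry.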
Related papers
- Multimodal 3D Reasoning Segmentation with Complex Scenes [92.92045550692765]
We bridge this research gap by proposing a 3D reasoning segmentation task for multiple objects in scenes.
The task produces 3D segmentation masks together with detailed textual explanations enriched with 3D spatial relations among objects.
In addition, we design MORE3D, a simple yet effective method that enables multi-object 3D reasoning segmentation with user questions and textual outputs.
arXiv Detail & Related papers (2024-11-21T08:22:45Z)
- 3D Instance Segmentation Using Deep Learning on RGB-D Indoor Data [0.0]
A 2D region-based convolutional neural network (Mask R-CNN) with a point-based rendering module is adapted to integrate depth information and to recognize and segment 3D instances of objects.
To generate 3D point cloud coordinates, the segmented 2D pixels of recognized object regions in the RGB image are mapped to the corresponding (u, v) points of the depth image.
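The merge step described here is, in effect, standard pinhole back-projection: each segmented pixel (u, v) is lifted to a 3D camera-frame point using its depth value and the camera intrinsics. A minimal sketch, with hypothetical intrinsics (fx, fy, cx, cy) and stand-in data not taken from the paper:

```python
import numpy as np

def backproject(us, vs, depth, fx, fy, cx, cy):
    """Lift segmented pixels (u, v) to 3D camera-frame points
    using a depth map aligned with the RGB image."""
    z = depth[vs, us]           # depth in meters at each pixel
    x = (us - cx) * z / fx      # pinhole model: u = fx * x / z + cx
    y = (vs - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# Example: lift all pixels inside one instance mask (stand-in data).
H, W = 480, 640
depth = np.random.uniform(0.5, 3.0, size=(H, W))    # placeholder depth map
mask = np.zeros((H, W), dtype=bool)
mask[100:200, 200:300] = True                       # placeholder Mask R-CNN mask
vs, us = np.nonzero(mask)
points = backproject(us, vs, depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```

Stacking the lifted points for all pixels in an instance mask yields that object's partial point cloud in the camera frame.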
arXiv Detail & Related papers (2024-06-19T08:00:35Z)
- Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models [20.277479473218513]
We introduce a new task: Zero-Shot 3D Reasoning for part search and localization in objects.
We design a simple baseline method, Reasoning3D, with the capability to understand and execute complex commands.
We show that Reasoning3D can effectively localize and highlight parts of 3D objects based on implicit textual queries.
arXiv Detail & Related papers (2024-05-29T17:56:07Z)
- Reconstructing Hand-Held Objects in 3D from Images and Videos [53.277402172488735]
Given a monocular RGB video, we aim to reconstruct the hand-held object's geometry in 3D over time.
We present MCC-Hand-Object (MCC-HO), which jointly reconstructs hand and object geometry given a single RGB image.
We then prompt a text-to-3D generative model using GPT-4(V) to retrieve a 3D object model that matches the object in the image.
arXiv Detail & Related papers (2024-04-09T17:55:41Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z)
- Monocular 3D Object Detection using Multi-Stage Approaches with Attention and Slicing aided hyper inference [0.0]
3D object detection is vital as it enables capturing objects' size, orientation, and position in the world.
Such 3D detection can be used in real-world applications such as Augmented Reality (AR), self-driving cars, and robotics.
arXiv Detail & Related papers (2022-12-22T15:36:07Z)
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-08-19T00:49:01Z)
- Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds great potential for robotics and learning from human demonstrations.
We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z)