Articulated 3D Human-Object Interactions from RGB Videos: An Empirical
Analysis of Approaches and Challenges
- URL: http://arxiv.org/abs/2209.05612v1
- Date: Mon, 12 Sep 2022 21:03:25 GMT
- Authors: Sanjay Haresh, Xiaohao Sun, Hanxiao Jiang, Angel X. Chang, Manolis
Savva
- Abstract summary: We canonicalize the task of articulated 3D human-object interaction reconstruction from RGB video.
We use five families of methods for this task: 3D plane estimation, 3D cuboid estimation, CAD model fitting, implicit field fitting, and free-form mesh fitting.
Our experiments show that all methods struggle to obtain high accuracy results even when provided ground truth information.
- Score: 19.21834600205309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-object interactions with articulated objects are common in everyday
life. Despite much progress in single-view 3D reconstruction, it is still
challenging to infer an articulated 3D object model from an RGB video showing a
person manipulating the object. We canonicalize the task of articulated 3D
human-object interaction reconstruction from RGB video, and carry out a
systematic benchmark of five families of methods for this task: 3D plane
estimation, 3D cuboid estimation, CAD model fitting, implicit field fitting,
and free-form mesh fitting. Our experiments show that all methods struggle to
obtain high accuracy results even when provided ground truth information about
the observed objects. We identify key factors which make the task challenging
and suggest directions for future work on this challenging 3D computer vision
task. Short video summary at https://www.youtube.com/watch?v=5tAlKBojZwc
Related papers
- 3D Instance Segmentation Using Deep Learning on RGB-D Indoor Data [0.0]
A 2D region-based convolutional neural network (Mask R-CNN) with a point-based rendering module is adapted to integrate depth information and to recognize and segment 3D instances of objects.
To generate 3D point cloud coordinates, the segmented 2D pixels of recognized object regions in the RGB image are mapped to the corresponding (u, v) points of the depth image.
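The (u, v)-to-point-cloud step described above is standard pinhole back-projection. A minimal sketch follows; the intrinsics (fx, fy, cx, cy), depth map, and pixel list are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: back-project segmented depth pixels (u, v) to 3D
# camera-frame points, as in the Mask R-CNN + depth pipeline above.

def backproject(pixels, depth, fx, fy, cx, cy):
    """Map (u, v) pixels of a segmented region to 3D points (X, Y, Z)."""
    points = []
    for (u, v) in pixels:
        z = depth[v][u]          # depth value at that pixel (metres)
        if z <= 0:               # skip invalid / missing depth
            continue
        x = (u - cx) * z / fx    # pinhole model: X = (u - cx) * Z / fx
        y = (v - cy) * z / fy    # pinhole model: Y = (v - cy) * Z / fy
        points.append((x, y, z))
    return points

# Tiny synthetic 2x2 depth map (metres) and a made-up segmented region
depth = [[1.0, 1.0],
         [0.0, 2.0]]
pts = backproject([(0, 0), (1, 0), (0, 1), (1, 1)], depth,
                  fx=500.0, fy=500.0, cx=0.5, cy=0.5)
print(pts)
```

Pixels with zero depth are dropped, so the segmented region yields a point only where the depth sensor returned a valid measurement.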
arXiv Detail & Related papers (2024-06-19T08:00:35Z)
- Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models [20.277479473218513]
We introduce a new task: Zero-Shot 3D Reasoning for parts searching and localization for objects.
We design a simple baseline method, Reasoning3D, with the capability to understand and execute complex commands.
We show that Reasoning3D can effectively localize and highlight parts of 3D objects based on implicit textual queries.
arXiv Detail & Related papers (2024-05-29T17:56:07Z)
- Reconstructing Hand-Held Objects in 3D [53.277402172488735]
We present a paradigm for handheld object reconstruction that builds on recent breakthroughs in large language/vision models and 3D object datasets.
We use GPT-4(V) to retrieve a 3D object model that matches the object in the image and rigidly align the model to the network-inferred geometry.
Experiments demonstrate that MCC-HO achieves state-of-the-art performance on lab and Internet datasets.
arXiv Detail & Related papers (2024-04-09T17:55:41Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video [70.11702620562889]
HOLD is the first category-agnostic method that jointly reconstructs an articulated hand and an object from a monocular interaction video.
We develop a compositional articulated implicit model that can disentangle the 3D hand and object from 2D images.
Our method does not rely on 3D hand-object annotations while outperforming fully-supervised baselines in both in-the-lab and challenging in-the-wild settings.
arXiv Detail & Related papers (2023-11-30T10:50:35Z)
- Learning Hand-Held Object Reconstruction from In-The-Wild Videos [19.16274394098004]
We learn data-driven 3D shape priors using synthetic objects from the ObMan dataset.
We use these indirect 3D cues to train occupancy networks that predict the 3D shape of objects from a single RGB image.
arXiv Detail & Related papers (2023-05-04T17:56:48Z)
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z)
- Monocular 3D Object Detection using Multi-Stage Approaches with Attention and Slicing aided hyper inference [0.0]
3D object detection is vital because it captures an object's size, orientation, and position in the world.
This enables real-world applications such as Augmented Reality (AR), self-driving cars, and robotics.
arXiv Detail & Related papers (2022-12-22T15:36:07Z)
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-08-19T00:49:01Z)
- Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds great potential for robotics and learning from human demonstrations.
We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.