In-Hand 3D Object Reconstruction from a Monocular RGB Video
- URL: http://arxiv.org/abs/2312.16425v1
- Date: Wed, 27 Dec 2023 06:19:25 GMT
- Title: In-Hand 3D Object Reconstruction from a Monocular RGB Video
- Authors: Shijian Jiang, Qi Ye, Rengan Xie, Yuchi Huo, Xiang Li, Yang Zhou, Jiming Chen
- Abstract summary: Our work aims to reconstruct a 3D object that is held and rotated by a hand in front of a static RGB camera.
Previous methods that use implicit neural representations to recover the geometry of a generic hand-held object from multi-view images achieved compelling results in the visible part of the object.
- Score: 17.31419675163019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our work aims to reconstruct a 3D object that is held and rotated by a hand
in front of a static RGB camera. Previous methods that use implicit neural
representations to recover the geometry of a generic hand-held object from
multi-view images achieved compelling results in the visible part of the
object. However, these methods falter in accurately capturing the shape within
the hand-object contact region due to occlusion. In this paper, we propose a
novel method that deals with surface reconstruction under occlusion by
incorporating priors of 2D occlusion elucidation and physical contact
constraints. For the former, we introduce an object amodal completion network
to infer the 2D complete mask of objects under occlusion. To ensure the
accuracy and view consistency of the predicted 2D amodal masks, we devise a
joint optimization method for both amodal mask refinement and 3D
reconstruction. For the latter, we impose penetration and attraction
constraints on the local geometry in contact regions. We evaluate our approach
on HO3D and HOD datasets and demonstrate that it outperforms the
state-of-the-art methods in terms of reconstruction surface quality, with an
improvement of $52\%$ on HO3D and $20\%$ on HOD. Project webpage:
https://east-j.github.io/ihor.
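The abstract above describes penetration and attraction constraints on the local geometry in contact regions. The snippet below is a minimal sketch of how such constraints are commonly expressed for an SDF-based object representation; it is not the authors' implementation, and the function name `contact_losses`, the threshold `eps`, and the assumption that contact vertices are given as a boolean mask are all illustrative.

```python
# Hedged sketch of penetration / attraction contact losses for an SDF-based
# object representation. Illustrative only; names and thresholds are assumptions,
# not the paper's implementation.
import torch

def contact_losses(object_sdf, hand_verts, contact_mask, eps=0.005):
    """
    object_sdf: callable mapping (N, 3) points -> (N,) signed distances
                (negative inside the object surface).
    hand_verts: (N, 3) hand mesh vertices expressed in the object frame.
    contact_mask: (N,) boolean mask of vertices assumed to be in contact.
    """
    d = object_sdf(hand_verts)                      # signed distance per vertex

    # Penetration: hand vertices should not lie inside the object (d < 0).
    loss_pen = torch.relu(-d).mean()

    # Attraction: vertices labeled as contacts should stay within a small
    # band (eps) around the object surface, i.e. |d| should be small.
    if contact_mask.any():
        loss_att = torch.clamp(d[contact_mask].abs() - eps, min=0.0).mean()
    else:
        loss_att = hand_verts.new_zeros(())

    return loss_pen, loss_att
```

In practice the two terms would be weighted and added to the reconstruction objective alongside rendering and mask losses; the weights and the way contact vertices are identified are paper-specific details not reproduced here.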
Related papers
- InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes [86.26588382747184]
We introduce InseRF, a novel method for generative object insertion in the NeRF reconstructions of 3D scenes.
Based on a user-provided textual description and a 2D bounding box in a reference viewpoint, InseRF generates new objects in 3D scenes.
arXiv Detail & Related papers (2024-01-10T18:59:53Z)
- O$^2$-Recon: Completing 3D Reconstruction of Occluded Objects in the Scene with a Pre-trained 2D Diffusion Model [28.372289119872764]
Occlusion is a common issue in 3D reconstruction from RGB-D videos, often blocking the complete reconstruction of objects.
We propose a novel framework, empowered by a 2D diffusion-based in-painting model, to reconstruct complete surfaces for the hidden parts of objects.
arXiv Detail & Related papers (2023-08-18T14:38:31Z)
- Learning Explicit Contact for Implicit Reconstruction of Hand-held Objects from Monocular Images [59.49985837246644]
We show how to model contacts in an explicit way to benefit the implicit reconstruction of hand-held objects.
In the first part, we propose a new subtask of directly estimating 3D hand-object contacts from a single image.
In the second part, we introduce a novel method to diffuse estimated contact states from the hand mesh surface to nearby 3D space.
arXiv Detail & Related papers (2023-05-31T17:59:26Z)
- Sampling is Matter: Point-guided 3D Human Mesh Reconstruction [0.0]
This paper presents a simple yet powerful method for 3D human mesh reconstruction from a single RGB image.
Experimental results on benchmark datasets show that the proposed method efficiently improves the performance of 3D human mesh reconstruction.
arXiv Detail & Related papers (2023-04-19T08:45:26Z)
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z)
- Pruning-based Topology Refinement of 3D Mesh using a 2D Alpha Mask [6.103988053817792]
We present a method to refine the topology of any 3D mesh through a face-pruning strategy.
Our solution leverages a differentiable renderer that renders each face as a 2D soft map; a simplified sketch of this pruning idea appears after the related-papers list.
Because our module is agnostic to the network that produces the 3D mesh, it can be easily plugged into any self-supervised image-based 3D reconstruction pipeline.
arXiv Detail & Related papers (2022-10-17T14:51:38Z)
- Towards High-Fidelity Single-view Holistic Reconstruction of Indoor Scenes [50.317223783035075]
We present a new framework to reconstruct holistic 3D indoor scenes from single-view images.
We propose an instance-aligned implicit function (InstPIFu) for detailed object reconstruction.
Our code and model will be made publicly available.
arXiv Detail & Related papers (2022-07-18T14:54:57Z)
- DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
arXiv Detail & Related papers (2021-08-31T18:33:55Z)
- HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation from a Single Depth Map [72.93634777578336]
We propose a novel architecture with 3D convolutions trained in a weakly-supervised manner.
The proposed approach improves over the state of the art by 47.8% on the SynHand5M dataset.
Our method produces visually more reasonable and realistic hand shapes on NYU and BigHand2.2M datasets.
arXiv Detail & Related papers (2020-04-03T14:27:16Z)
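As referenced in the pruning-based topology refinement entry above, the following is a deliberately simplified, non-differentiable approximation of alpha-mask-driven face pruning: rather than soft-rasterizing each face, it samples the 2D alpha mask at the projected vertices of each triangle and discards faces with little support. The camera model, the 0.5 threshold, and the array layouts are assumptions, not the paper's method.

```python
# Simplified sketch of alpha-mask-driven face pruning. This replaces the paper's
# differentiable soft rasterizer with a hard test: project each face's vertices
# with a pinhole camera, sample the 2D alpha mask there, and drop faces with
# little support. Intrinsics, threshold, and layouts are illustrative assumptions.
import numpy as np

def prune_faces(verts, faces, alpha_mask, K):
    """
    verts: (V, 3) mesh vertices in camera coordinates (z > 0 in front of camera).
    faces: (F, 3) integer vertex indices per triangle.
    alpha_mask: (H, W) float array in [0, 1], 1 where the object is visible.
    K: (3, 3) pinhole intrinsics.
    Returns the faces whose projected vertices mostly land inside the alpha mask.
    """
    H, W = alpha_mask.shape
    proj = (K @ verts.T).T                        # perspective projection
    uv = proj[:, :2] / proj[:, 2:3]               # (V, 2) pixel coordinates
    u = np.clip(uv[:, 0].round().astype(int), 0, W - 1)
    v = np.clip(uv[:, 1].round().astype(int), 0, H - 1)
    support = alpha_mask[v, u]                    # per-vertex mask value

    # Keep a face if the mean support of its three vertices exceeds a threshold.
    face_support = support[faces].mean(axis=1)    # (F,)
    return faces[face_support > 0.5]
```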
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.