Sparse multi-view hand-object reconstruction for unseen environments
- URL: http://arxiv.org/abs/2405.01353v1
- Date: Thu, 2 May 2024 15:01:25 GMT
- Title: Sparse multi-view hand-object reconstruction for unseen environments
- Authors: Yik Lung Pang, Changjae Oh, Andrea Cavallaro
- Abstract summary: We train our model on a synthetic hand-object dataset and evaluate directly on a real world recorded hand-object dataset with unseen objects.
We show that while reconstruction of unseen hands and objects from RGB is challenging, additional views can help improve the reconstruction quality.
- Score: 31.604141859402187
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent works in hand-object reconstruction mainly focus on the single-view and dense multi-view settings. On the one hand, single-view methods can leverage learned shape priors to generalise to unseen objects but are prone to inaccuracies due to occlusions. On the other hand, dense multi-view methods are very accurate but cannot easily adapt to unseen objects without further data collection. In contrast, sparse multi-view methods can take advantage of the additional views to tackle occlusion, while keeping the computational cost low compared to dense multi-view methods. In this paper, we consider the problem of hand-object reconstruction with unseen objects in the sparse multi-view setting. Given multiple RGB images of the hand and object captured at the same time, our model SVHO combines the predictions from each view into a unified reconstruction without optimisation across views. We train our model on a synthetic hand-object dataset and evaluate directly on a real world recorded hand-object dataset with unseen objects. We show that while reconstruction of unseen hands and objects from RGB is challenging, additional views can help improve the reconstruction quality.
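The abstract says SVHO merges per-view predictions into one reconstruction without optimisation across views, but it does not say how. The sketch below is a minimal, hypothetical illustration of optimisation-free fusion. It is not SVHO's actual method. It assumes that each view predicts the same set of 3D hand keypoints in its own camera frame and that the camera-to-world extrinsics (R, t) are known. Under those assumptions, the views are fused by a closed-form weighted average in the world frame. The names `cam_to_world` and `fuse_views` and the optional per-view weights are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def cam_to_world(p: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Map a camera-frame point into the world frame: X_w = R @ X_c + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def fuse_views(
    preds: List[List[Vec3]],
    extrinsics: List[Tuple[Mat3, Vec3]],
    weights: Optional[List[float]] = None,
) -> List[Vec3]:
    """Hypothetical fusion: weighted mean of per-view keypoints in the world frame.

    preds[v][k] is keypoint k predicted by view v in that camera's frame.
    extrinsics[v] = (R, t) maps view v's camera frame to the shared world frame.
    No cross-view optimisation is performed; the result is a closed-form average.
    """
    if len(preds) != len(extrinsics):
        raise ValueError("need one (R, t) pair per view")
    if weights is None:
        weights = [1.0] * len(preds)  # equal trust in every view
    if len(weights) != len(preds):
        raise ValueError("need one weight per view")
    total = sum(weights)
    fused: List[Vec3] = []
    for k in range(len(preds[0])):
        acc = [0.0, 0.0, 0.0]
        for view_pred, (R, t), w in zip(preds, extrinsics, weights):
            pw = cam_to_world(view_pred[k], R, t)  # bring keypoint k into world frame
            for i in range(3):
                acc[i] += w * pw[i]
        fused.append(tuple(a / total for a in acc))
    return fused


# Usage: two views of one keypoint whose true world position is about (0, 0, 1).
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Rz90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]  # 90-degree rotation about z
views = [[(0.0, 0.0, 1.0)], [(0.0, 1.0, 1.5)]]  # view 2 is noisier in depth
cams = [(I3, (0.0, 0.0, 0.0)), (Rz90, (1.0, 0.0, 0.0))]
print(fuse_views(views, cams))              # [(0.0, 0.0, 1.25)]
print(fuse_views(views, cams, [3.0, 1.0]))  # [(0.0, 0.0, 1.125)]
```

A closed-form average like this needs no iterative solver, so each extra view adds only one rigid transform per point. In practice, per-view confidence weights would likely matter most for views where the hand or object is occluded.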
Related papers
- Real2Code: Reconstruct Articulated Objects via Code Generation [22.833809817357395]
Real2Code is a novel approach to reconstructing articulated objects via code generation.
We first reconstruct its part geometry using an image segmentation model and a shape completion model.
We represent the object parts with oriented bounding boxes, which are input to a fine-tuned large language model to predict joint articulation as code.
(2024-06-12T17:57:06Z)
- HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video [70.11702620562889]
HOLD is the first category-agnostic method that jointly reconstructs an articulated hand and an object from a monocular interaction video.
We develop a compositional articulated implicit model that can disentangle the 3D hand and object from 2D images.
Our method does not rely on 3D hand-object annotations, yet it outperforms fully supervised baselines in both in-the-lab and challenging in-the-wild settings.
(2023-11-30T10:50:35Z)
- NeRO: Neural Geometry and BRDF Reconstruction of Reflective Objects from Multiview Images [44.1333444097976]
We present a neural rendering-based method called NeRO for reconstructing the geometry and the BRDF of reflective objects from multiview images captured in an unknown environment.
(2023-05-27T07:40:07Z)
- Partial-View Object View Synthesis via Filtered Inversion [77.282967562509]
FINV learns shape priors by training a 3D generative model.
We show that FINV successfully synthesizes novel views of real-world objects.
(2023-04-03T00:59:31Z)
- Reconstructing Hand-Held Objects from Monocular Video [95.06750686508315]
This paper presents an approach that reconstructs a hand-held object from a monocular video.
In contrast to many recent methods that directly predict object geometry with a trained network, the proposed approach does not require any learned prior about the object.
(2022-11-30T09:14:58Z)
- TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction [57.1209039399599]
We propose a map representation that allows maintaining a single volume for the entire scene and all the objects therein.
In a multiple dynamic object tracking and reconstruction scenario, our representation allows maintaining accurate reconstruction of surfaces even while they become temporarily occluded by other objects moving in their proximity.
We evaluate the proposed TSDF++ formulation on a public synthetic dataset and demonstrate its ability to preserve reconstructions of occluded surfaces when compared to the standard TSDF map representation.
(2021-05-16T16:15:05Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
(2021-03-30T17:57:01Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which improve the model's layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
(2020-03-16T21:40:09Z)