TexHOI: Reconstructing Textures of 3D Unknown Objects in Monocular Hand-Object Interaction Scenes
- URL: http://arxiv.org/abs/2501.03525v2
- Date: Tue, 04 Feb 2025 01:47:23 GMT
- Title: TexHOI: Reconstructing Textures of 3D Unknown Objects in Monocular Hand-Object Interaction Scenes
- Authors: Alakh Aggarwal, Ningna Wang, Xiaohu Guo
- Abstract summary: We propose a novel approach that predicts the hand's impact on environmental visibility and indirect illumination on the object's surface albedo.
Our approach surpasses state-of-the-art methods in texture reconstruction and, to the best of our knowledge, is the first to account for hand-object interactions in object texture reconstruction.
- Score: 6.753687000933386
- Abstract: Reconstructing 3D models of dynamic, real-world objects with high-fidelity textures from monocular frame sequences has been a challenging problem in recent years. This difficulty stems from factors such as shadows, indirect illumination, and inaccurate object-pose estimations due to occluding hand-object interactions. To address these challenges, we propose a novel approach that predicts the hand's impact on environmental visibility and indirect illumination on the object's surface albedo. Our method first learns the geometry and low-fidelity texture of the object, hand, and background through composite rendering of radiance fields. Simultaneously, we optimize the hand and object poses to achieve accurate object-pose estimations. We then refine physics-based rendering parameters - including roughness, specularity, albedo, hand visibility, skin color reflections, and environmental illumination - to produce precise albedo, and accurate hand illumination and shadow regions. Our approach surpasses state-of-the-art methods in texture reconstruction and, to the best of our knowledge, is the first to account for hand-object interactions in object texture reconstruction.
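To make the refinement stage concrete, below is a minimal Python sketch of the kind of shading model the abstract describes: outgoing radiance computed from albedo, roughness, and specularity, with environment light gated by a hand-visibility term and occluded directions replaced by reflected skin color. All function names, the sampling scheme, and the BRDF stand-in are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed interfaces, not the TexHOI code) of shading one
# surface point with hand-aware visibility and indirect skin reflection.
import numpy as np

def shade_point(albedo, normal, roughness, specular,
                env_light, hand_visibility, skin_radiance, n_samples=256):
    """Monte-Carlo estimate of outgoing radiance at one surface point.

    albedo: (3,) base color; normal: (3,) unit surface normal
    env_light(w) -> (3,): environment radiance from direction w
    hand_visibility(w) -> float in [0, 1]: 0 where the hand occludes w
    skin_radiance(w) -> (3,): indirect radiance reflected off the hand's skin
    """
    rng = np.random.default_rng(0)
    # Uniformly sample directions and flip them into the upper hemisphere.
    dirs = rng.normal(size=(n_samples, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    dirs[dirs @ normal < 0] *= -1.0

    radiance = np.zeros(3)
    for w in dirs:
        cos_theta = max(w @ normal, 0.0)
        # Direct light is gated by hand visibility (shadowing); blocked
        # directions instead contribute the hand's skin reflection.
        v = hand_visibility(w)
        incoming = v * env_light(w) + (1.0 - v) * skin_radiance(w)
        # Crude diffuse-plus-specular stand-in for the real BRDF.
        brdf = albedo / np.pi + specular * (1.0 - roughness)
        radiance += brdf * incoming * cos_theta
    # Uniform hemisphere sampling: pdf = 1 / (2*pi).
    return radiance * (2.0 * np.pi / n_samples)
```

Note that the occluded fraction of each direction is swapped for the hand's skin radiance rather than simply zeroed out, mirroring the abstract's point that the hand both shadows the object and tints it through indirect reflection.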
Related papers
- 3D Object Manipulation in a Single Image using Generative Models [30.241857090353864]
We introduce OMG3D, a novel framework that integrates precise geometric control with the generative power of diffusion models.
Our framework first converts 2D objects into 3D, enabling user-directed modifications and lifelike motions at the geometric level.
Remarkably, all these steps can be done using one NVIDIA 3090.
arXiv Detail & Related papers (2025-01-22T15:06:30Z)
- EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild [79.71523320368388]
Our work aims to reconstruct hand-object interactions from a single-view image.
We first design a novel pipeline to estimate the underlying hand pose and object shape.
With the initial reconstruction, we employ a prior-guided optimization scheme.
arXiv Detail & Related papers (2024-11-21T16:33:35Z)
- Floating No More: Object-Ground Reconstruction from a Single Image [33.34421517827975]
We introduce ORG (Object Reconstruction with Ground), a novel task aimed at reconstructing 3D object geometry in conjunction with the ground surface.
Our method uses two compact pixel-level representations to depict the relationship between camera, object, and ground.
arXiv Detail & Related papers (2024-07-26T17:59:56Z)
- Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces [34.831730064258494]
We propose Tactile-Informed 3DGS, a novel approach that incorporates touch data (local depth maps) with multi-view vision data to achieve surface reconstruction and novel view synthesis.
By creating a framework that decreases the transmittance at touch locations, we achieve a refined surface reconstruction, ensuring a uniformly smooth depth map.
We conduct evaluation on objects with glossy and reflective surfaces and demonstrate the effectiveness of our approach.
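As a rough illustration of that mechanism, here is a minimal sketch (assumed names and parameters, not the paper's code) of boosting Gaussian opacity near tactile contact points so that transmittance drops where the sensor touched the surface:

```python
# Minimal sketch: raise the opacity of Gaussians near contact points so
# transmittance falls there and the recovered surface tightens.
import numpy as np

def boost_opacity_near_touches(gauss_means, gauss_opacity,
                               touch_points, radius=0.01, boost=0.5):
    """gauss_means: (N,3) Gaussian centers; gauss_opacity: (N,) in [0,1];
    touch_points: (M,3) contact locations from local tactile depth maps."""
    # Distance from each Gaussian to its nearest touch point.
    d = np.linalg.norm(gauss_means[:, None, :] - touch_points[None, :, :],
                       axis=-1)
    near = d.min(axis=1) < radius
    out = gauss_opacity.copy()
    out[near] = np.clip(out[near] + boost, 0.0, 1.0)
    return out
```

Lower transmittance along a ray is equivalent to higher accumulated opacity, which is why the sketch manipulates opacity directly.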
arXiv Detail & Related papers (2024-03-29T16:30:17Z)
- NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction [19.957593804898064]
We present a novel free-viewpoint rendering framework, Neural Contact Radiance Field (NCRF), to reconstruct hand-object interactions from a sparse set of videos.
We jointly learn these key components where they mutually help and regularize each other with visual and geometric constraints.
Our approach outperforms the current state-of-the-art in terms of both rendering quality and pose estimation accuracy.
arXiv Detail & Related papers (2024-02-08T10:09:12Z)
- Learning Explicit Contact for Implicit Reconstruction of Hand-held Objects from Monocular Images [59.49985837246644]
We show how to model contacts in an explicit way to benefit the implicit reconstruction of hand-held objects.
In the first part, we propose a new subtask of directly estimating 3D hand-object contacts from a single image.
In the second part, we introduce a novel method to diffuse estimated contact states from the hand mesh surface to nearby 3D space.
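The diffusion step can be pictured as a distance-weighted propagation of per-vertex contact probabilities into the surrounding volume; the sketch below is a minimal stand-in with an assumed Gaussian kernel, not the paper's learned method:

```python
# Minimal sketch: spread per-vertex contact estimates from the hand mesh
# into nearby 3D space so an implicit reconstruction can query them anywhere.
import numpy as np

def diffuse_contact(query_pts, hand_verts, contact_probs, sigma=0.01):
    """query_pts: (Q,3) 3D points; hand_verts: (V,3) hand mesh vertices;
    contact_probs: (V,) per-vertex contact probability in [0,1]."""
    d2 = ((query_pts[:, None, :] - hand_verts[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian falloff kernel
    # Weighted average of vertex contact states at each query point.
    return (w * contact_probs).sum(1) / (w.sum(1) + 1e-8)
```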
arXiv Detail & Related papers (2023-05-31T17:59:26Z)
- Neural Fields meet Explicit Geometric Representation for Inverse Rendering of Urban Scenes [62.769186261245416]
We present a novel inverse rendering framework for large urban scenes capable of jointly reconstructing the scene geometry, spatially-varying materials, and HDR lighting from a set of posed RGB images with optional depth.
Specifically, we use a neural field to account for the primary rays, and use an explicit mesh (reconstructed from the underlying neural field) for modeling secondary rays that produce higher-order lighting effects such as cast shadows.
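A toy sketch of that hybrid split, with every name and interface assumed rather than taken from the paper: a neural field answers the primary ray, and a mesh occlusion query answers the secondary shadow ray.

```python
# Toy sketch: neural field shades primary rays; an explicit mesh extracted
# from that field answers cheap secondary (shadow) rays for cast shadows.
import numpy as np

def shade_pixel(ray_o, ray_d, neural_field, mesh_occluded, light_dir):
    """neural_field(o, d) -> (hit_point (3,), base_color (3,)) for a primary ray;
    mesh_occluded(p, d) -> True if a ray from p toward d hits the mesh."""
    hit, base_color = neural_field(ray_o, ray_d)
    # Secondary ray: trace toward the light against the mesh only.
    in_shadow = mesh_occluded(hit + 1e-4 * light_dir, light_dir)
    return base_color * (0.2 if in_shadow else 1.0)  # hard shadow falloff

# Toy usage with stand-in callables for the field and the mesh test:
color = shade_pixel(np.zeros(3), np.array([0.0, 0.0, -1.0]),
                    lambda o, d: (o + d, np.array([0.8, 0.6, 0.4])),
                    lambda p, d: p[2] < 0.0,
                    np.array([0.0, 0.0, 1.0]))
```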
arXiv Detail & Related papers (2023-04-06T17:51:54Z)
- NeROIC: Neural Rendering of Objects from Online Image Collections [42.02832046768925]
We present a novel method to acquire object representations from online image collections, capturing high-quality geometry and material properties of arbitrary objects.
This enables various object-centric rendering applications such as novel-view synthesis, relighting, and harmonized background composition.
arXiv Detail & Related papers (2022-01-07T16:45:15Z)
- Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground.
Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights.
We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion.
arXiv Detail & Related papers (2020-07-18T22:31:33Z)
- Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
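The supervision signal can be pictured as a simple photometric residual between the rendered, posed model and the observed frame; the sketch below assumes a rendering interface and is not the paper's implementation:

```python
# Minimal sketch: a photometric residual lets unannotated frames supervise
# pose and shape between sparsely labeled keyframes.
import numpy as np

def photometric_loss(rendered_rgb, observed_rgb, mask):
    """rendered_rgb, observed_rgb: (H,W,3) float images; mask: (H,W) bool
    foreground mask restricting the loss to rendered hand/object pixels."""
    diff = np.abs(rendered_rgb - observed_rgb)[mask]
    return diff.mean() if diff.size else 0.0
```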
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
- Seeing the World in a Bag of Chips [73.561388215585]
We address the dual problems of novel view synthesis and environment reconstruction from hand-held RGBD sensors.
Our contributions include 1) modeling highly specular objects, 2) modeling inter-reflections and Fresnel effects, and 3) enabling surface light field reconstruction with the same input needed to reconstruct shape alone.
arXiv Detail & Related papers (2020-01-14T06:44:44Z)