NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of
Hand-Object Interaction
- URL: http://arxiv.org/abs/2402.05532v2
- Date: Fri, 9 Feb 2024 13:00:22 GMT
- Title: NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of
Hand-Object Interaction
- Authors: Zhongqun Zhang and Jifei Song and Eduardo Pérez-Pellitero and Yiren Zhou and Hyung Jin Chang and Aleš Leonardis
- Abstract summary: We present a novel free-viewpoint rendering framework, Neural Contact Radiance Field (NCRF), to reconstruct hand-object interactions from a sparse set of videos.
Its key components are learned jointly so that they mutually help and regularize each other with visual and geometric constraints.
Our approach outperforms the current state-of-the-art in terms of both rendering quality and pose estimation accuracy.
- Score: 19.957593804898064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling hand-object interactions is a fundamentally challenging task in 3D
computer vision. Despite remarkable progress that has been achieved in this
field, existing methods still fail to synthesize the hand-object interaction
photo-realistically, suffering from degraded rendering quality caused by the
heavy mutual occlusions between the hand and the object, and inaccurate
hand-object pose estimation. To tackle these challenges, we present a novel
free-viewpoint rendering framework, Neural Contact Radiance Field (NCRF), to
reconstruct hand-object interactions from a sparse set of videos. In
particular, the proposed NCRF framework consists of two key components: (a) A
contact optimization field that predicts an accurate contact field from 3D
query points for achieving desirable contact between the hand and the object.
(b) A hand-object neural radiance field to learn an implicit hand-object
representation in a static canonical space, in concert with the specifically
designed hand-object motion field to produce observation-to-canonical
correspondences. We jointly learn these key components where they mutually help
and regularize each other with visual and geometric constraints, producing a
high-quality hand-object reconstruction that achieves photo-realistic novel
view synthesis. Extensive experiments on HO3D and DexYCB datasets show that our
approach outperforms the current state-of-the-art in terms of both rendering
quality and pose estimation accuracy.
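To make the abstract's two components concrete, below is a minimal, conceptual PyTorch sketch of a motion field that warps observation-space ray samples into a static canonical space, a canonical radiance field queried at the warped points, and a contact field that scores 3D query points. All module names, layer sizes, and the pose conditioning are illustrative assumptions, not the authors' implementation.

```python
# Conceptual sketch of the NCRF components described in the abstract.
# Sizes, names, and the pose conditioning are assumptions for illustration.
import torch
import torch.nn as nn

def mlp(dims):
    layers = []
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

class MotionField(nn.Module):
    """Predicts an observation-to-canonical offset for each 3D point,
    conditioned on a per-frame hand-object pose code (assumed)."""
    def __init__(self, pose_dim=64):
        super().__init__()
        self.net = mlp([3 + pose_dim, 128, 128, 3])

    def forward(self, x_obs, pose_code):
        h = torch.cat([x_obs, pose_code.expand(x_obs.shape[0], -1)], dim=-1)
        return x_obs + self.net(h)  # canonical-space points

class CanonicalRadianceField(nn.Module):
    """Static NeRF-style field: density + RGB at canonical points."""
    def __init__(self):
        super().__init__()
        self.net = mlp([3, 256, 256, 4])

    def forward(self, x_can):
        out = self.net(x_can)
        sigma = torch.relu(out[..., :1])   # density >= 0
        rgb = torch.sigmoid(out[..., 1:])  # colors in [0, 1]
        return sigma, rgb

class ContactField(nn.Module):
    """Scores 3D query points with a hand-object contact probability."""
    def __init__(self):
        super().__init__()
        self.net = mlp([3, 128, 128, 1])

    def forward(self, x):
        return torch.sigmoid(self.net(x))

# Toy forward pass over a batch of sampled ray points.
motion, nerf, contact = MotionField(), CanonicalRadianceField(), ContactField()
x_obs = torch.randn(1024, 3)    # points sampled along camera rays
pose_code = torch.randn(1, 64)  # per-frame pose embedding (assumed)
x_can = motion(x_obs, pose_code)
sigma, rgb = nerf(x_can)
c = contact(x_obs)              # contact scores can regularize the poses
```

Per the abstract, the components are trained jointly, so contact predictions can regularize the hand-object poses that drive the motion field, and vice versa.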
Related papers
- Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects [89.95728475983263]
A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation.
We design the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits.
Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks.
arXiv Detail & Related papers (2024-03-25T05:12:21Z)
- Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views [41.50710846018882]
We propose a neural rendering and pose estimation system for hand-object interaction from sparse views.
We first learn shape and appearance priors for hands and objects separately with a neural representation.
During the online stage, we design a rendering-based joint model fitting framework to understand the dynamic hand-object interaction (a sketch of this fitting loop follows the citation below).
arXiv Detail & Related papers (2023-08-22T05:17:41Z)
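As a rough illustration of the rendering-based joint model fitting described above, the following sketch optimizes hand and object pose parameters at test time by minimizing a photometric loss against an observed frame. The frozen stand-in renderer, the 6-D pose parameterization, and all names are assumptions for illustration only, not the authors' code.

```python
# Test-time pose fitting against a (stand-in) pretrained neural renderer.
import torch
import torch.nn as nn

class FrozenNeuralRenderer(nn.Module):
    """Stand-in for a pretrained neural renderer: poses -> RGB image."""
    def __init__(self, image_size=32):
        super().__init__()
        self.decode = nn.Linear(6 + 6, 3 * image_size * image_size)
        self.image_size = image_size
        for p in self.parameters():
            p.requires_grad_(False)  # learned priors stay fixed during fitting

    def forward(self, hand_pose, obj_pose):
        img = self.decode(torch.cat([hand_pose, obj_pose], dim=-1))
        return torch.sigmoid(img).view(3, self.image_size, self.image_size)

renderer = FrozenNeuralRenderer()
observed = torch.rand(3, 32, 32)                # observed video frame (toy)
hand_pose = torch.zeros(6, requires_grad=True)  # e.g. axis-angle + translation
obj_pose = torch.zeros(6, requires_grad=True)

opt = torch.optim.Adam([hand_pose, obj_pose], lr=1e-2)
for step in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(renderer(hand_pose, obj_pose), observed)
    loss.backward()  # gradients flow to the poses through the frozen renderer
    opt.step()
```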
- Learning Explicit Contact for Implicit Reconstruction of Hand-held Objects from Monocular Images [59.49985837246644]
We show how to model contacts in an explicit way to benefit the implicit reconstruction of hand-held objects.
In the first part, we propose a new subtask of directly estimating 3D hand-object contacts from a single image.
In the second part, we introduce a novel method to diffuse estimated contact states from the hand mesh surface to nearby 3D space (a sketch of this diffusion follows the citation below).
arXiv Detail & Related papers (2023-05-31T17:59:26Z)
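A minimal sketch of the contact-diffusion step described above: each 3D query point inherits the contact probability of its nearest hand-mesh vertex, attenuated with distance. The nearest-neighbor rule and the Gaussian kernel width are illustrative assumptions, not the paper's exact formulation.

```python
# Diffuse per-vertex contact estimates from the hand mesh into nearby 3D space.
import torch

def diffuse_contact(query_pts, hand_verts, vert_contact, sigma=0.01):
    """query_pts: (Q, 3), hand_verts: (V, 3), vert_contact: (V,) in [0, 1]."""
    d = torch.cdist(query_pts, hand_verts)             # (Q, V) pairwise distances
    dmin, idx = d.min(dim=1)                           # nearest hand vertex
    weight = torch.exp(-dmin ** 2 / (2 * sigma ** 2))  # Gaussian falloff
    return vert_contact[idx] * weight                  # (Q,) diffused contact

# Toy usage: 778 vertices as in a MANO hand mesh, random contact estimates.
hand_verts = torch.rand(778, 3) * 0.1
vert_contact = torch.rand(778)
query_pts = torch.rand(2048, 3) * 0.1
field = diffuse_contact(query_pts, hand_verts, vert_contact)
```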
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands.
We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z)
- Hand-Object Interaction Image Generation [135.87707468156057]
This work is dedicated to a new task, i.e., hand-object interaction image generation.
It aims to generate a hand-object image conditioned on the given hand, object, and their interaction status.
This task is challenging and worth studying, with many potential applications such as AR/VR games and online shopping.
arXiv Detail & Related papers (2022-11-28T18:59:57Z)
- TOCH: Spatio-Temporal Object-to-Hand Correspondence for Motion Refinement [42.3418874174372]
We present TOCH, a method for refining incorrect 3D hand-object interaction sequences using a data prior.
We learn a latent manifold of plausible TOCH fields with a temporal denoising auto-encoder (a sketch follows the citation below).
Experiments demonstrate that TOCH outperforms state-of-the-art 3D hand-object interaction models.
arXiv Detail & Related papers (2022-05-16T20:41:45Z)
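The temporal denoising auto-encoder mentioned above can be sketched as follows: an auto-encoder over sequences of per-frame correspondence features is trained to reconstruct clean sequences from perturbed ones, so implausible interaction sequences can be projected back onto the learned manifold. The 1D-convolutional architecture and feature sizes are assumptions for illustration.

```python
# Temporal denoising auto-encoder over per-frame correspondence features.
import torch
import torch.nn as nn

class TemporalDenoisingAE(nn.Module):
    def __init__(self, feat_dim=64, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(feat_dim, 128, 5, padding=2), nn.ReLU(),
            nn.Conv1d(128, latent_dim, 5, padding=2),
        )
        self.decoder = nn.Sequential(
            nn.Conv1d(latent_dim, 128, 5, padding=2), nn.ReLU(),
            nn.Conv1d(128, feat_dim, 5, padding=2),
        )

    def forward(self, x):  # x: (batch, feat_dim, time)
        return self.decoder(self.encoder(x))

model = TemporalDenoisingAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.randn(8, 64, 30)  # 30-frame sequences of contact features (toy)
for step in range(100):
    noisy = clean + 0.1 * torch.randn_like(clean)  # perturb the input
    loss = torch.nn.functional.mse_loss(model(noisy), clean)
    opt.zero_grad(); loss.backward(); opt.step()
# At test time, model(noisy_sequence) projects onto the learned manifold.
```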
- Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches (a fusion sketch follows the citation below).
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of the reconstruction accuracy of objects.
arXiv Detail & Related papers (2020-06-28T09:50:25Z)
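A minimal sketch of the cross-branch feature fusion idea above, assuming a simple concatenation-based exchange between hand and object branches; the paper's actual fusion design may differ.

```python
# Two branches over a shared image feature, exchanging intermediate features.
import torch
import torch.nn as nn

class CrossBranchFusion(nn.Module):
    def __init__(self, in_dim=256, hid=128, out_dim=64):
        super().__init__()
        self.hand_in = nn.Linear(in_dim, hid)
        self.obj_in = nn.Linear(in_dim, hid)
        # Each head sees its own features plus the other branch's.
        self.hand_out = nn.Linear(2 * hid, out_dim)
        self.obj_out = nn.Linear(2 * hid, out_dim)

    def forward(self, img_feat):
        h = torch.relu(self.hand_in(img_feat))
        o = torch.relu(self.obj_in(img_feat))
        hand_pred = self.hand_out(torch.cat([h, o], dim=-1))
        obj_pred = self.obj_out(torch.cat([o, h], dim=-1))
        return hand_pred, obj_pred

fusion = CrossBranchFusion()
img_feat = torch.randn(4, 256)  # backbone features (RGB + estimated depth)
hand_pred, obj_pred = fusion(img_feat)
```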
- Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video (a minimal sketch follows the citation below).
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach improves pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
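The photometric-consistency idea above can be sketched as follows: the same surface points, posed at frames t and t+1, are projected into both images, and their sampled colors must agree, which supervises frames that lack pose annotations. The pinhole projection and bilinear lookup below are illustrative assumptions.

```python
# Photometric consistency between two frames of the same posed surface points.
import torch
import torch.nn.functional as F

def project(points, focal=1.0):
    """Pinhole projection of (N, 3) camera-space points to [-1, 1] coords."""
    uv = focal * points[:, :2] / points[:, 2:3].clamp(min=1e-6)
    return uv.clamp(-1.0, 1.0)

def sample_colors(image, uv):
    """image: (3, H, W); uv: (N, 2) normalized coords -> (N, 3) colors."""
    grid = uv.view(1, -1, 1, 2)
    out = F.grid_sample(image[None], grid, align_corners=False)
    return out.view(3, -1).t()

def photometric_loss(img_t, img_t1, pts_t, pts_t1):
    """Same surface points posed at t and t+1 should have matching colors."""
    return F.mse_loss(sample_colors(img_t, project(pts_t)),
                      sample_colors(img_t1, project(pts_t1)))

img_t, img_t1 = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
pts = torch.rand(500, 3) + torch.tensor([0., 0., 2.])  # points in front of cam
pts_next = pts + 0.01 * torch.randn_like(pts)          # posed at next frame
loss = photometric_loss(img_t, img_t1, pts, pts_next)
```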