InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians
- URL: http://arxiv.org/abs/2504.07949v1
- Date: Thu, 10 Apr 2025 17:55:43 GMT
- Title: InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians
- Authors: Kefan Chen, Sergiu Oprea, Justin Theiss, Sreyas Mohan, Srinath Sridhar, Aayush Prakash
- Abstract summary: We present InteractAvatar, the first model to faithfully capture the appearance of dynamic hands and non-rigid hand-face interactions. Our hand-face interaction module models the subtle geometry and appearance dynamics that underlie common gestures.
- Score: 9.665857631154612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With rising interest from the community in digital avatars, coupled with the importance of expressions and gestures in communication, modeling natural avatar behavior remains an important challenge across many industries such as teleconferencing, gaming, and AR/VR. Human hands are the primary tool for interacting with the environment and are essential for realistic human behavior modeling, yet existing 3D hand and head avatar models often overlook the crucial aspect of hand-body interactions, such as between hand and face. We present InteractAvatar, the first model to faithfully capture the photorealistic appearance of dynamic hand and non-rigid hand-face interactions. Our novel Dynamic Gaussian Hand model, combining a template model with 3D Gaussian Splatting and a dynamic refinement module, captures pose-dependent changes, e.g., the fine wrinkles and complex shadows that occur during articulation. Importantly, our hand-face interaction module models the subtle geometry and appearance dynamics that underlie common gestures. Through experiments on novel view synthesis, self-reenactment, and cross-identity reenactment, we demonstrate that InteractAvatar can reconstruct hands and hand-face interactions from monocular or multiview videos with high-fidelity detail and can be animated with novel poses.
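The abstract does not come with code; as a minimal sketch of the described recipe (3D Gaussians rigged to a template hand, plus a pose-conditioned refinement that models non-rigid, pose-dependent effects), consider the following, where the Gaussian count, network sizes, and the `skinning_fn` placeholder are all assumptions rather than the authors' implementation:

```python
import torch
import torch.nn as nn

class DynamicGaussianHand(nn.Module):
    """Illustrative sketch: 3D Gaussians rigged to a template hand,
    refined by a pose-conditioned MLP (all sizes are assumptions)."""

    def __init__(self, num_gaussians=10_000, pose_dim=45):
        super().__init__()
        # Canonical per-Gaussian parameters (position, log-scale, color, opacity).
        self.mu = nn.Parameter(torch.randn(num_gaussians, 3) * 0.01)
        self.log_scale = nn.Parameter(torch.zeros(num_gaussians, 3))
        self.color = nn.Parameter(torch.rand(num_gaussians, 3))
        self.opacity = nn.Parameter(torch.zeros(num_gaussians, 1))
        # Pose-conditioned refinement: small corrections standing in for the
        # pose-dependent wrinkles/shadows described in the abstract.
        self.refine = nn.Sequential(
            nn.Linear(pose_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 3 + 3),  # position offset + color offset
        )

    def forward(self, pose, skinning_fn):
        """pose: (pose_dim,) joint angles; skinning_fn: a template LBS that maps
        canonical points to posed space (assumed given, e.g. a MANO wrapper)."""
        posed_mu = skinning_fn(self.mu, pose)            # rigid articulation
        feats = torch.cat([pose.expand(self.mu.shape[0], -1), self.mu], dim=-1)
        delta = self.refine(feats)                       # non-rigid residual
        posed_mu = posed_mu + delta[:, :3]
        color = (self.color + delta[:, 3:]).clamp(0, 1)
        return posed_mu, self.log_scale.exp(), color, self.opacity.sigmoid()

# Usage with a trivial stand-in for the skinning function:
model = DynamicGaussianHand()
identity_skinning = lambda pts, pose: pts                # placeholder, not MANO
out = model(torch.zeros(45), identity_skinning)
```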
Related papers
- BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects [70.20706475051347]
BimArt is a novel generative approach for synthesizing 3D bimanual hand interactions with articulated objects. We first generate distance-based contact maps conditioned on the object trajectory with an articulation-aware feature representation. The learned contact prior is then used to guide our hand motion generator, producing diverse and realistic bimanual motions for object movement and articulation.
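As a hedged illustration of what a distance-based contact map can look like (a generic formulation; BimArt's exact parameterization and conditioning are not reproduced here):

```python
import numpy as np

def contact_map(hand_verts, obj_points, temperature=0.01):
    """Distance-based contact map: per hand vertex, a soft value in (0, 1]
    that approaches 1 near the object surface. Generic formulation only."""
    # Pairwise distances between hand vertices (N, 3) and object points (M, 3).
    d = np.linalg.norm(hand_verts[:, None, :] - obj_points[None, :, :], axis=-1)
    nearest = d.min(axis=1)                 # distance to closest object point
    return np.exp(-nearest / temperature)   # soft contact score per vertex

hands = np.random.rand(778, 3)   # e.g. MANO vertex count
obj = np.random.rand(2048, 3)
print(contact_map(hands, obj).shape)  # (778,)
```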
arXiv Detail & Related papers (2024-12-06T14:23:56Z)
- XHand: Real-time Expressive Hand Avatar [9.876680405587745]
We introduce an expressive hand avatar, named XHand, that is designed to generate hand shape, appearance, and deformations in real-time.
XHand is able to recover high-fidelity geometry and texture for hand animations across diverse poses in real-time.
arXiv Detail & Related papers (2024-07-30T17:49:21Z)
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- Hand-Centric Motion Refinement for 3D Hand-Object Interaction via Hierarchical Spatial-Temporal Modeling [18.128376292350836]
We propose a data-driven method for coarse hand motion refinement.
First, we design a hand-centric representation to describe the dynamic spatial-temporal relation between hands and objects.
Second, to capture the dynamic cues of hand-object interaction, we propose a new architecture.
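One plausible reading of a hand-centric representation is expressing object geometry in the hand's local frame at each frame; the sketch below assumes that reading, and the function names are invented for illustration:

```python
import numpy as np

def hand_centric_features(obj_points, hand_rot, hand_trans):
    """Express object points in the hand's local coordinate frame:
    p_local = R^T (p_world - t), written in row-vector form."""
    return (obj_points - hand_trans) @ hand_rot

def sequence_features(obj_seq, rot_seq, trans_seq):
    """Stack per-frame hand-centric point sets into a (T, M, 3) tensor
    that a spatial-temporal model could consume."""
    return np.stack([
        hand_centric_features(p, R, t)
        for p, R, t in zip(obj_seq, rot_seq, trans_seq)
    ])

T, M = 16, 512
feats = sequence_features(
    np.random.rand(T, M, 3),
    np.tile(np.eye(3), (T, 1, 1)),
    np.zeros((T, 3)),
)
print(feats.shape)  # (16, 512, 3)
```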
arXiv Detail & Related papers (2024-01-29T09:17:51Z)
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z)
- GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency [57.9920824261925]
Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment.
Modeling realistic hand-object interactions is thus critical for applications in computer graphics, computer vision, and mixed reality.
GRIP is a learning-based method that takes as input the 3D motion of the body and the object, and synthesizes realistic motion for both hands before, during, and after object interaction.
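A minimal sketch of the stated interface (body and object motion in, two-hand motion out), using an ordinary GRU rather than GRIP's actual architecture, with all dimensions assumed:

```python
import torch
import torch.nn as nn

class HandFromBodyObject(nn.Module):
    """Sketch of GRIP-style inference: consume body and object motion,
    emit hand poses over the whole interaction. Sizes are assumptions."""

    def __init__(self, body_dim=63, obj_dim=9, hand_dim=2 * 45, hidden=256):
        super().__init__()
        self.gru = nn.GRU(body_dim + obj_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, hand_dim)

    def forward(self, body_motion, obj_motion):
        # (B, T, body_dim) + (B, T, obj_dim) -> (B, T, hand_dim)
        x = torch.cat([body_motion, obj_motion], dim=-1)
        h, _ = self.gru(x)   # temporal context before/during/after contact
        return self.head(h)

model = HandFromBodyObject()
hands = model(torch.randn(2, 60, 63), torch.randn(2, 60, 9))
print(hands.shape)  # torch.Size([2, 60, 90])
```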
arXiv Detail & Related papers (2023-08-22T17:59:51Z)
- Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views [41.50710846018882]
We propose a neural rendering and pose estimation system for hand-object interaction from sparse views.
We first learn the shape and appearance prior knowledge of hands and objects separately with the neural representation.
During the online stage, we design a rendering-based joint model fitting framework to understand the dynamic hand-object interaction.
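Rendering-based model fitting is typically analysis-by-synthesis; the sketch below shows that generic loop with a toy differentiable renderer standing in for the learned hand-object representations (an assumption, not this paper's system):

```python
import torch

def fit_pose(render_fn, observed, init_pose, iters=200, lr=1e-2):
    """Generic analysis-by-synthesis fitting: optimize pose parameters so
    the rendered image matches the observation via a photometric loss."""
    pose = init_pose.clone().requires_grad_(True)
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = torch.mean((render_fn(pose) - observed) ** 2)  # photometric L2
        loss.backward()
        opt.step()
    return pose.detach()

# Toy differentiable "renderer": maps a 6-D pose to a 4x4 image.
W = torch.randn(16, 6)
render = lambda p: (W @ p).reshape(4, 4)
target = render(torch.tensor([0.5, -0.2, 0.1, 0.0, 0.3, -0.1]))
est = fit_pose(render, target.detach(), torch.zeros(6))
```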
arXiv Detail & Related papers (2023-08-22T05:17:41Z)
- Dressing Avatars: Deep Photorealistic Appearance for Physically Simulated Clothing [49.96406805006839]
We introduce pose-driven avatars with explicit modeling of clothing that exhibit both realistic clothing dynamics and photorealistic appearance learned from real-world data.
Our key contribution is a physically-inspired appearance network, capable of generating photorealistic appearance with view-dependent and dynamic shadowing effects even for unseen body-clothing configurations.
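As a rough sketch of a view-conditioned appearance network (generic; the paper's actual inputs and architecture are not reproduced here, and the feature sizes are assumptions):

```python
import torch
import torch.nn as nn

class AppearanceNet(nn.Module):
    """Sketch: per-point features from simulated clothing geometry plus a
    view direction yield shadowed RGB. Sizes are illustrative assumptions."""

    def __init__(self, geom_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(geom_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, geom_feat, view_dir):
        # geom_feat: (N, geom_dim) clothing state; view_dir: (N, 3)
        view_dir = view_dir / view_dir.norm(dim=-1, keepdim=True)
        return self.mlp(torch.cat([geom_feat, view_dir], dim=-1))

net = AppearanceNet()
rgb = net(torch.randn(1024, 32), torch.randn(1024, 3))
```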
arXiv Detail & Related papers (2022-06-30T17:58:20Z)
- NIMBLE: A Non-rigid Hand Model with Bones and Muscles [41.19718491215149]
We present NIMBLE, a novel parametric hand model that includes the key anatomical components missing from prior hand models.
NIMBLE consists of 20 bones as triangular meshes, 7 muscle groups as tetrahedral meshes, and a skin mesh.
We demonstrate applying NIMBLE to modeling, rendering, and visual inference tasks.
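The stated composition maps naturally onto simple mesh containers; the classes and field names below are illustrative assumptions, not NIMBLE's released API:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TriMesh:
    vertices: np.ndarray   # (V, 3)
    faces: np.ndarray      # (F, 3) triangle indices

@dataclass
class TetMesh:
    vertices: np.ndarray   # (V, 3)
    tets: np.ndarray       # (T, 4) tetrahedron indices

@dataclass
class NimbleHand:
    """Containers mirroring the stated composition: 20 bone meshes,
    7 muscle groups, one skin mesh (field names are assumptions)."""
    bones: list            # 20 TriMesh instances
    muscles: list          # 7 TetMesh instances
    skin: TriMesh

    def __post_init__(self):
        assert len(self.bones) == 20 and len(self.muscles) == 7

# Instantiation with empty placeholder meshes (real data required in practice):
empty_tri = TriMesh(np.zeros((0, 3)), np.zeros((0, 3), dtype=int))
empty_tet = TetMesh(np.zeros((0, 3)), np.zeros((0, 4), dtype=int))
hand = NimbleHand(bones=[empty_tri] * 20, muscles=[empty_tet] * 7, skin=empty_tri)
```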
arXiv Detail & Related papers (2022-02-09T15:57:21Z)
- Style and Pose Control for Image Synthesis of Humans from a Single Monocular View [78.6284090004218]
StylePoseGAN extends a non-controllable generator to accept conditioning of pose and appearance separately.
Our network can be trained in a fully supervised way with human images to disentangle pose, appearance and body parts.
StylePoseGAN achieves state-of-the-art image generation fidelity on common perceptual metrics.
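A minimal sketch of separate pose/appearance conditioning, so either code can be swapped independently at test time (the toy encoder/decoder sizes are assumptions, not the StylePoseGAN architecture):

```python
import torch
import torch.nn as nn

class PoseAppearanceGenerator(nn.Module):
    """Sketch: two encoders (pose, appearance) feed one image decoder,
    disentangling the two conditions. Sizes are illustrative only."""

    def __init__(self, pose_dim=64, app_dim=64):
        super().__init__()
        self.pose_enc = nn.Linear(17 * 2, pose_dim)   # e.g. 2D keypoints
        self.app_enc = nn.Linear(512, app_dim)        # e.g. identity feature
        self.decoder = nn.Sequential(
            nn.Linear(pose_dim + app_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * 16 * 16),              # tiny RGB output
        )

    def forward(self, keypoints, appearance):
        z = torch.cat([self.pose_enc(keypoints), self.app_enc(appearance)], -1)
        return self.decoder(z).view(-1, 3, 16, 16)

gen = PoseAppearanceGenerator()
img = gen(torch.randn(4, 34), torch.randn(4, 512))  # swap inputs independently
```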
arXiv Detail & Related papers (2021-02-22T18:50:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.