Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
- URL: http://arxiv.org/abs/2305.03713v3
- Date: Mon, 5 Aug 2024 02:38:33 GMT
- Title: Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
- Authors: Ekta Prashnani, Koki Nagano, Shalini De Mello, David Luebke, Orazio Gallo
- Abstract summary: We address the task of verifying whether a rendered avatar uses an individual's appearance with their consent, which we term avatar fingerprinting.
We first introduce a large-scale dataset of real and synthetic videos of people interacting on a video call.
We verify the identity driving the expressions in a synthetic video by learning motion signatures that are independent of the facial appearance shown.
- Score: 16.717930760451996
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Modern avatar generators allow anyone to synthesize photorealistic real-time talking avatars, ushering in a new era of avatar-based human communication, such as immersive AR/VR interactions or videoconferencing under limited bandwidth. Their safe adoption, however, requires a mechanism to verify whether the rendered avatar is trustworthy: does it use the appearance of an individual without their consent? We term this task avatar fingerprinting. To tackle it, we first introduce a large-scale dataset of real and synthetic videos of people interacting on a video call, where the synthetic videos are generated using the facial appearance of one person and the expressions of another. We verify the identity driving the expressions in a synthetic video by learning motion signatures that are independent of the facial appearance shown. Our solution, the first in this space, achieves an average AUC of 0.85. Critical to its practical use, it also generalizes to new generators never seen in training (average AUC of 0.83). The proposed dataset and other resources can be found at: https://research.nvidia.com/labs/nxp/avatar-fingerprinting/.
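To make the verification setup concrete, the sketch below shows one plausible shape for such a pipeline: a temporal encoder maps appearance-independent motion features (e.g., per-frame facial-landmark displacements) to a fixed-length signature, each query clip is scored against an enrolled reference for its claimed driving identity via cosine similarity, and performance is summarized with AUC as reported in the abstract. This is a minimal, hypothetical sketch, not the paper's architecture; the GRU encoder, the 136-dimensional landmark-motion features, and the verification_scores helper are illustrative assumptions.

```python
# Minimal, hypothetical sketch (NOT the paper's model): score whether the
# expressions in a clip are driven by a claimed identity, using only motion
# features so that the rendered facial appearance does not matter.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

class MotionSignatureEncoder(nn.Module):
    """GRU over per-frame motion features -> L2-normalized identity signature."""
    def __init__(self, motion_dim=136, hidden_dim=256, embed_dim=128):
        super().__init__()
        self.gru = nn.GRU(motion_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, embed_dim)

    def forward(self, motion_seq):              # (batch, frames, motion_dim)
        _, h_last = self.gru(motion_seq)        # h_last: (1, batch, hidden_dim)
        return F.normalize(self.head(h_last[-1]), dim=-1)

def verification_scores(encoder, query_clips, reference_clips):
    """Cosine similarity between each query clip and the enrolled reference of
    its claimed driving identity (both embeddings are unit-norm)."""
    with torch.no_grad():
        q = encoder(query_clips)
        r = encoder(reference_clips)
    return (q * r).sum(dim=-1).cpu().numpy()

if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = MotionSignatureEncoder().eval()
    # Toy stand-ins for precomputed motion features
    # (32 clips, 90 frames, 68 landmarks x 2 coordinates = 136 dims per frame).
    queries = torch.randn(32, 90, 136)
    references = torch.randn(32, 90, 136)
    labels = np.array([1] * 16 + [0] * 16)      # 1 = same driving identity
    scores = verification_scores(encoder, queries, references)
    print("toy AUC:", roc_auc_score(labels, scores))  # ~0.5 for an untrained encoder
```

In the paper's setting, such an encoder would be trained (e.g., with a metric or contrastive objective) so that clips driven by the same person map to nearby signatures regardless of which face is rendered; with random weights, the toy AUC printed above stays near chance.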
Related papers
- Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion [61.938480115119596]
We propose Zero-1-to-A, a robust method that synthesizes a spatially and temporally consistent dataset for 4D avatar reconstruction.
Experiments demonstrate that Zero-1-to-A improves fidelity, animation quality, and rendering speed compared to existing diffusion-based methods.
arXiv Detail & Related papers (2025-03-20T05:07:46Z)
- Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior [31.780579293685797]
We present Vid2Avatar-Pro, a method to create photorealistic and animatable 3D human avatars from monocular in-the-wild videos.
arXiv Detail & Related papers (2025-03-03T14:45:35Z)
- EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars [56.56236652774294]
We propose a person-specific egocentric telepresence approach, which jointly models the photoreal digital avatar while also driving it from a single egocentric video.
Our experiments demonstrate a clear step towards egocentric and photoreal telepresence as our method outperforms baselines as well as competing methods.
arXiv Detail & Related papers (2024-09-22T22:50:27Z)
- DEGAS: Detailed Expressions on Full-Body Gaussian Avatars [13.683836322899953]
We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions.
We propose to adopt the expression latent space trained solely on 2D portrait images, bridging the gap between 2D talking faces and 3D avatars.
arXiv Detail & Related papers (2024-08-20T06:52:03Z)
- HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation [64.37874983401221]
We present HumanVid, the first large-scale high-quality dataset tailored for human image animation.
For the real-world data, we compile a vast collection of real-world videos from the internet.
For the synthetic data, we collect 10K 3D avatar assets and leverage existing assets of body shapes, skin textures, and clothing.
arXiv Detail & Related papers (2024-07-24T17:15:58Z)
- Physics-based Motion Retargeting from Sparse Inputs [73.94570049637717]
Commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data of the user's pose.
We introduce a method to retarget motions in real-time from sparse human sensor data to characters of various morphologies.
We show that the avatar poses often match the user surprisingly well, despite having no sensor information of the lower body available.
arXiv Detail & Related papers (2023-07-04T21:57:05Z)
- READ Avatars: Realistic Emotion-controllable Audio Driven Avatars [11.98034899127065]
We present READ Avatars, a 3D-based approach for generating 2D avatars driven by audio input, with direct and granular control over emotion.
Previous methods are unable to achieve realistic animation due to the many-to-many nature of audio to expression mappings.
This removes the smoothing effect of regression-based models and helps to improve the realism and expressiveness of the generated avatars.
arXiv Detail & Related papers (2023-03-01T18:56:43Z)
- High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation [117.32310997522394]
3D video avatars can empower virtual communications by providing compression, privacy, entertainment, and a sense of presence in AR/VR.
Existing person-specific 3D models are not robust to lighting, hence their results typically miss subtle facial behaviors and cause artifacts in the avatar.
This paper addresses these limitations by learning a deep lighting model that, in combination with a high-quality 3D face tracking algorithm, enables subtle and robust facial motion transfer from a regular video to a 3D photo-realistic avatar.
arXiv Detail & Related papers (2021-03-29T18:33:49Z)
- Expressive Telepresence via Modular Codec Avatars [148.212743312768]
VR telepresence consists of interacting with another human in a virtual space represented by an avatar.
This paper takes a step in this direction, presenting Modular Codec Avatars (MCA), a method for generating hyper-realistic faces driven by the cameras in a VR headset.
MCA extends traditional Codec Avatars (CA) by replacing the holistic models with a learned modular representation.
arXiv Detail & Related papers (2020-08-26T20:16:43Z)
- Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
arXiv Detail & Related papers (2020-08-11T22:28:48Z)