Digital Avatars: Framework Development and Their Evaluation
- URL: http://arxiv.org/abs/2408.04068v1
- Date: Wed, 7 Aug 2024 20:09:47 GMT
- Title: Digital Avatars: Framework Development and Their Evaluation
- Authors: Timothy Rupprecht, Sung-En Chang, Yushu Wu, Lei Lu, Enfu Nan, Chih-hsiang Li, Caiyue Lai, Zhimin Li, Zhijun Hu, Yumei He, David Kaeli, Yanzhi Wang
- Abstract summary: We present Crowd Vote - an adaptation of Crowd Score that allows judges to elect a large language model (LLM) candidate over competitors answering the same or similar prompts.
We propose an end-to-end framework for creating high-fidelity artificial intelligence (AI) driven digital avatars.
Our visualization tool and our Crowd Vote metrics demonstrate that our AI-driven digital avatars achieve state-of-the-art humor, authenticity, and favorability, outperforming all competitors and baselines.
- Score: 26.74934835511383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel prompting strategy for artificial intelligence driven digital avatars. To better quantify how our prompting strategy affects anthropomorphic features like humor, authenticity, and favorability, we present Crowd Vote - an adaptation of Crowd Score that allows judges to elect a large language model (LLM) candidate over competitors answering the same or similar prompts. To visualize the responses of our LLM and the effectiveness of our prompting strategy, we propose an end-to-end framework for creating high-fidelity artificial intelligence (AI) driven digital avatars. This pipeline effectively captures an individual's essence for interaction, and our streaming algorithm delivers a high-quality digital avatar with real-time audio-video streaming from server to mobile device. Both our visualization tool and our Crowd Vote metrics demonstrate that our AI-driven digital avatars have state-of-the-art humor, authenticity, and favorability, outperforming all competitors and baselines. In the case of our Donald Trump and Joe Biden avatars, their authenticity and favorability are rated higher than even their real-world equivalents.
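As described, Crowd Vote reduces to judges casting ballots for one LLM candidate among several answering the same prompt. Below is a minimal sketch of such a tally; the `crowd_vote` helper and the ballot format are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def crowd_vote(ballots: list[str]) -> dict[str, float]:
    """Tally one-ballot-per-judge votes and return each candidate's vote share."""
    counts = Counter(ballots)
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}

# Example: three judges compare two avatars' answers to the same prompt.
shares = crowd_vote(["avatar_a", "avatar_a", "avatar_b"])
print(shares)  # avatar_a wins two thirds of the electorate
```

Vote shares could then be averaged across prompts to rank an avatar against competitors and baselines.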
Related papers
- Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos [12.12643642515884]
An attacker can steal a user's avatar, preserving their appearance and voice and making it nearly impossible to detect its usage by sight or sound alone. Our main question is whether an individual's facial motion patterns can serve as reliable behavioral biometrics to verify their identity when the avatar's visual appearance is a facsimile of its owner. Experimental results demonstrate that facial motion landmarks enable meaningful identity verification, with AUC values approaching 80%.
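As a rough illustration of this kind of verification, probe motion embeddings can be scored against an enrolled template, with AUC computed over genuine versus impostor trials. The embeddings, cosine scoring, and synthetic data below are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
identity = rng.normal(size=64)                       # latent motion signature
owner = identity + 0.5 * rng.normal(size=(20, 64))   # genuine probe embeddings
attacker = rng.normal(size=(20, 64))                 # impostor probe embeddings
template = owner.mean(axis=0)                        # enrolled template

scores = [cosine(template, p) for p in np.vstack([owner, attacker])]
labels = [1] * len(owner) + [0] * len(attacker)      # 1 = genuine, 0 = impostor
print("verification AUC:", roc_auc_score(labels, scores))
```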
arXiv Detail & Related papers (2025-08-01T16:23:27Z) - VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis [70.76837748695841]
We propose VisualSpeaker, a novel method that bridges the gap using photorealistic differentiable rendering, supervised by visual speech recognition, for improved 3D facial animation. Our contribution is a perceptual lip-reading loss, derived by passing 3D Gaussian Splatting avatar renders through a pre-trained Visual Automatic Speech Recognition model during training. Evaluation on the MEAD dataset demonstrates that VisualSpeaker improves the standard Lip Vertex Error metric by 56.1% and the perceptual quality of the generated animations, while retaining the controllability of mesh-driven animation.
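Such a perceptual loss is commonly implemented as a feature-matching term on a frozen recognizer. The sketch below assumes that form; the `vasr_model` stand-in and the L1 feature distance are illustrative, not necessarily VisualSpeaker's exact loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def lip_reading_loss(render_frames: torch.Tensor,
                     target_frames: torch.Tensor,
                     vasr_model: nn.Module) -> torch.Tensor:
    """Feature-matching perceptual loss: compare what a frozen visual-ASR
    backbone extracts from rendered vs. ground-truth mouth crops."""
    with torch.no_grad():
        target_feats = vasr_model(target_frames)   # frozen reference features
    render_feats = vasr_model(render_frames)       # gradients flow to the renderer
    return F.l1_loss(render_feats, target_feats)

# Stand-in for a pre-trained visual-ASR feature extractor.
vasr = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
renders = torch.rand(4, 3, 32, 32, requires_grad=True)
targets = torch.rand(4, 3, 32, 32)
lip_reading_loss(renders, targets, vasr).backward()
```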
arXiv Detail & Related papers (2025-07-08T15:04:17Z) - SmartAvatar: Text- and Image-Guided Human Avatar Generation with VLM AI Agents [91.26239311240873]
SmartAvatar is a vision-language-agent-driven framework for generating fully rigged, animation-ready 3D human avatars. A key innovation is an autonomous verification loop in which the agent renders and checks draft avatars. The generated avatars are fully rigged and support pose manipulation with consistent identity and appearance.
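A draft-render-critique-refine cycle is one plausible shape for such a loop; the sketch below uses an entirely hypothetical `agent` interface (`draft`, `render`, `critique`, `refine`) to make the control flow concrete.

```python
def generate_avatar(prompt: str, agent, max_rounds: int = 3):
    """Sketch of an autonomous verification loop (hypothetical agent interface)."""
    avatar = agent.draft(prompt)                  # initial avatar from the prompt
    for _ in range(max_rounds):
        image = agent.render(avatar)              # render a draft for inspection
        feedback = agent.critique(image, prompt)  # VLM checks render against prompt
        if feedback.ok:                           # accept once the check passes
            break
        avatar = agent.refine(avatar, feedback)   # otherwise revise and retry
    return avatar
```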
arXiv Detail & Related papers (2025-06-05T03:49:01Z) - A multidimensional measurement of photorealistic avatar quality of experience [14.94879852506943]
Photorealistic avatars are human avatars that look, move, and talk like real people.
We provide an open source test framework to subjectively measure photorealistic avatar performance in ten dimensions.
We show that nine of these subjective metrics correlate only weakly with PSNR, SSIM, LPIPS, FID, and FVD, while emotion accuracy correlates moderately.
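Once per-clip subjective ratings and objective scores are collected, checking such correlations is straightforward; the numbers below are made up for illustration, with Spearman rank correlation standing in for whichever statistic the paper uses.

```python
import numpy as np
from scipy.stats import spearmanr

mos = np.array([4.1, 3.2, 2.8, 4.5, 3.9, 2.1])         # hypothetical subjective ratings
psnr = np.array([31.0, 29.5, 30.2, 32.1, 28.8, 27.4])  # matching objective scores

rho, p = spearmanr(mos, psnr)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")       # a weak rho echoes the finding
```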
arXiv Detail & Related papers (2024-11-13T22:47:24Z) - EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars [56.56236652774294]
We propose a person-specific egocentric telepresence approach, which jointly models the photoreal digital avatar while also driving it from a single egocentric video.
Our experiments demonstrate a clear step towards egocentric and photoreal telepresence as our method outperforms baselines as well as competing methods.
arXiv Detail & Related papers (2024-09-22T22:50:27Z) - Traceable AI-driven Avatars Using Multi-factors of Physical World and Metaverse [7.436039179584676]
The metaverse allows users to delegate their AI models to an AI engine, which builds corresponding AI-driven avatars.
In this paper, we propose a multi-factor authentication method to guarantee the traceability of AI-driven avatars.
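One way such multi-factor traceability could be realized is by cryptographically binding an avatar identifier to its owner's identity factors; the HMAC construction and factor set below are assumptions for illustration, not the paper's scheme.

```python
import hashlib
import hmac
import json

def bind_avatar(avatar_id: str, factors: dict[str, str], key: bytes) -> str:
    """Bind an avatar ID to identity factors with an HMAC tag (illustrative)."""
    payload = json.dumps({"avatar": avatar_id, **factors}, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

# A verifier holding the same key can recompute the tag to trace the avatar.
tag = bind_avatar("avatar-42", {"device": "dev-7", "voiceprint": "ab12"}, b"secret-key")
print(tag)
```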
arXiv Detail & Related papers (2024-08-30T09:04:11Z) - TEDRA: Text-based Editing of Dynamic and Photoreal Actors [59.480513384611804]
TEDRA is the first method allowing text-based edits of an avatar.
We train a model to create a controllable and high-fidelity digital replica of the real actor.
We modify the dynamic avatar based on a provided text prompt.
arXiv Detail & Related papers (2024-08-28T17:59:02Z) - X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation [63.74194950823133]
X-Oscar is a progressive framework for generating high-quality animatable avatars from text prompts.
To tackle oversaturation, we introduce Adaptive Variational Parameter Optimization (AVPO), representing avatars as an adaptive distribution during training.
We also present Avatar-aware Score Distillation Sampling (ASDS), a novel technique that incorporates avatar-aware noise into rendered images.
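ASDS builds on Score Distillation Sampling (SDS); the sketch below shows an SDS-style update in which the injected noise depends on the avatar render. The `diffusion` interface and `avatar_noise_fn` are hypothetical placeholders, not X-Oscar's implementation.

```python
import torch

def asds_like_loss(render: torch.Tensor, diffusion, t: int, avatar_noise_fn):
    """SDS-style loss with avatar-aware noise (hypothetical interfaces).
    Backpropagating this loss pushes the render along the negative residual."""
    noise = avatar_noise_fn(render)                # avatar-aware, not pure Gaussian
    noisy = diffusion.add_noise(render, noise, t)  # forward diffusion to step t
    eps_pred = diffusion.predict_noise(noisy, t)   # denoiser's noise estimate
    grad = eps_pred - noise                        # standard SDS residual
    return (grad.detach() * render).sum()          # d(loss)/d(render) == grad
```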
arXiv Detail & Related papers (2024-05-02T02:30:39Z) - MagicMirror: Fast and High-Quality Avatar Generation with a Constrained Search Space [25.24509617548819]
We introduce a novel framework for 3D human avatar generation and personalization, leveraging text prompts.
Key innovations are aimed at overcoming the challenges in photo-realistic avatar synthesis.
arXiv Detail & Related papers (2024-04-01T17:59:11Z) - Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis [66.43223397997559]
We aim to synthesize high-quality talking portrait videos corresponding to the input text.
This task has broad application prospects in the digital human industry but has not yet been achieved technically.
We introduce Adaptive Text-to-Talking Avatar (Ada-TTA), which employs a generic zero-shot multi-speaker Text-to-Speech model.
arXiv Detail & Related papers (2023-06-06T08:50:13Z) - OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering [81.55960827071661]
Controllability, generalizability, and efficiency are the major objectives in constructing face avatars represented by a neural implicit field.
We propose One-shot Talking face Avatar (OTAvatar), which constructs face avatars via a generalized, controllable tri-plane rendering solution.
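For context, a generic tri-plane lookup projects each 3D query point onto the XY, XZ, and YZ planes, bilinearly samples a feature map from each, and sums the results. The sketch below implements that generic operation, not necessarily OTAvatar's exact variant.

```python
import torch
import torch.nn.functional as F

def sample_triplane(planes: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
    """Query tri-plane features at 3D points. planes: (3, C, H, W); xyz in [-1, 1]^3."""
    projections = [xyz[:, [0, 1]], xyz[:, [0, 2]], xyz[:, [1, 2]]]  # XY, XZ, YZ
    feats = torch.zeros(xyz.shape[0], planes.shape[1])
    for plane, uv in zip(planes, projections):
        grid = uv.view(1, -1, 1, 2)                       # layout for grid_sample
        sampled = F.grid_sample(plane[None], grid, align_corners=True)
        feats = feats + sampled[0, :, :, 0].t()           # accumulate (N, C)
    return feats

planes = torch.randn(3, 32, 64, 64)           # learned XY / XZ / YZ feature planes
points = torch.rand(1000, 3) * 2 - 1          # query points in normalized space
print(sample_triplane(planes, points).shape)  # torch.Size([1000, 32])
```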
arXiv Detail & Related papers (2023-03-26T09:12:03Z) - SwiftAvatar: Efficient Auto-Creation of Parameterized Stylized Character on Arbitrary Avatar Engines [34.645129752596915]
We propose SwiftAvatar, a novel avatar auto-creation framework.
We synthesize as much high-quality data as possible, consisting of avatar vectors paired with their corresponding realistic faces.
Our experiments demonstrate the effectiveness and efficiency of SwiftAvatar on two different avatar engines.
arXiv Detail & Related papers (2023-01-19T16:14:28Z) - AgileAvatar: Stylized 3D Avatar Creation via Cascaded Domain Bridging [12.535634029277212]
We propose a novel self-supervised learning framework to create high-quality stylized 3D avatars.
Our results achieve much higher preference scores than previous work, close to those of manual creation.
arXiv Detail & Related papers (2022-11-15T00:43:45Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset.
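Late fusion of this kind typically combines per-modality posteriors after each backbone is trained separately; a minimal sketch follows, where the equal weighting is an illustrative assumption rather than the paper's learned fusion.

```python
import torch

def late_fusion(speech_logits: torch.Tensor, text_logits: torch.Tensor,
                w_speech: float = 0.5) -> torch.Tensor:
    """Weighted average of per-modality emotion posteriors (weights illustrative)."""
    p_speech = speech_logits.softmax(dim=-1)  # speech branch (speaker-recognition backbone)
    p_text = text_logits.softmax(dim=-1)      # text branch (BERT-based backbone)
    return w_speech * p_speech + (1 - w_speech) * p_text

fused = late_fusion(torch.randn(1, 4), torch.randn(1, 4))
print(fused.argmax(dim=-1))  # predicted emotion class index
```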
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.