AgentAvatar: Disentangling Planning, Driving and Rendering for
Photorealistic Avatar Agents
- URL: http://arxiv.org/abs/2311.17465v3
- Date: Mon, 4 Dec 2023 16:49:18 GMT
- Title: AgentAvatar: Disentangling Planning, Driving and Rendering for
Photorealistic Avatar Agents
- Authors: Duomin Wang, Bin Dai, Yu Deng, Baoyuan Wang
- Abstract summary: Our framework harnesses LLMs to produce a series of detailed text descriptions of the avatar agents' facial motions.
These descriptions are processed by our task-agnostic driving engine into continuous motion embeddings.
Our framework adapts to a variety of non-verbal avatar interactions, both monadic and dyadic.
- Score: 16.544688997764293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, our goal is to create interactive avatar agents that can
autonomously plan and animate nuanced facial movements realistically, from both
visual and behavioral perspectives. Given high-level inputs about the
environment and agent profile, our framework harnesses LLMs to produce a series
of detailed text descriptions of the avatar agents' facial motions. These
descriptions are then processed by our task-agnostic driving engine into motion
token sequences, which are subsequently converted into continuous motion
embeddings that are further consumed by our standalone neural-based renderer to
generate the final photorealistic avatar animations. These streamlined
processes allow our framework to adapt to a variety of non-verbal avatar
interactions, both monadic and dyadic. Our extensive study, which includes
experiments on both newly compiled and existing datasets featuring two types of
agents -- one capable of monadic interaction with the environment, and the
other designed for dyadic conversation -- validates the effectiveness and
versatility of our approach. To our knowledge, we advanced a leap step by
combining LLMs and neural rendering for generalized non-verbal prediction and
photo-realistic rendering of avatar agents.
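The abstract describes a disentangled three-stage pipeline: an LLM plans detailed text descriptions of facial motions, a task-agnostic driving engine converts them into motion token sequences and then continuous motion embeddings, and a standalone neural renderer produces the final animation. The sketch below is a minimal, hypothetical illustration of that dataflow only; every name in it (AgentContext, plan_facial_motions, drive, render) is an invented stand-in with stubbed outputs, not the authors' actual code or API.

```python
# Hypothetical sketch of the planning -> driving -> rendering dataflow
# described in the abstract. All names and stub outputs are illustrative.
from dataclasses import dataclass


@dataclass
class AgentContext:
    environment: str  # high-level description of the scene or event
    profile: str      # the avatar agent's persona


def plan_facial_motions(ctx: AgentContext) -> list[str]:
    """Stage 1 (planning): an LLM expands high-level inputs into detailed
    text descriptions of facial motions. Stubbed with fixed output here."""
    _prompt = f"Environment: {ctx.environment}\nProfile: {ctx.profile}"
    return ["raises eyebrows slightly", "breaks into a closed-lip smile"]


def drive(descriptions: list[str]) -> list[list[float]]:
    """Stage 2 (driving): a task-agnostic engine maps each description to a
    discrete motion-token sequence, then to continuous motion embeddings."""
    token_seqs = [[hash(w) % 512 for w in d.split()] for d in descriptions]
    return [[t / 512.0 for t in seq] for seq in token_seqs]  # dummy embeddings


def render(embeddings: list[list[float]]) -> str:
    """Stage 3 (rendering): a standalone neural renderer consumes the motion
    embeddings and emits photorealistic frames. Stubbed as a summary string."""
    return f"rendered {len(embeddings)} animation segments"


if __name__ == "__main__":
    ctx = AgentContext(environment="a quiet cafe", profile="friendly barista")
    print(render(drive(plan_facial_motions(ctx))))
```

Because the three stages are decoupled, the planner, driving engine, and renderer can in principle be developed or swapped independently, which is what lets the framework cover both monadic and dyadic interaction settings.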
Related papers
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time.
At the core of our method is a hierarchical representation of head models that allows us to capture the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z) - TEDRA: Text-based Editing of Dynamic and Photoreal Actors [59.480513384611804]
TEDRA is the first method allowing text-based edits of an avatar.
We train a model to create a controllable and high-fidelity digital replica of the real actor.
We modify the dynamic avatar based on a provided text prompt.
arXiv Detail & Related papers (2024-08-28T17:59:02Z) - AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding [24.486705010561067]
The paper introduces AniTalker, a framework designed to generate lifelike talking faces from a single portrait.
AniTalker effectively captures a wide range of facial dynamics, including subtle expressions and head movements.
arXiv Detail & Related papers (2024-05-06T02:32:41Z) - From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z) - GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians [51.46168990249278]
We present an efficient approach to creating realistic human avatars with dynamic 3D appearances from a single video.
GaussianAvatar is validated on both the public dataset and our collected dataset.
arXiv Detail & Related papers (2023-12-04T18:55:45Z) - Motion Planning on Visual Manifolds [0.0]
We propose an alternative characterization of the notion of Configuration Space, which we call Visual Configuration Space (VCS).
This new characterization allows an embodied agent (e.g., a robot) to discover its own body structure and plan obstacle-free motions in its peripersonal space using a set of its own images in random poses.
We demonstrate the usefulness of VCS in (a) building and working with geometry-free models for robot motion planning, (b) explaining how a human baby might learn to reach objects in its peripersonal space through motor babbling, and (c) automatically generating natural-looking head motions.
arXiv Detail & Related papers (2022-10-08T15:09:28Z) - Drivable Volumetric Avatars using Texel-Aligned Features [52.89305658071045]
Photorealistic telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance.
We propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people.
arXiv Detail & Related papers (2022-07-20T09:28:16Z) - iGibson, a Simulation Environment for Interactive Tasks in Large
Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
We show that iGibson's features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human-demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.