Related papers: AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars

AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars

URL: http://arxiv.org/abs/2306.00547v2
Date: Fri, 2 Jun 2023 08:45:09 GMT
Title: AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars
Authors: Mohit Mendiratta, Xingang Pan, Mohamed Elgharib, Kartik Teotia, Mallikarjun B R, Ayush Tewari, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt
Abstract summary: We propose AvatarStudio, a text-based method for editing the appearance of a dynamic full head avatar. Our approach builds on existing work to capture dynamic performances of human heads using neural field (NeRF) and edits this representation with a text-to-image diffusion model. Our method edits the full head in a canonical space, and then propagates these edits to remaining time steps via a pretrained deformation network.
Score: 84.85009267371218
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Capturing and editing full head performances enables the creation of virtual characters with various applications such as extended reality and media production. The past few years witnessed a steep rise in the photorealism of human head avatars. Such avatars can be controlled through different input data modalities, including RGB, audio, depth, IMUs and others. While these data modalities provide effective means of control, they mostly focus on editing the head movements such as the facial expressions, head pose and/or camera viewpoint. In this paper, we propose AvatarStudio, a text-based method for editing the appearance of a dynamic full head avatar. Our approach builds on existing work to capture dynamic performances of human heads using neural radiance field (NeRF) and edits this representation with a text-to-image diffusion model. Specifically, we introduce an optimization strategy for incorporating multiple keyframes representing different camera viewpoints and time stamps of a video performance into a single diffusion model. Using this personalized diffusion model, we edit the dynamic NeRF by introducing view-and-time-aware Score Distillation Sampling (VT-SDS) following a model-based guidance approach. Our method edits the full head in a canonical space, and then propagates these edits to remaining time steps via a pretrained deformation network. We evaluate our method visually and numerically via a user study, and results show that our method outperforms existing approaches. Our experiments validate the design choices of our method and highlight that our edits are genuine, personalized, as well as 3D- and time-consistent.

Related papers

EVA: Expressive Virtual Avatars from Multi-view Videos [51.33851869426057]
We introduce Expressive Virtual Avatars (EVA), an actor-specific, fully controllable, and expressive human avatar framework.<n>EVA achieves high-fidelity, lifelike renderings in real time while enabling independent control of facial expressions, body movements, and hand gestures.<n>This work represents a significant advancement towards fully drivable digital human models.
arXiv Detail & Related papers (2025-05-21T11:22:52Z)
IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos [33.12653115668027]
Our method generates Multiplane Images (MPIs) that ensure geometric consistency. Our approach directly generates the final output through a single denoising process. To effectively learn from monocular videos, we introduce a training mechanism that reconstructs the output MPI randomly in either the target or the reference camera space.
arXiv Detail & Related papers (2025-04-27T08:56:02Z)
DynamicAvatars: Accurate Dynamic Facial Avatars Reconstruction and Precise Editing with Diffusion Models [4.851981427563145]
We present DynamicAvatars, a dynamic model that generates photorealistic, moving 3D head avatars from video clips. Our approach enables precise editing through a novel prompt-based editing model.
arXiv Detail & Related papers (2024-11-24T06:22:30Z)
Revealing Directions for Text-guided 3D Face Editing [52.85632020601518]
3D face editing is a significant task in multimedia, aimed at the manipulation of 3D face models across various control signals. We present Face Clan, a text-general approach for generating and manipulating 3D faces based on arbitrary attribute descriptions. Our method offers a precisely controllable manipulation method, allowing users to intuitively customize regions of interest with the text description.
arXiv Detail & Related papers (2024-10-07T12:04:39Z)
GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time. At the core of our method is a hierarchical representation of head models that allows to capture the complex dynamics of facial expressions and head movements. We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z)
TEDRA: Text-based Editing of Dynamic and Photoreal Actors [59.480513384611804]
TEDRA is the first method allowing text-based edits of an avatar. We train a model to create a controllable and high-fidelity digital replica of the real actor. We modify the dynamic avatar based on a provided text prompt.
arXiv Detail & Related papers (2024-08-28T17:59:02Z)
GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image [89.70322127648349]
We propose a generic avatar editing approach that can be universally applied to various 3DMM driving volumetric head avatars. To achieve this goal, we design a novel expression-aware modification generative model, which enables lift 2D editing from a single image to a consistent 3D modification field.
arXiv Detail & Related papers (2024-04-02T17:58:35Z)
One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation [31.310769289315648]
This paper introduces a novel approach to create high quality head avatar utilizing only a single or a few images per user. We learn a generative model for 3D animatable photo-realistic head avatar from a multi-view dataset of expressions from 2407 subjects. Our method demonstrates compelling results and outperforms existing state-of-the-art methods for few-shot avatar adaptation.
arXiv Detail & Related papers (2024-02-19T07:48:29Z)
Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions [0.0]
Given a short monocular RGB video and text instructions, our method uses an image-conditioned diffusion model to edit one head image. Our method synthesizes edited photo-realistic animatable 3D neural head avatars with a deformable neural radiance field head synthesis method.
arXiv Detail & Related papers (2023-06-05T14:10:28Z)
PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering [56.762094966235566]
A Portrait Image Neural Renderer is proposed to control the face motions with the parameters of three-dimensional morphable face models. The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications. Our model can generate coherent videos with convincing movements from only a single reference image and a driving audio stream.
arXiv Detail & Related papers (2021-09-17T07:24:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.