Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions
- URL: http://arxiv.org/abs/2306.02903v1
- Date: Mon, 5 Jun 2023 14:10:28 GMT
- Title: Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions
- Authors: Shaoxu Li
- Abstract summary: Given a short monocular RGB video and text instructions, our method uses an image-conditioned diffusion model to edit one head image.
Our method synthesizes edited photo-realistic animatable 3D neural head avatars with a deformable neural radiance field head synthesis method.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a method for synthesizing edited photo-realistic digital avatars
with text instructions. Given a short monocular RGB video and text
instructions, our method uses an image-conditioned diffusion model to edit one
head image, then uses a video stylization method to propagate the edit to the
remaining head images. Through iterative training and updating (three or more rounds),
our method synthesizes edited photo-realistic animatable 3D neural head avatars
with a deformable neural radiance field head synthesis method. In quantitative
and qualitative studies on various subjects, our method outperforms
state-of-the-art methods.
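The abstract describes a concrete loop: edit one frame with a diffusion model, propagate the edit to the other frames via video stylization, retrain the avatar, and repeat. The snippet below is a minimal, runnable sketch of that loop; every helper in it (edit_with_diffusion, propagate_by_stylization, DeformableNeRFAvatar) is a hypothetical stub over NumPy arrays, not the authors' implementation.

```python
"""Minimal sketch of the iterative edit-and-train loop from the abstract.
All components are hypothetical stand-ins, not the paper's actual code."""
import numpy as np

def edit_with_diffusion(image, instruction):
    # Stand-in for an image-conditioned diffusion editor that applies
    # the text instruction to a single head image.
    return image  # stub: a real editor returns the edited image

def propagate_by_stylization(frames, exemplar):
    # Stand-in for the video stylization step that transfers the
    # exemplar's edit to every other frame.
    return [0.5 * (f + exemplar) for f in frames]  # stub blend

class DeformableNeRFAvatar:
    # Stand-in for a deformable neural radiance field head model.
    def fit(self, frames):
        self.frames = list(frames)  # stub: real code optimizes a NeRF
    def render_all(self):
        return self.frames          # stub: re-render the training views

def build_edited_avatar(frames, instruction, num_rounds=3):
    avatar = DeformableNeRFAvatar()
    avatar.fit(frames)                          # initial reconstruction
    for _ in range(num_rounds):                 # "three or more rounds"
        exemplar = edit_with_diffusion(frames[0], instruction)  # edit one frame
        frames = propagate_by_stylization(frames, exemplar)     # edit the rest
        avatar.fit(frames)                      # update the avatar
        frames = avatar.render_all()            # renders seed the next round
    return avatar

# Usage on dummy data: 8 frames of a 64x64 RGB "video".
video = [np.random.rand(64, 64, 3) for _ in range(8)]
avatar = build_edited_avatar(video, "give him a mustache")
```

Feeding the avatar's own renders into the next round is one plausible reading of the "iterative training and update" the abstract mentions; the paper itself should be consulted for the exact update rule.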
Related papers
- Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection [60.47731445033151]
We propose a novel unified editing framework that combines the strengths of both approaches by utilizing only a basic 2D text-to-image (T2I) diffusion model.
Experimental results confirm that our method enables editing across diverse modalities including 3D scenes, videos, and panorama images.
arXiv Detail & Related papers (2024-05-27T04:44:36Z)
- GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image [89.70322127648349]
We propose a generic avatar editing approach that can be universally applied to various 3DMM-driven volumetric head avatars.
To achieve this goal, we design a novel expression-aware modification generative model, which enables lifting 2D edits from a single image into a consistent 3D modification field.
arXiv Detail & Related papers (2024-04-02T17:58:35Z)
- Text-Guided Generation and Editing of Compositional 3D Avatars [59.584042376006316]
Our goal is to create a realistic 3D facial avatar with hair and accessories using only a text description.
Existing methods either lack realism, produce unrealistic shapes, or do not support editing.
arXiv Detail & Related papers (2023-09-13T17:59:56Z)
- OPHAvatars: One-shot Photo-realistic Head Avatars [0.0]
Given a portrait, our method synthesizes a coarse talking head video using driving keypoint features.
With rendered images of the coarse avatar, our method updates the low-quality images with a blind face restoration model.
After several iterations, our method can synthesize a photo-realistic animatable 3D neural head avatar.
arXiv Detail & Related papers (2023-07-18T11:24:42Z)
- AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation [14.062402203105712]
AvatarBooth is a novel method for generating high-quality 3D avatars using text prompts or specific images.
Our key contribution is precise control over avatar generation through dual fine-tuned diffusion models.
We present a multi-resolution rendering strategy that facilitates coarse-to-fine supervision of 3D avatar generation.
arXiv Detail & Related papers (2023-06-16T14:18:51Z)
- HeadSculpt: Crafting 3D Head Avatars with Text [143.14548696613886]
We introduce a versatile pipeline dubbed HeadSculpt for crafting 3D head avatars from textual prompts.
We first equip the diffusion model with 3D awareness by leveraging landmark-based control and a learned textual embedding.
We propose a novel identity-aware editing score distillation strategy to optimize a textured mesh with a high-resolution differentiable rendering technique.
arXiv Detail & Related papers (2023-06-05T16:53:58Z)
- AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars [84.85009267371218]
We propose AvatarStudio, a text-based method for editing the appearance of a dynamic full head avatar.
Our approach builds on existing work to capture dynamic performances of human heads with a neural radiance field (NeRF) and edits this representation with a text-to-image diffusion model.
Our method edits the full head in a canonical space and then propagates these edits to the remaining time steps via a pretrained deformation network.
arXiv Detail & Related papers (2023-06-01T11:06:01Z)
- I M Avatar: Implicit Morphable Head Avatars from Videos [68.13409777995392]
We propose IMavatar, a novel method for learning implicit head avatars from monocular videos.
Inspired by the fine-grained control mechanisms afforded by conventional 3DMMs, we represent the expression- and pose-related deformations via learned blendshapes and skinning fields.
We show quantitatively and qualitatively that our method improves geometry and covers a more complete expression space compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-12-14T15:30:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.