TEDRA: Text-based Editing of Dynamic and Photoreal Actors
- URL: http://arxiv.org/abs/2408.15995v1
- Date: Wed, 28 Aug 2024 17:59:02 GMT
- Title: TEDRA: Text-based Editing of Dynamic and Photoreal Actors
- Authors: Basavaraj Sunagad, Heming Zhu, Mohit Mendiratta, Adam Kortylewski, Christian Theobalt, Marc Habermann
- Abstract summary: TEDRA is the first method allowing text-based edits of an avatar.
We first train a model to create a controllable and high-fidelity digital replica of the real actor,
then modify the dynamic avatar based on a provided text prompt.
- Score: 59.480513384611804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the past few years, significant progress has been made in creating photorealistic and drivable 3D avatars solely from videos of real humans. However, a core remaining challenge is the fine-grained and user-friendly editing of clothing styles by means of textual descriptions. To this end, we present TEDRA, the first method allowing text-based edits of an avatar, which maintains the avatar's high fidelity, space-time coherency, and dynamics, and enables skeletal pose and view control. We begin by training a model to create a controllable and high-fidelity digital replica of the real actor. Next, we personalize a pretrained generative diffusion model by fine-tuning it on various frames of the real character captured from different camera angles, ensuring the digital representation faithfully captures the dynamics and movements of the real person. This two-stage process lays the foundation for our approach to dynamic human avatar editing. Utilizing this personalized diffusion model, we modify the dynamic avatar based on a provided text prompt using our Personalized Normal Aligned Score Distillation Sampling (PNA-SDS) within a model-based guidance framework. Additionally, we propose a time step annealing strategy to ensure high-quality edits. Our results demonstrate a clear improvement over prior work in functionality and visual quality.
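To make the distillation step concrete: PNA-SDS builds on Score Distillation Sampling (SDS), in which a frozen diffusion model's noise-prediction error is backpropagated through a differentiable renderer to update the avatar's parameters. Below is a minimal sketch of plain SDS combined with a linearly annealed timestep range, written in generic PyTorch. The names `render` and `diffusion_eps`, and the exact annealing schedule, are illustrative assumptions, not the paper's actual interfaces or its normal-aligned formulation.

```python
import torch

def sds_step(theta, render, diffusion_eps, text_emb, alphas_cumprod,
             step, total_steps, t_max=980, t_min=20):
    """One SDS update nudging the rendered avatar toward the text prompt.

    Hypothetical interfaces (not from the paper):
      render(theta)               -> differentiable image, shape (1, 3, H, W)
      diffusion_eps(x_t, t, emb)  -> frozen noise predictor of the
                                     personalized diffusion model
    """
    # Timestep annealing: shrink the upper noise level as editing
    # progresses, so early steps make coarse structural edits and later
    # steps refine detail (a common heuristic; the paper's own schedule
    # may differ).
    frac = step / total_steps
    hi = max(int(t_max - frac * (t_max - t_min)), t_min + 1)
    t = torch.randint(t_min, hi, (1,))

    x = render(theta)                                  # differentiable render
    eps = torch.randn_like(x)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a_t.sqrt() * x + (1.0 - a_t).sqrt() * eps    # forward-diffuse render

    with torch.no_grad():                              # diffusion model frozen
        eps_hat = diffusion_eps(x_t, t, text_emb)

    w = 1.0 - a_t                                      # common SDS weighting
    grad = w * (eps_hat - eps)                         # SDS gradient w.r.t. x
    x.backward(gradient=grad)                          # route through renderer
```

Annealing matters because high noise levels steer coarse structure while low ones refine texture; sampling ever-smaller timesteps as editing proceeds is a common way to settle the edit's geometry before polishing appearance.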
Related papers
- EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars [56.56236652774294]
We propose a person-specific egocentric telepresence approach, which jointly models the photoreal digital avatar while also driving it from a single egocentric video.
Our experiments demonstrate a clear step towards egocentric and photoreal telepresence as our method outperforms baselines as well as competing methods.
arXiv Detail & Related papers (2024-09-22T22:50:27Z)
- MagicMirror: Fast and High-Quality Avatar Generation with a Constrained Search Space [25.24509617548819]
We introduce a novel framework for 3D human avatar generation and personalization, leveraging text prompts.
Key innovations are aimed at overcoming the challenges in photo-realistic avatar synthesis.
arXiv Detail & Related papers (2024-04-01T17:59:11Z)
- Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models [40.71940056121056]
We present a novel approach that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models.
We demonstrate our approach on various examples where motion can be obtained by animating rigged assets or changing the camera path.
arXiv Detail & Related papers (2023-12-03T14:17:11Z)
- Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model [57.855362366674264]
We propose Dancing Avatar, designed to fabricate human motion videos driven by poses and textual cues.
Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion.
arXiv Detail & Related papers (2023-08-15T13:00:42Z)
- AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars [84.85009267371218]
We propose AvatarStudio, a text-based method for editing the appearance of a dynamic full head avatar.
Our approach builds on existing work that captures dynamic performances of human heads using a neural radiance field (NeRF), and edits this representation with a text-to-image diffusion model.
Our method edits the full head in a canonical space, and then propagates these edits to remaining time steps via a pretrained deformation network.
arXiv Detail & Related papers (2023-06-01T11:06:01Z)
- Text-Conditional Contextualized Avatars For Zero-Shot Personalization [47.85747039373798]
We propose a pipeline that enables personalization of image generation with avatars capturing a user's identity in a delightful way.
Our pipeline is zero-shot, avatar texture and style agnostic, and does not require training on the avatar at all.
We show, for the first time, how to leverage large-scale image datasets to learn human 3D pose parameters.
arXiv Detail & Related papers (2023-04-14T22:00:44Z)
- Drivable Volumetric Avatars using Texel-Aligned Features [52.89305658071045]
Photorealistic telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance.
We propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people.
arXiv Detail & Related papers (2022-07-20T09:28:16Z)
- Dynamic Neural Garments [45.833166320896716]
We present a solution that takes in body joint motion to directly produce realistic dynamic garment image sequences.
Specifically, given the target joint motion sequence of an avatar, we propose dynamic neural garments to jointly simulate and render plausible dynamic garment appearance.
arXiv Detail & Related papers (2021-02-23T17:21:21Z)