DynamicAvatars: Accurate Dynamic Facial Avatars Reconstruction and Precise Editing with Diffusion Models
- URL: http://arxiv.org/abs/2411.15732v1
- Date: Sun, 24 Nov 2024 06:22:30 GMT
- Title: DynamicAvatars: Accurate Dynamic Facial Avatars Reconstruction and Precise Editing with Diffusion Models
- Authors: Yangyang Qian, Yuan Sun, Yu Guo
- Abstract summary: We present DynamicAvatars, a dynamic model that generates photorealistic, moving 3D head avatars from video clips.
Our approach enables precise editing through a novel prompt-based editing model.
- Score: 4.851981427563145
- License:
- Abstract: Generating and editing dynamic 3D head avatars are crucial tasks in virtual reality and film production. However, existing methods often suffer from facial distortions, inaccurate head movements, and limited fine-grained editing capabilities. To address these challenges, we present DynamicAvatars, a dynamic model that generates photorealistic, moving 3D head avatars from video clips and parameters associated with facial positions and expressions. Our approach enables precise editing through a novel prompt-based editing model, which integrates user-provided prompts with guiding parameters derived from large language models (LLMs). To achieve this, we propose a dual-tracking framework based on Gaussian Splatting and introduce a prompt preprocessing module to enhance editing stability. By incorporating a specialized GAN algorithm and connecting it to our control module, which generates precise guiding parameters from LLMs, we successfully address the limitations of existing methods. Additionally, we develop a dynamic editing strategy that selectively utilizes specific training datasets to improve the efficiency and adaptability of the model for dynamic editing tasks.
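The abstract describes a pipeline in which a user prompt is first preprocessed for editing stability and then converted by an LLM-based control module into guiding parameters. The paper's actual interfaces are not public, so the following is a purely illustrative sketch of that flow; every name (`preprocess_prompt`, `GuidingParams`, `derive_guiding_params`, the region list, the strength heuristic) is a hypothetical stand-in, not the authors' code.

```python
# Hypothetical sketch of the prompt-based editing flow described above.
# All names and heuristics are illustrative; the paper's real control
# module derives guiding parameters from an LLM, mocked here.
from dataclasses import dataclass, field

@dataclass
class GuidingParams:
    """Control parameters a language model might derive from a user prompt."""
    target_region: str              # e.g. "hair", "eyes", "mouth"
    edit_strength: float            # 0..1 blend weight for the edit
    keywords: list = field(default_factory=list)

def preprocess_prompt(prompt: str) -> str:
    """Normalize the raw user prompt (the paper's preprocessing module,
    reduced here to lowercasing and whitespace collapsing)."""
    return " ".join(prompt.lower().split())

def derive_guiding_params(prompt: str) -> GuidingParams:
    """Stand-in for the LLM-based control module: map a prompt to
    guiding parameters with a trivial keyword heuristic."""
    regions = ("hair", "eyes", "mouth", "nose")
    region = next((r for r in regions if r in prompt), "full_face")
    strength = 0.8 if "very" in prompt else 0.5
    return GuidingParams(target_region=region, edit_strength=strength,
                         keywords=prompt.split())

def plan_edit(user_prompt: str) -> dict:
    """Combine the cleaned prompt and guiding parameters into an edit plan
    that a downstream Gaussian Splatting editor could consume."""
    clean = preprocess_prompt(user_prompt)
    params = derive_guiding_params(clean)
    return {"prompt": clean,
            "region": params.target_region,
            "strength": params.edit_strength}

plan = plan_edit("Make the  HAIR very curly")
# plan -> {"prompt": "make the hair very curly", "region": "hair", "strength": 0.8}
```

The point of the sketch is the separation of concerns the abstract implies: prompt normalization happens before parameter derivation, and the renderer only ever sees structured parameters, never the raw prompt.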
Related papers
- Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation [25.047474784265773]
Recent 4D dynamic scene editing methods require editing thousands of 2D images used for dynamic scene synthesis.
These methods are not scalable with respect to the temporal dimension of the dynamic scene.
We propose an efficient dynamic scene editing method that is more scalable in terms of temporal dimension.
arXiv Detail & Related papers (2025-02-04T08:18:49Z) - PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery [10.594261300488546]
We introduce a novel framework for progressive exemplar-driven editing with off-the-shelf diffusion models, dubbed PIXELS.
PIXELS provides granular control over edits, allowing adjustments at the pixel or region level.
We demonstrate that PIXELS delivers high-quality edits efficiently, leading to a notable improvement in quantitative metrics as well as human evaluation.
arXiv Detail & Related papers (2025-01-16T20:26:30Z) - BrushEdit: All-In-One Image Inpainting and Editing [79.55816192146762]
BrushEdit is a novel inpainting-based instruction-guided image editing paradigm.
We devise a system enabling free-form instruction editing by integrating MLLMs and a dual-branch image inpainting model.
Our framework effectively combines MLLMs and inpainting models, achieving superior performance across seven metrics.
arXiv Detail & Related papers (2024-12-13T17:58:06Z) - CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion [13.744253074367885]
We introduce a novel framework that first fine-tunes the InstructPix2Pix model, followed by a two-stage optimization of the scene.
Our approach enables consistent and precise local edits without the need for tracking desired editing regions.
Compared to state-of-the-art methods, our approach offers more flexible and controllable local scene editing.
arXiv Detail & Related papers (2024-12-02T18:38:51Z) - TEDRA: Text-based Editing of Dynamic and Photoreal Actors [59.480513384611804]
TEDRA is the first method allowing text-based edits of an avatar.
We train a model to create a controllable and high-fidelity digital replica of the real actor.
We modify the dynamic avatar based on a provided text prompt.
arXiv Detail & Related papers (2024-08-28T17:59:02Z) - DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation [57.406031264184584]
DragGaussian is a 3D object drag-editing framework based on 3D Gaussian Splatting.
Our contributions include the introduction of a new task, the development of DragGaussian for interactive point-based 3D editing, and comprehensive validation of its effectiveness through qualitative and quantitative experiments.
arXiv Detail & Related papers (2024-05-09T14:34:05Z) - Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training [61.984277261016146]
We propose a CustomNeRF model that unifies a text description or a reference image as the editing prompt.
To tackle the first challenge, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground region editing and full-image editing.
For the second challenge, we also design a class-guided regularization that exploits class priors within the generation model to alleviate the inconsistency problem.
arXiv Detail & Related papers (2023-12-04T06:25:06Z) - AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars [84.85009267371218]
We propose AvatarStudio, a text-based method for editing the appearance of a dynamic full head avatar.
Our approach builds on existing work to capture dynamic performances of human heads using a neural radiance field (NeRF) and edits this representation with a text-to-image diffusion model.
Our method edits the full head in a canonical space, and then propagates these edits to remaining time steps via a pretrained deformation network.
arXiv Detail & Related papers (2023-06-01T11:06:01Z) - PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering [56.762094966235566]
A Portrait Image Neural Renderer is proposed to control the face motions with the parameters of three-dimensional morphable face models.
The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications.
Our model can generate coherent videos with convincing movements from only a single reference image and a driving audio stream.
arXiv Detail & Related papers (2021-09-17T07:24:16Z)
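The PIRenderer entry above describes controlling face motion via the parameters of a 3D morphable face model (3DMM), with coherent video arising from a smooth trajectory through that parameter space. The sketch below illustrates only that control idea; it is not PIRenderer's real API, and the three-component parameter basis is an invented toy.

```python
# Illustrative sketch (not PIRenderer's actual interface): face motion is
# expressed as a 3DMM-style parameter vector, and a video is a smooth
# per-frame trajectory through that parameter space.
def lerp(a, b, t):
    """Linear interpolation between two parameter vectors."""
    return [x + t * (y - x) for x, y in zip(a, b)]

def motion_trajectory(start, end, n_frames):
    """Per-frame parameter vectors for a smooth head motion."""
    return [lerp(start, end, i / (n_frames - 1)) for i in range(n_frames)]

neutral   = [0.0, 0.0, 0.0]   # [yaw, pitch, jaw-open] -- toy basis
head_turn = [0.5, 0.0, 0.2]
frames = motion_trajectory(neutral, head_turn, n_frames=5)
# frames[0] is the neutral pose, frames[-1] the full head turn; a neural
# renderer conditioned on these vectors would yield a coherent video.
```

In the real model the parameter vector comes from a full 3DMM (identity, expression, pose coefficients) and a neural renderer maps each vector plus a reference image to a frame; only the trajectory idea is shown here.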
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.