IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion
- URL: http://arxiv.org/abs/2501.07530v1
- Date: Mon, 13 Jan 2025 18:08:27 GMT
- Title: IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion
- Authors: Tharun Anand, Aryan Garg, Kaushik Mitra
- Abstract summary: Existing models encounter challenges such as poor editing quality, high computational costs and difficulties in preserving facial identity across diverse edits.
We propose a novel facial video editing framework that leverages the rich latent space of pre-trained text-to-image (T2I) diffusion models.
Our approach significantly reduces editing time by 80%, while maintaining temporal consistency throughout the video sequence.
- Score: 12.494492016414503
- License:
- Abstract: Facial video editing has become increasingly important for content creators, enabling the manipulation of facial expressions and attributes. However, existing models encounter challenges such as poor editing quality, high computational costs, and difficulty in preserving facial identity across diverse edits. Additionally, these models are often constrained to editing predefined facial attributes, limiting their flexibility in handling diverse editing prompts. To address these challenges, we propose a novel facial video editing framework that leverages the rich latent space of pre-trained text-to-image (T2I) diffusion models and fine-tunes them specifically for facial video editing tasks. Our approach introduces a targeted fine-tuning scheme that enables high-quality, localized, text-driven edits while ensuring identity preservation across video frames. Additionally, by using pre-trained T2I models during inference, our approach reduces editing time by 80% while maintaining temporal consistency throughout the video sequence. We evaluate the effectiveness of our approach through extensive testing across a wide range of challenging scenarios, including varying head poses, complex action sequences, and diverse facial expressions. Our method consistently outperforms existing techniques, demonstrating superior performance across a broad set of metrics and benchmarks.
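The abstract describes the fine-tuning objective only at a high level (identity preservation across frames plus temporal consistency) and gives no implementation details. As a rough, minimal sketch under assumed names, the snippet below shows one conventional way such terms are written in PyTorch; the `face_encoder`, the loss weights, and the overall structure are illustrative assumptions, not the authors' actual recipe.

```python
import torch
import torch.nn.functional as F

def identity_preserving_loss(edited_frames, source_frames, face_encoder,
                             lambda_id=1.0, lambda_temp=0.5):
    """Illustrative loss combining identity and temporal-consistency terms.

    edited_frames, source_frames: tensors of shape (T, C, H, W).
    face_encoder: any frozen face-embedding network mapping
    (N, C, H, W) -> (N, D); a hypothetical stand-in, not a component
    specified by the paper.
    """
    # Identity term: keep edited frames close to the source identity
    # in the face-embedding space (cosine distance per frame).
    with torch.no_grad():
        src_emb = F.normalize(face_encoder(source_frames), dim=-1)
    edit_emb = F.normalize(face_encoder(edited_frames), dim=-1)
    id_loss = (1.0 - (src_emb * edit_emb).sum(dim=-1)).mean()

    # Temporal term: penalize frame-to-frame jitter in the edited video.
    temp_loss = F.l1_loss(edited_frames[1:], edited_frames[:-1])

    return lambda_id * id_loss + lambda_temp * temp_loss
```

In practice, a term of this kind would be added on top of the standard diffusion denoising loss during fine-tuning; how the paper actually balances or formulates these objectives is not stated in this listing.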
Related papers
- MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation [55.101611012677616]
Diffusion-based text-to-image (T2I) models have demonstrated remarkable results in global video editing tasks.
We present MAKIMA, a tuning-free multi-attribute editing (MAE) framework built upon pretrained T2I models for open-domain video editing.
arXiv Detail & Related papers (2024-12-28T02:36:51Z) - Learning Feature-Preserving Portrait Editing from Generated Pairs [11.122956539965761]
We propose a training-based method leveraging auto-generated paired data to learn desired editing.
Our method achieves state-of-the-art quality, quantitatively and qualitatively.
arXiv Detail & Related papers (2024-07-29T23:19:42Z) - COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing [57.76170824395532]
Video editing is an emerging task, in which most current methods adopt the pre-trained text-to-image (T2I) diffusion model to edit the source video.
We propose COrrespondence-guided Video Editing (COVE) to achieve high-quality and consistent video editing.
COVE can be seamlessly integrated into the pre-trained T2I diffusion model without the need for extra training or optimization.
arXiv Detail & Related papers (2024-06-13T06:27:13Z) - Zero-Shot Video Editing through Adaptive Sliding Score Distillation [51.57440923362033]
This study proposes a novel paradigm of video-based score distillation, facilitating direct manipulation of original video content.
We propose an Adaptive Sliding Score Distillation strategy, which incorporates both global and local video guidance to reduce the impact of editing errors.
arXiv Detail & Related papers (2024-06-07T12:33:59Z) - I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models [18.36472998650704]
We introduce a novel and generic solution that extends the applicability of image editing tools to videos by propagating edits from a single frame to the entire video using a pre-trained image-to-video model.
Our method, dubbed I2VEdit, adaptively preserves the visual and motion integrity of the source video depending on the extent of the edits.
arXiv Detail & Related papers (2024-05-26T11:47:40Z) - DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation [69.16517915592063]
We propose a novel face-identity encoder to learn an accurate representation of human faces.
We also propose self-augmented editability learning to enhance the editability of models.
Our methods can generate identity-preserved images under different scenes at a much faster speed.
arXiv Detail & Related papers (2023-07-01T11:01:17Z) - Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts [116.05656635044357]
We propose a generic video editing framework called Make-A-Protagonist.
Specifically, we leverage multiple experts to parse the source video and the target visual and textual clues, and propose a visual-textual-based video generation model.
Results demonstrate the versatile and remarkable editing capabilities of Make-A-Protagonist.
arXiv Detail & Related papers (2023-05-15T17:59:03Z) - Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding [35.18070525015657]
We propose a novel face video editing framework based on diffusion autoencoders.
Built on diffusion models, our approach provides both reconstruction and editing capabilities at the same time.
arXiv Detail & Related papers (2022-12-06T07:41:51Z) - A Latent Transformer for Disentangled and Identity-Preserving Face Editing [3.1542695050861544]
We propose to edit facial attributes via the latent space of a StyleGAN generator.
We train a dedicated latent transformation network and incorporate explicit disentanglement and identity preservation terms in the loss function.
Our model achieves disentangled, controllable, and identity-preserving facial attribute editing, even in the challenging case of real (i.e., non-synthetic) images and videos.
arXiv Detail & Related papers (2021-06-22T16:04:30Z) - Task-agnostic Temporally Consistent Facial Video Editing [84.62351915301795]
We propose a task-agnostic, temporally consistent facial video editing framework.
Based on a 3D reconstruction model, our framework is designed to handle several editing tasks in a more unified and disentangled manner.
Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
arXiv Detail & Related papers (2020-07-03T02:49:20Z)