Related papers: Low-Rank Head Avatar Personalization with Registers

URL: http://arxiv.org/abs/2506.01935v1
Date: Mon, 02 Jun 2025 17:53:14 GMT
Title: Low-Rank Head Avatar Personalization with Registers
Authors: Sai Tanmay Reddy Chakkera, Aggelina Chatziagapi, Md Moniruzzaman, Chen-Ping Yu, Yi-Hsuan Tsai, Dimitris Samaras
Abstract summary: We introduce a novel method for low-rank personalization of a generic model for head avatar generation. Our approach faithfully captures unseen faces, outperforming existing methods quantitatively and qualitatively.
Score: 36.7667914190956
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We introduce a novel method for low-rank personalization of a generic model for head avatar generation. Prior work proposes generic models that achieve high-quality face animation by leveraging large-scale datasets of multiple identities. However, such generic models usually fail to synthesize unique identity-specific details, since they learn a general domain prior. To adapt to specific subjects, we find that it is still challenging to capture high-frequency facial details via popular solutions like low-rank adaptation (LoRA). This motivates us to propose a specific architecture, a Register Module, that enhances the performance of LoRA, while requiring only a small number of parameters to adapt to an unseen identity. Our module is applied to intermediate features of a pre-trained model, storing and re-purposing information in a learnable 3D feature space. To demonstrate the efficacy of our personalization method, we collect a dataset of talking videos of individuals with distinctive facial details, such as wrinkles and tattoos. Our approach faithfully captures unseen faces, outperforming existing methods quantitatively and qualitatively. We will release the code, models, and dataset to the public.

Related papers

Holmes: Towards Effective and Harmless Model Ownership Verification to Personalized Large Vision Models via Decoupling Common Features [54.63343151319368]
This paper proposes a harmless model ownership verification method for personalized models by decoupling similar common features. In the first stage, we create shadow models that retain common features of the victim model while disrupting dataset-specific features. After that, a meta-classifier is trained to identify stolen models by determining whether suspicious models contain the dataset-specific features of the victim.
arXiv Detail & Related papers (2025-06-24T15:40:11Z)
Multi-subject Open-set Personalization in Video Generation [110.02124633005516]
We present Video Alchemist, a video model with built-in multi-subject, open-set personalization capabilities. Our model is built on a new Diffusion Transformer module that fuses each conditional reference image and its corresponding subject-level text prompt. Our method significantly outperforms existing personalization methods in both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2025-01-10T18:59:54Z)
Foundation Cures Personalization: Improving Personalized Models' Prompt Consistency via Hidden Foundation Knowledge [33.35678923549471]
FreeCure is a framework that improves the prompt consistency of personalization models. We introduce a novel foundation-aware self-attention module, coupled with an inversion-based process to bring well-aligned attribute information to the personalization process. FreeCure has demonstrated significant improvements in prompt consistency across a diverse set of state-of-the-art facial personalization models.
arXiv Detail & Related papers (2024-11-22T15:21:38Z)
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes [74.82911268630463]
Talking face generation (TFG) aims to animate a target identity's face to create realistic talking videos. MimicTalk exploits the rich knowledge from a NeRF-based person-agnostic generic model for improving the efficiency and robustness of personalized TFG. Experiments show that our MimicTalk surpasses previous baselines regarding video quality, efficiency, and expressiveness.
arXiv Detail & Related papers (2024-10-09T10:12:37Z)
HeadGAP: Few-Shot 3D Head Avatar via Generalizable Gaussian Priors [24.245586597913082]
We present a novel 3D head avatar creation approach capable of generalizing from few-shot in-the-wild data with high-fidelity and animatable robustness. We propose a framework comprising prior learning and avatar creation phases. Our model effectively exploits head priors and successfully generalizes them to few-shot personalization.
arXiv Detail & Related papers (2024-08-12T09:19:38Z)
MyPortrait: Morphable Prior-Guided Personalized Portrait Generation [19.911068375240905]
MyPortrait is a simple, general, and flexible framework for neural portrait generation. Our proposed framework supports both video-driven and audio-driven face animation. Our method provides a real-time online version and a high-quality offline version.
arXiv Detail & Related papers (2023-12-05T12:05:01Z)
Generate Anything Anywhere in Any Scene [25.75076439397536]
We propose a controllable text-to-image diffusion model for personalized object generation. Our approach demonstrates significant potential for various applications, such as those in art, entertainment, and advertising design.
arXiv Detail & Related papers (2023-06-29T17:55:14Z)
Generalizable One-shot Neural Head Avatar [90.50492165284724]
We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image. We propose a framework that not only generalizes to unseen identities based on a single-view image, but also captures characteristic details within and beyond the face area.
arXiv Detail & Related papers (2023-06-14T22:33:09Z)
Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization. We learn an identity encoder which can extract an identity representation from a set of reference images of a subject. We show that our approach consistently outperforms existing fine-tuning-based approaches in both image generation and reconstruction.
arXiv Detail & Related papers (2023-04-14T23:32:24Z)
Thinking the Fusion Strategy of Multi-reference Face Reenactment [4.1509697008011175]
We show that a simple extension using multiple reference images significantly improves generation quality. We show this by 1) conducting the reconstruction task on a publicly available dataset, 2) conducting facial motion transfer on our original dataset, which consists of head-movement video sequences of multiple people, and 3) using a newly proposed evaluation metric to validate that our method achieves better quantitative results.
arXiv Detail & Related papers (2022-02-22T09:17:26Z)
PVA: Pixel-aligned Volumetric Avatars [34.929560973779466]
We devise a novel approach for predicting volumetric avatars of the human head given just a small number of inputs. Our approach is trained in an end-to-end manner solely based on a photometric re-rendering loss without requiring explicit 3D supervision.
arXiv Detail & Related papers (2021-01-07T18:58:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.