Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field
- URL: http://arxiv.org/abs/2506.22044v1
- Date: Fri, 27 Jun 2025 09:42:30 GMT
- Title: Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field
- Authors: Hong Nie, Fuyuan Cao, Lu Chen, Fengxin Chen, Yuefeng Zou, Jun Yu
- Abstract summary: Reconstruction and rendering-based talking head synthesis methods achieve high-quality results with strong identity preservation but are limited by their dependence on identity-specific models. We propose FIAG, a novel 3D talking head synthesis framework that enables efficient identity-specific adaptation using only a few frames of training footage.
- Score: 15.145448983662636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstruction and rendering-based talking head synthesis methods achieve high-quality results with strong identity preservation but are limited by their dependence on identity-specific models. Each new identity requires training from scratch, incurring high computational costs and reduced scalability compared to generative model-based approaches. To overcome this limitation, we propose FIAG, a novel 3D talking head synthesis framework that enables efficient identity-specific adaptation using only a few frames of training footage. FIAG incorporates a Global Gaussian Field, which supports the representation of multiple identities within a shared field, and a Universal Motion Field, which captures the common motion dynamics across diverse identities. Benefiting from the shared facial structure information encoded in the Global Gaussian Field and the general motion priors learned in the motion field, our framework enables rapid adaptation from canonical identity representations to specific ones with minimal data. Extensive comparative and ablation experiments demonstrate that our method outperforms existing state-of-the-art approaches, validating both the effectiveness and generalizability of the proposed framework. Code is available at: https://github.com/gme-hong/FIAG
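The abstract does not give implementation details, but the core idea — a shared canonical field plus lightweight per-identity parameters, so a new identity needs only a small few-shot update — can be sketched in miniature. All class and attribute names below are hypothetical illustrations, not the authors' code:

```python
# Illustrative sketch (not FIAG's actual implementation): a shared "global
# field" of Gaussian parameters plus small per-identity offsets. Adapting to
# a new identity touches only the offsets, not the shared field.

class GlobalGaussianField:
    """Canonical Gaussian means shared by every identity."""
    def __init__(self, num_gaussians):
        # Each Gaussian: (x, y, z) mean position in canonical head space.
        self.means = [[0.0, 0.0, 0.0] for _ in range(num_gaussians)]

class IdentityAdapter:
    """Per-identity offsets, learned from a few frames of footage."""
    def __init__(self, num_gaussians):
        self.offsets = [[0.0, 0.0, 0.0] for _ in range(num_gaussians)]

    def apply(self, field):
        # Identity-specific field = shared canonical field + learned offsets.
        return [
            [m + o for m, o in zip(mean, off)]
            for mean, off in zip(field.means, self.offsets)
        ]

field = GlobalGaussianField(num_gaussians=4)
adapter = IdentityAdapter(num_gaussians=4)
adapter.offsets[0] = [0.1, 0.0, -0.2]  # stand-in for a few-shot update
adapted = adapter.apply(field)
print(adapted[0])  # [0.1, 0.0, -0.2]
```

The point of the split is that the shared field amortizes facial structure across identities, so per-identity training only fits the small adapter.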
Related papers
- CRIA: A Cross-View Interaction and Instance-Adapted Pre-training Framework for Generalizable EEG Representations [52.251569042852815]
CRIA is an adaptive framework that utilizes variable-length and variable-channel coding to achieve a unified representation of EEG data across different datasets. The model employs a cross-attention mechanism to fuse temporal, spectral, and spatial features effectively. Experimental results on the Temple University EEG corpus and the CHB-MIT dataset show that CRIA outperforms existing methods with the same pre-training conditions.
arXiv Detail & Related papers (2025-06-19T06:31:08Z) - ID-Booth: Identity-consistent Face Generation with Diffusion Models [10.042492056152232]
We present a novel generative diffusion-based framework called ID-Booth. The framework enables identity-consistent image generation while retaining the synthesis capabilities of pretrained diffusion models. Our method facilitates better intra-identity consistency and inter-identity separability than competing methods, while achieving higher image diversity.
arXiv Detail & Related papers (2025-04-10T02:20:18Z) - Unified Domain Generalization and Adaptation for Multi-View 3D Object Detection [14.837853049121687]
3D object detection leveraging multi-view cameras has demonstrated its practical and economical value in challenging vision tasks.
Typical supervised learning approaches face challenges in achieving satisfactory adaptation toward unseen and unlabeled target datasets.
We propose Unified Domain Generalization and Adaptation (UDGA), a practical solution to mitigate those drawbacks.
arXiv Detail & Related papers (2024-10-29T18:51:49Z) - Unified Language-driven Zero-shot Domain Adaptation [55.64088594551629]
Unified Language-driven Zero-shot Domain Adaptation (ULDA) is a novel task setting.
It enables a single model to adapt to diverse target domains without explicit domain-ID knowledge.
arXiv Detail & Related papers (2024-04-10T16:44:11Z) - UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z) - Federated Multi-View Synthesizing for Metaverse [52.59476179535153]
The metaverse is expected to provide immersive entertainment, education, and business applications.
Virtual reality (VR) transmission over wireless networks is data- and computation-intensive.
We have developed a novel multi-view synthesizing framework that can efficiently provide synthesis, storage, and communication resources for wireless content delivery in the metaverse.
arXiv Detail & Related papers (2023-12-18T13:51:56Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - You Only Train Once: Multi-Identity Free-Viewpoint Neural Human Rendering from Monocular Videos [10.795522875068073]
You Only Train Once (YOTO) is a dynamic human generation framework, which performs free-viewpoint rendering of different human identities with distinct motions.
In this paper, we propose a set of learnable identity codes to expand the capability of the framework for multi-identity free-viewpoint rendering.
YOTO shows state-of-the-art performance on all evaluation metrics while showing significant benefits in training and inference efficiency as well as rendering quality.
arXiv Detail & Related papers (2023-03-10T10:23:17Z) - An Identity-Preserved Framework for Human Motion Transfer [3.6286856791379463]
Human motion transfer (HMT) aims to generate a video clip for the target subject by imitating the source subject's motion.
Previous methods have achieved good results on good-quality videos, but lose sight of individualized motion information from the source and target motions.
We propose a novel identity-preserved HMT network, termed IDPres.
arXiv Detail & Related papers (2022-04-14T10:27:19Z) - Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition [49.163326827954656]
We propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions.
We conduct extensive experiments on three large-scale datasets.
arXiv Detail & Related papers (2021-08-10T09:25:07Z) - IMAGINE: Image Synthesis by Image-Guided Model Inversion [79.4691654458141]
We introduce an inversion based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images.
We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations.
IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process.
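The described inversion procedure — optimizing an input so a fixed, pretrained classifier scores it highly, with no generator training — can be illustrated with a toy one-dimensional example. The quadratic score below is a stand-in for a real classifier's semantic objective, not IMAGINE's actual loss:

```python
# Toy model-inversion sketch: ascend the gradient of a frozen "classifier"
# score with respect to the input itself. No generator is trained; only the
# input is updated. The score function here is a hypothetical stand-in.

def classifier_score(x, target=3.0):
    # Peaks when the input matches the target "semantic" value.
    return -(x - target) ** 2

def invert(steps=100, lr=0.1, x0=0.0, target=3.0):
    x = x0
    for _ in range(steps):
        # Analytic gradient of the score w.r.t. the input.
        grad = -2.0 * (x - target)
        x += lr * grad  # gradient ascent on the classifier score
    return x

print(round(invert(), 3))  # converges toward 3.0
```

In the real setting the input is an image tensor and the gradient comes from backpropagating through the frozen classifier, but the structure of the loop is the same.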
arXiv Detail & Related papers (2021-04-13T02:00:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.