SynergyWarpNet: Attention-Guided Cooperative Warping for Neural Portrait Animation
- URL: http://arxiv.org/abs/2512.17331v1
- Date: Fri, 19 Dec 2025 08:21:23 GMT
- Title: SynergyWarpNet: Attention-Guided Cooperative Warping for Neural Portrait Animation
- Authors: Shihang Li, Zhiqiang Gong, Minming Ye, Yue Gao, Wen Yao,
- Abstract summary: We propose SynergyWarpNet, an attention-guided warping framework for talking head synthesis.<n>Our model progressively refines the animation in three stages.<n> Comprehensive evaluations on benchmark datasets demonstrate state-of-the-art performance.
- Score: 19.93753497685257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in neural portrait animation have demonstrated remarked potential for applications in virtual avatars, telepresence, and digital content creation. However, traditional explicit warping approaches often struggle with accurate motion transfer or recovering missing regions, while recent attention-based warping methods, though effective, frequently suffer from high complexity and weak geometric grounding. To address these issues, we propose SynergyWarpNet, an attention-guided cooperative warping framework designed for high-fidelity talking head synthesis. Given a source portrait, a driving image, and a set of reference images, our model progressively refines the animation in three stages. First, an explicit warping module performs coarse spatial alignment between the source and driving image using 3D dense optical flow. Next, a reference-augmented correction module leverages cross-attention across 3D keypoints and texture features from multiple reference images to semantically complete occluded or distorted regions. Finally, a confidence-guided fusion module integrates the warped outputs with spatially-adaptive fusing, using a learned confidence map to balance structural alignment and visual consistency. Comprehensive evaluations on benchmark datasets demonstrate state-of-the-art performance.
Related papers
- Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding [86.55824709875598]
We propose a joint enhancement framework for 3D semantic Gaussian modeling that synergizes both semantic and rendering branches.<n>Unlike conventional point cloud shape encoding, we introduce an anisotropic 3D Gaussian Chebyshev descriptor to capture fine-grained 3D shape details.<n>We employ a cross-scene knowledge transfer module to continuously update learned shape patterns, enabling faster convergence and robust representations.
arXiv Detail & Related papers (2026-01-05T18:33:50Z) - Blur2Sharp: Human Novel Pose and View Synthesis with Generative Prior Refinement [6.91111219679588]
Blur2Sharp is a novel framework integrating 3D-aware neural rendering and diffusion models to generate sharp, geometrically consistent novel-view images.<n>Our method employs a dual-conditioning architecture: first, a Human NeRF model generates geometrically coherent multi-view renderings for target poses, explicitly encoding 3D structural guidance.<n>We further enhance visual quality through hierarchical feature fusion, incorporating texture, normal, and semantic priors extracted from parametric SMPL models to simultaneously improve global coherence and local detail accuracy.
arXiv Detail & Related papers (2025-12-09T03:49:12Z) - WarpGAN: Warping-Guided 3D GAN Inversion with Style-Based Novel View Inpainting [68.77882703764142]
3D GAN inversion projects a single image into the latent space of a pre-trained 3D GAN to achieve single-shot novel view synthesis.<n>We introduce the warping-and-inpainting strategy to incorporate image inpainting into 3D GAN inversion and propose a novel 3D GAN inversion method, WarpGAN.
arXiv Detail & Related papers (2025-11-11T12:42:07Z) - A Controllable 3D Deepfake Generation Framework with Gaussian Splatting [6.969908558294805]
We propose a novel 3D deepfake generation framework based on 3D Gaussian Splatting.<n>It enables realistic, identity-preserving face swapping and reenactment in a fully controllable 3D space.<n>Our approach bridges the gap between 3D modeling and deepfake synthesis, enabling new directions for scene-aware, controllable, and immersive visual forgeries.
arXiv Detail & Related papers (2025-09-15T06:34:17Z) - Representing Animatable Avatar via Factorized Neural Fields [39.95141875395775]
Per-frame rendering results can be factorized into a pose-independent component and a corresponding pose-dependent equivalent to facilitate frame consistency.
We achieve a coherent preservation of both coarse body contours across the entire input video.
Our network surpasses the neural radiance fields (NeRF) based state-of-the-art methods in preserving high-frequency details and ensuring consistent body contours.
arXiv Detail & Related papers (2024-06-02T06:45:38Z) - Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object
Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z) - InvertAvatar: Incremental GAN Inversion for Generalized Head Avatars [40.10906393484584]
We propose a novel framework that enhances avatar reconstruction performance using an algorithm designed to increase the fidelity from multiple frames.
Our architecture emphasizes pixel-aligned image-to-image translation, mitigating the need to learn correspondences between observation and canonical spaces.
The proposed paradigm demonstrates state-of-the-art performance on one-shot and few-shot avatar animation tasks.
arXiv Detail & Related papers (2023-12-03T18:59:15Z) - High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z) - Vision Transformer for NeRF-Based View Synthesis from a Single Input
Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z) - Extracting Triangular 3D Models, Materials, and Lighting From Images [59.33666140713829]
We present an efficient method for joint optimization of materials and lighting from multi-view image observations.
We leverage meshes with spatially-varying materials and environment that can be deployed in any traditional graphics engine.
arXiv Detail & Related papers (2021-11-24T13:58:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.