Related papers: From Blurry to Believable: Enhancing Low-quality Talking Heads with 3D Generative Priors

From Blurry to Believable: Enhancing Low-quality Talking Heads with 3D Generative Priors

URL: http://arxiv.org/abs/2602.06122v1
Date: Thu, 05 Feb 2026 19:00:50 GMT
Title: From Blurry to Believable: Enhancing Low-quality Talking Heads with 3D Generative Priors
Authors: Ding-Jiun Huang, Yuanhao Wang, Shao-Ji Yuan, Albert Mosella-Montoro, Francisco Vicente Carrasco, Cheng Zhang, Fernando De la Torre,
Abstract summary: We introduce SuperHead, a framework for enhancing low-resolution, animatable 3D head avatars.<n>SuperHead synthesizes high-quality geometry and textures, while ensuring both 3D and temporal consistency.<n>Experiments demonstrate that SuperHead generates avatars with fine-grained facial details under dynamic motions.
Score: 49.37666175170832
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Creating high-fidelity, animatable 3D talking heads is crucial for immersive applications, yet often hindered by the prevalence of low-quality image or video sources, which yield poor 3D reconstructions. In this paper, we introduce SuperHead, a novel framework for enhancing low-resolution, animatable 3D head avatars. The core challenge lies in synthesizing high-quality geometry and textures, while ensuring both 3D and temporal consistency during animation and preserving subject identity. Despite recent progress in image, video and 3D-based super-resolution (SR), existing SR techniques are ill-equipped to handle dynamic 3D inputs. To address this, SuperHead leverages the rich priors from pre-trained 3D generative models via a novel dynamics-aware 3D inversion scheme. This process optimizes the latent representation of the generative model to produce a super-resolved 3D Gaussian Splatting (3DGS) head model, which is subsequently rigged to an underlying parametric head model (e.g., FLAME) for animation. The inversion is jointly supervised using a sparse collection of upscaled 2D face renderings and corresponding depth maps, captured from diverse facial expressions and camera viewpoints, to ensure realism under dynamic facial motions. Experiments demonstrate that SuperHead generates avatars with fine-grained facial details under dynamic motions, significantly outperforming baseline methods in visual quality.

Related papers

Generalizable and Animatable 3D Full-Head Gaussian Avatar from a Single Image [9.505520774467263]
Building 3D animatable head avatars from a single image is an important yet challenging problem.<n>Existing methods generally collapse under large camera pose variations, compromising the realism of 3D avatars.<n>We propose a new framework to tackle the novel setting of one-shot 3D full-head animatable avatar reconstruction in a single feed-forward pass.
arXiv Detail & Related papers (2026-01-19T06:56:58Z)
TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling [52.87836237427514]
Photoreal avatars are seen as a key component in emerging applications in telepresence, extended reality, and entertainment.<n>We present a new high-detail 3D head avatar model that improves upon the state of the art.
arXiv Detail & Related papers (2025-05-08T22:10:27Z)
Generating Editable Head Avatars with 3D Gaussian GANs [57.51487984425395]
Traditional 3D-aware generative adversarial networks (GANs) achieve photorealistic and view-consistent 3D head synthesis.<n>We propose a novel approach that enhances the editability and animation control of 3D head avatars by incorporating 3D Gaussian Splatting (3DGS) as an explicit 3D representation.<n>Our approach delivers high-quality 3D-aware synthesis with state-of-the-art controllability.
arXiv Detail & Related papers (2024-12-26T10:10:03Z)
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models [112.2625368640425]
High-resolution Image-to-3D model (Hi3D) is a new video diffusion based paradigm that redefines a single image to multi-view images as 3D-aware sequential image generation. Hi3D first empowers the pre-trained video diffusion model with 3D-aware prior, yielding multi-view images with low-resolution texture details.
arXiv Detail & Related papers (2024-09-11T17:58:57Z)
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis [88.17520303867099]
One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio. We present Real3D-Potrait, a framework that improves the one-shot 3D reconstruction power with a large image-to-plane model. Experiments show that Real3D-Portrait generalizes well to unseen identities and generates more realistic talking portrait videos.
arXiv Detail & Related papers (2024-01-16T17:04:30Z)
Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models [107.84324544272481]
The ability to generate diverse 3D articulated head avatars is vital to a plethora of applications, including augmented reality, cinematography, and education. Recent work on text-guided 3D object generation has shown great promise in addressing these needs. We show that our diffusion-based articulated head avatars outperform state-of-the-art approaches for this task.
arXiv Detail & Related papers (2023-07-10T19:15:32Z)
PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360$^{\circ}$ [17.355141949293852]
Existing 3D generative adversarial networks (GANs) for 3D human head synthesis are either limited to near-frontal views or hard to preserve 3D consistency in large view angles. We propose PanoHead, the first 3D-aware generative model that enables high-quality view-consistent image synthesis of full heads in $360circ$ with diverse appearance and detailed geometry.
arXiv Detail & Related papers (2023-03-23T06:54:34Z)
Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars [36.4402388864691]
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery. Recent efforts incorporate 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields either explicitly or implicitly. We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images.
arXiv Detail & Related papers (2022-11-21T06:40:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.