Lightweight High-Fidelity Low-Bitrate Talking Face Compression for 3D Video Conference
- URL: http://arxiv.org/abs/2601.21269v1
- Date: Thu, 29 Jan 2026 05:03:29 GMT
- Title: Lightweight High-Fidelity Low-Bitrate Talking Face Compression for 3D Video Conference
- Authors: Jianglong Li, Jun Xu, Bingcong Lu, Zhengxue Cheng, Hongwei Hu, Ronghua Wu, Li Song,
- Abstract summary: Traditional 2D video compression techniques fail to preserve fine-grained and geometric appearance details.<n>We propose a lightweight, high-fidelity, low-bitrate 3D talking face compression framework that integrates FLAME-based parametric modeling with 3DGS neural rendering.
- Score: 16.973019571440556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The demand for immersive and interactive communication has driven advancements in 3D video conferencing, yet achieving high-fidelity 3D talking face representation at low bitrates remains a challenge. Traditional 2D video compression techniques fail to preserve fine-grained geometric and appearance details, while implicit neural rendering methods like NeRF suffer from prohibitive computational costs. To address these challenges, we propose a lightweight, high-fidelity, low-bitrate 3D talking face compression framework that integrates FLAME-based parametric modeling with 3DGS neural rendering. Our approach transmits only essential facial metadata in real time, enabling efficient reconstruction with a Gaussian-based head model. Additionally, we introduce a compact representation and compression scheme, including Gaussian attribute compression and MLP optimization, to enhance transmission efficiency. Experimental results demonstrate that our method achieves superior rate-distortion performance, delivering high-quality facial rendering at extremely low bitrates, making it well-suited for real-time 3D video conferencing applications.
Related papers
- CSGaussian: Progressive Rate-Distortion Compression and Segmentation for 3D Gaussian Splatting [57.73006852239138]
We present the first unified framework for rate-distortion-optimized compression and segmentation of 3D Gaussian Splatting (3DGS)<n>Inspired by recent advances in rate-distortion-optimized 3DGS compression, this work integrates semantic learning into the compression pipeline to support decoder-side applications.<n>Our scheme features a lightweight implicit neural representation-based hyperprior, enabling efficient entropy coding of both color and semantic attributes.
arXiv Detail & Related papers (2026-01-19T08:21:45Z) - 4DGCPro: Efficient Hierarchical 4D Gaussian Compression for Progressive Volumetric Video Streaming [52.76837132019501]
We introduce 4DGCPro, a novel hierarchical 4D compression framework.<n>4DGCPro facilitates real-time mobile decoding and high-quality rendering via progressive volumetric video streaming.<n>We present an end-to-end entropy-optimized training scheme.
arXiv Detail & Related papers (2025-09-22T08:38:17Z) - PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control [37.390794417927644]
We present PGSTalker, a real-time audio-driven talking head synthesis framework based on 3D Gaussian Splatting (3DGS)<n>To improve rendering performance, we propose a pixel-aware density control strategy that adaptively allocates point density, enhancing detail in dynamic facial regions while reducing redundancy elsewhere.
arXiv Detail & Related papers (2025-09-21T05:01:54Z) - TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling [52.87836237427514]
Photoreal avatars are seen as a key component in emerging applications in telepresence, extended reality, and entertainment.<n>We present a new high-detail 3D head avatar model that improves upon the state of the art.
arXiv Detail & Related papers (2025-05-08T22:10:27Z) - EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization.<n>We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z) - Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs [5.583906047971048]
3D Splatting is a recognized method for 3D scene representation, known for its high rendering quality and speed.<n>We introduce an efficient compression technique that significantly reduces storage overhead by using compact representation.<n> Experimental results demonstrate that our method outperforms existing methods in data compactness while maintaining high rendering quality.
arXiv Detail & Related papers (2025-01-06T21:37:30Z) - A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction [2.022451212187598]
In recent years, Neural Radiance Fields (NeRF) has revolutionized three-dimensional (3D) reconstruction with its implicit representation.
3D Gaussian Splatting (3D-GS) has departed from the implicit representation of neural networks and instead directly represents scenes as point clouds with Gaussian-shaped distributions.
This paper purposes a refined 3D Gaussian representation for high-quality dynamic scene reconstruction.
Experimental results demonstrate that our method surpasses existing approaches in rendering quality and speed, while significantly reducing the memory usage associated with 3D-GS.
arXiv Detail & Related papers (2024-05-28T07:12:22Z) - Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior [29.120669908374424]
We introduce a novel audio-driven talking head synthesis framework, called Talk3D.
It can faithfully reconstruct its plausible facial geometries by effectively adopting the pre-trained 3D-aware generative prior.
Compared to existing methods, our method excels in generating realistic facial geometries even under extreme head poses.
arXiv Detail & Related papers (2024-03-29T12:49:40Z) - GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking
Face Generation [71.73912454164834]
A modern talking face generation method is expected to achieve the goals of generalized audio-lip synchronization, good video quality, and high system efficiency.
NeRF has become a popular technique in this field since it could achieve high-fidelity and 3D-consistent talking face generation with a few-minute-long training video.
We propose GeneFace++ to handle these challenges by utilizing the rendering pitch contour as an auxiliary feature and introducing a temporal loss in the facial motion prediction process.
arXiv Detail & Related papers (2023-05-01T12:24:09Z) - Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial
Decomposition [61.6677901687009]
We propose an efficient NeRF-based framework that enables real-time synthesizing of talking portraits.
Our method can generate realistic and audio-lips synchronized talking portrait videos.
arXiv Detail & Related papers (2022-11-22T16:03:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.