Related papers: Lightweight High-Fidelity Low-Bitrate Talking Face Compression for 3D Video Conference

URL: http://arxiv.org/abs/2601.21269v1
Date: Thu, 29 Jan 2026 05:03:29 GMT
Title: Lightweight High-Fidelity Low-Bitrate Talking Face Compression for 3D Video Conference
Authors: Jianglong Li, Jun Xu, Bingcong Lu, Zhengxue Cheng, Hongwei Hu, Ronghua Wu, Li Song
Abstract summary: Traditional 2D video compression techniques fail to preserve fine-grained geometric and appearance details. We propose a lightweight, high-fidelity, low-bitrate 3D talking face compression framework that integrates FLAME-based parametric modeling with 3DGS neural rendering.
Score: 16.973019571440556
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The demand for immersive and interactive communication has driven advancements in 3D video conferencing, yet achieving high-fidelity 3D talking face representation at low bitrates remains a challenge. Traditional 2D video compression techniques fail to preserve fine-grained geometric and appearance details, while implicit neural rendering methods like NeRF suffer from prohibitive computational costs. To address these challenges, we propose a lightweight, high-fidelity, low-bitrate 3D talking face compression framework that integrates FLAME-based parametric modeling with 3DGS neural rendering. Our approach transmits only essential facial metadata in real time, enabling efficient reconstruction with a Gaussian-based head model. Additionally, we introduce a compact representation and compression scheme, including Gaussian attribute compression and MLP optimization, to enhance transmission efficiency. Experimental results demonstrate that our method achieves superior rate-distortion performance, delivering high-quality facial rendering at extremely low bitrates, making it well-suited for real-time 3D video conferencing applications.

Related papers

CSGaussian: Progressive Rate-Distortion Compression and Segmentation for 3D Gaussian Splatting [57.73006852239138]
We present the first unified framework for rate-distortion-optimized compression and segmentation of 3D Gaussian Splatting (3DGS). Inspired by recent advances in rate-distortion-optimized 3DGS compression, this work integrates semantic learning into the compression pipeline to support decoder-side applications. Our scheme features a lightweight implicit neural representation-based hyperprior, enabling efficient entropy coding of both color and semantic attributes.
arXiv Detail & Related papers (2026-01-19T08:21:45Z)
4DGCPro: Efficient Hierarchical 4D Gaussian Compression for Progressive Volumetric Video Streaming [52.76837132019501]
We introduce 4DGCPro, a novel hierarchical 4D compression framework. 4DGCPro facilitates real-time mobile decoding and high-quality rendering via progressive volumetric video streaming. We present an end-to-end entropy-optimized training scheme.
arXiv Detail & Related papers (2025-09-22T08:38:17Z)
PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control [37.390794417927644]
We present PGSTalker, a real-time audio-driven talking head synthesis framework based on 3D Gaussian Splatting (3DGS). To improve rendering performance, we propose a pixel-aware density control strategy that adaptively allocates point density, enhancing detail in dynamic facial regions while reducing redundancy elsewhere.
arXiv Detail & Related papers (2025-09-21T05:01:54Z)
TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling [52.87836237427514]
Photoreal avatars are seen as a key component in emerging applications in telepresence, extended reality, and entertainment. We present a new high-detail 3D head avatar model that improves upon the state of the art.
arXiv Detail & Related papers (2025-05-08T22:10:27Z)
EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs [5.583906047971048]
3D Gaussian Splatting is a recognized method for 3D scene representation, known for its high rendering quality and speed. We introduce an efficient compression technique that significantly reduces storage overhead by using compact representation. Experimental results demonstrate that our method outperforms existing methods in data compactness while maintaining high rendering quality.
arXiv Detail & Related papers (2025-01-06T21:37:30Z)
A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction [2.022451212187598]
In recent years, Neural Radiance Fields (NeRF) has revolutionized three-dimensional (3D) reconstruction with its implicit representation. 3D Gaussian Splatting (3D-GS) has departed from the implicit representation of neural networks and instead directly represents scenes as point clouds with Gaussian-shaped distributions. This paper proposes a refined 3D Gaussian representation for high-quality dynamic scene reconstruction. Experimental results demonstrate that our method surpasses existing approaches in rendering quality and speed, while significantly reducing the memory usage associated with 3D-GS.
arXiv Detail & Related papers (2024-05-28T07:12:22Z)
Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior [29.120669908374424]
We introduce a novel audio-driven talking head synthesis framework, called Talk3D. It can faithfully reconstruct its plausible facial geometries by effectively adopting the pre-trained 3D-aware generative prior. Compared to existing methods, our method excels in generating realistic facial geometries even under extreme head poses.
arXiv Detail & Related papers (2024-03-29T12:49:40Z)
GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation [71.73912454164834]
A modern talking face generation method is expected to achieve the goals of generalized audio-lip synchronization, good video quality, and high system efficiency. NeRF has become a popular technique in this field since it could achieve high-fidelity and 3D-consistent talking face generation with a few-minute-long training video. We propose GeneFace++ to handle these challenges by utilizing the rendering pitch contour as an auxiliary feature and introducing a temporal loss in the facial motion prediction process.
arXiv Detail & Related papers (2023-05-01T12:24:09Z)
Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition [61.6677901687009]
We propose an efficient NeRF-based framework that enables real-time synthesizing of talking portraits. Our method can generate realistic and audio-lips synchronized talking portrait videos.
arXiv Detail & Related papers (2022-11-22T16:03:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.