Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding
- URL: http://arxiv.org/abs/2506.07369v1
- Date: Mon, 09 Jun 2025 02:39:24 GMT
- Title: Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding
- Authors: Bolin Chen, Shanzhi Yin, Goluck Konuko, Giuseppe Valenzise, Zihan Zhang, Shiqi Wang, Yan Ye,
- Abstract summary: Generative Face Video Coding (GFVC) stands at the forefront of this revolution.<n>GFVC could characterize complex facial dynamics into compact latent codes for bitstream compactness at the encoder side.<n>This paper presents the first comprehensive survey of GFVC technologies.
- Score: 28.103355729714014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise of deep generative models has greatly advanced video compression, reshaping the paradigm of face video coding through their powerful capability for semantic-aware representation and lifelike synthesis. Generative Face Video Coding (GFVC) stands at the forefront of this revolution, which could characterize complex facial dynamics into compact latent codes for bitstream compactness at the encoder side and leverages powerful deep generative models to reconstruct high-fidelity face signal from the compressed latent codes at the decoder side. As such, this well-designed GFVC paradigm could enable high-fidelity face video communication at ultra-low bitrate ranges, far surpassing the capabilities of the latest Versatile Video Coding (VVC) standard. To pioneer foundational research and accelerate the evolution of GFVC, this paper presents the first comprehensive survey of GFVC technologies, systematically bridging critical gaps between theoretical innovation and industrial standardization. In particular, we first review a broad range of existing GFVC methods with different feature representations and optimization strategies, and conduct a thorough benchmarking analysis. In addition, we construct a large-scale GFVC-compressed face video database with subjective Mean Opinion Scores (MOSs) based on human perception, aiming to identify the most appropriate quality metrics tailored to GFVC. Moreover, we summarize the GFVC standardization potentials with a unified high-level syntax and develop a low-complexity GFVC system which are both expected to push forward future practical deployments and applications. Finally, we envision the potential of GFVC in industrial applications and deliberate on the current challenges and future opportunities.
Related papers
- Controllable Video Generation: A Survey [72.38313362192784]
We provide a systematic review of controllable video generation, covering both theoretical foundations and recent advances in the field.<n>We begin by introducing the key concepts and commonly used open-source video generation models.<n>We then focus on control mechanisms in video diffusion models, analyzing how different types of conditions can be incorporated into the denoising process to guide generation.
arXiv Detail & Related papers (2025-07-22T06:05:34Z) - REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder [52.698595889988766]
We present a novel perspective on learning video embedders for generative modeling.<n>Rather than requiring an exact reproduction of an input video, an effective embedder should focus on visually plausible reconstructions.<n>We propose replacing the conventional encoder-decoder video embedder with an encoder-generator framework.
arXiv Detail & Related papers (2025-03-11T17:51:07Z) - Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence [19.137109044483545]
Pleno-Generation (PGen) framework prioritizes high-fidelity reconstruction over pursuing compact bitstream.<n>We show that the proposed framework can allow a greater space of flexibility for coding applications.<n>In comparison with the latest Versatile Video Coding (VVC), the proposed scheme achieves competitive Bjontegaard-delta-rate savings.
arXiv Detail & Related papers (2025-02-24T12:03:30Z) - Standardizing Generative Face Video Compression using Supplemental Enhancement Information [22.00903915523654]
This paper proposes a Generative Face Video Compression (GFVC) approach using Supplemental Enhancement Information (SEI)<n>At the time of writing, the proposed GFVC approach using SEI messages has been adopted into the official working draft of Versatile Supplemental Enhancement Information (VSEI) standard.<n>To the best of the authors' knowledge, the JVET work on the proposed SEI-based GFVC approach is the first standardization activity for generative video compression.
arXiv Detail & Related papers (2024-10-19T13:37:24Z) - Beyond GFVC: A Progressive Face Video Compression Framework with Adaptive Visual Tokens [28.03183316628635]
This paper proposes a novel Progressive Face Video Compression framework, namely PFVC, that utilizes adaptive visual tokens to realize exceptional trade-offs between reconstruction and bandwidth intelligence.
Experimental results demonstrate that the proposed PFVC framework can achieve better coding flexibility and superior rate-distortion performance in comparison with the latest Versatile Video Coding (VVC) and the state-of-the-art Generative Face Video Compression (GFVC) algorithms.
arXiv Detail & Related papers (2024-10-11T03:24:21Z) - Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more conscious'' process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z) - Generative Face Video Coding Techniques and Standardization Efforts: A
Review [17.856692220227583]
Generative Face Video Coding (GFVC) techniques can achieve high-quality face video communication in ultra-low bandwidth scenarios.
This paper conducts a comprehensive survey on the recent advances of the GFVC techniques and standardization efforts.
arXiv Detail & Related papers (2023-11-05T13:32:51Z) - CANF-VC: Conditional Augmented Normalizing Flows for Video Compression [81.41594331948843]
CANF-VC is an end-to-end learning-based video compression system.
It is based on conditional augmented normalizing flows (ANF)
arXiv Detail & Related papers (2022-07-12T04:53:24Z) - Efficient VVC Intra Prediction Based on Deep Feature Fusion and
Probability Estimation [57.66773945887832]
We propose to optimize Versatile Video Coding (VVC) complexity at intra-frame prediction, with a two-stage framework of deep feature fusion and probability estimation.
Experimental results on standard database demonstrate the superiority of proposed method, especially for High Definition (HD) and Ultra-HD (UHD) video sequences.
arXiv Detail & Related papers (2022-05-07T08:01:32Z) - An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond
Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern.
By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.