Generative Face Video Coding Techniques and Standardization Efforts: A Review
- URL: http://arxiv.org/abs/2311.02649v1
- Date: Sun, 5 Nov 2023 13:32:51 GMT
- Title: Generative Face Video Coding Techniques and Standardization Efforts: A Review
- Authors: Bolin Chen, Jie Chen, Shiqi Wang, Yan Ye
- Abstract summary: Generative Face Video Coding (GFVC) techniques can achieve high-quality face video communication in ultra-low bandwidth scenarios.
This paper conducts a comprehensive survey of recent advances in GFVC techniques and standardization efforts.
- Score: 17.856692220227583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Face Video Coding (GFVC) techniques can exploit the compact
representation of facial priors and the strong inference capability of deep
generative models, achieving high-quality face video communication in ultra-low
bandwidth scenarios. This paper conducts a comprehensive survey of recent
advances in GFVC techniques and standardization efforts, which could be
applicable to ultra-low-bitrate communication, user-specified
animation/filtering and metaverse-related functionalities. In particular, we
generalize GFVC systems within one coding framework and summarize different
GFVC algorithms with their corresponding visual representations. Moreover, we
review the GFVC standardization activities that are specified with supplemental
enhancement information messages. Finally, we discuss fundamental challenges
and broad applications of GFVC techniques and their standardization potentials,
as well as envision their future trends. The project page can be found at
https://github.com/Berlin0610/Awesome-Generative-Face-Video-Coding.
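As a rough illustration of the single coding framework into which the survey generalizes GFVC systems, the sketch below shows the basic encode/decode split: a key reference frame is assumed to be coded conventionally, subsequent frames are reduced to compact facial parameters, and a deep generative model animates the reference frame at the decoder. All module names and tensor sizes are hypothetical placeholders, not components of any surveyed codec.

```python
# Minimal, hypothetical sketch of a generic GFVC pipeline (not any specific
# published codec). Placeholder modules stand in for the learned components.
import torch
import torch.nn as nn


class FacialParamExtractor(nn.Module):
    """Maps a frame to a compact facial representation (e.g., a handful of keypoints)."""

    def __init__(self, num_params: int = 20):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.LazyLinear(num_params))

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        return self.net(frame)          # [B, num_params]: tens of values per frame


class GenerativeDecoder(nn.Module):
    """Animates the decoded reference frame according to the facial parameters."""

    def __init__(self):
        super().__init__()
        self.modulator = nn.LazyLinear(3 * 64 * 64)

    def forward(self, reference: torch.Tensor, params: torch.Tensor) -> torch.Tensor:
        motion = self.modulator(params).view(-1, 3, 64, 64)
        return torch.clamp(reference + motion, 0.0, 1.0)   # reconstructed frame


# Encoder side: the reference frame is assumed to be coded with a conventional
# codec (stubbed as a raw tensor here); only compact parameters are sent afterwards.
extractor, decoder = FacialParamExtractor(), GenerativeDecoder()
reference = torch.rand(1, 3, 64, 64)          # key frame
current = torch.rand(1, 3, 64, 64)            # subsequent frame to encode
params = extractor(current)                   # ultra-low-bitrate payload

# Decoder side: regenerate the current frame from reference + parameters.
reconstruction = decoder(reference, params)
print(params.shape, reconstruction.shape)
```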
Related papers
- Standardizing Generative Face Video Compression using Supplemental Enhancement Information [22.00903915523654]
This paper proposes a Generative Face Video Compression (GFVC) approach using Supplemental Enhancement Information (SEI).
At the time of writing, the proposed GFVC approach is an official "technology under consideration" (TuC) for standardization by the Joint Video Experts Team (JVET).
To the best of the authors' knowledge, the JVET work on the proposed SEI-based GFVC approach is the first standardization activity for generative video compression.
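To make the SEI-carriage idea concrete, the following is a minimal sketch of packing per-frame facial parameters into a user-data-style SEI payload transported alongside a conventionally coded bitstream. The byte layout, placeholder UUID, and function names are invented for illustration and do not reflect the SEI syntax actually under consideration at JVET.

```python
# Hypothetical serialization of GFVC facial parameters into a user-data-style
# SEI payload. The layout is illustrative only; it is NOT the JVET syntax.
import struct
import uuid

GFVC_UUID = uuid.UUID("00000000-0000-0000-0000-000000000000").bytes  # placeholder id

def pack_gfvc_sei(frame_index: int, params: list[float]) -> bytes:
    """Serialize per-frame facial parameters (e.g., keypoints) as an SEI-like blob."""
    header = struct.pack(">IH", frame_index, len(params))   # frame id + parameter count
    body = struct.pack(f">{len(params)}f", *params)         # float32 parameters
    return GFVC_UUID + header + body

def unpack_gfvc_sei(payload: bytes) -> tuple[int, list[float]]:
    """Recover the frame index and facial parameters at the decoder."""
    assert payload[:16] == GFVC_UUID, "not a GFVC payload"
    frame_index, count = struct.unpack(">IH", payload[16:22])
    params = list(struct.unpack(f">{count}f", payload[22:22 + 4 * count]))
    return frame_index, params

# Round-trip example: a few facial parameters cost only a few dozen bytes per frame.
payload = pack_gfvc_sei(42, [0.1, -0.3, 0.25, 0.0])
print(len(payload), unpack_gfvc_sei(payload))
```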
arXiv Detail & Related papers (2024-10-19T13:37:24Z)
- Beyond GFVC: A Progressive Face Video Compression Framework with Adaptive Visual Tokens [28.03183316628635]
This paper proposes a novel Progressive Face Video Compression framework, namely PFVC, that utilizes adaptive visual tokens to realize exceptional trade-offs between reconstruction robustness and bandwidth intelligence.
Experimental results demonstrate that the proposed PFVC framework can achieve better coding flexibility and superior rate-distortion performance in comparison with the latest Versatile Video Coding (VVC) and the state-of-the-art Generative Face Video Compression (GFVC) algorithms.
arXiv Detail & Related papers (2024-10-11T03:24:21Z)
- Live Video Captioning [0.6291443816903801]
We introduce a paradigm shift towards Live Video Captioning (LVC).
In LVC, dense video captioning models must generate captions for video streams in an online manner.
We propose new evaluation metrics tailored for the online scenario, demonstrating their superiority over traditional metrics.
arXiv Detail & Related papers (2024-06-20T11:25:16Z)
- iVideoGPT: Interactive VideoGPTs are Scalable World Models [70.02290687442624]
World models empower model-based agents to interactively explore, reason, and plan within imagined environments for real-world decision-making.
This work introduces Interactive VideoGPT (iVideoGPT), a scalable autoregressive transformer framework that integrates multimodal signals (visual observations, actions, and rewards) into a sequence of tokens.
iVideoGPT features a novel compressive tokenization technique that efficiently discretizes high-dimensional visual observations.
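The tokenization step can be illustrated with a generic vector-quantization sketch: each patch feature is mapped to the index of its nearest codebook entry, yielding the discrete tokens an autoregressive transformer can consume. This is a textbook VQ example under assumed sizes, not iVideoGPT's actual compressive tokenizer.

```python
# Generic vector-quantization sketch: map continuous patch features to discrete
# token indices via a codebook. Illustrative only, not iVideoGPT itself.
import torch

codebook = torch.randn(512, 64)                  # 512 codes, 64-dim each (assumed sizes)

def tokenize(features: torch.Tensor) -> torch.Tensor:
    """features: [N, 64] patch embeddings -> [N] integer token ids."""
    distances = torch.cdist(features, codebook)  # [N, 512] Euclidean distances
    return distances.argmin(dim=1)               # nearest codebook entry per patch

def detokenize(tokens: torch.Tensor) -> torch.Tensor:
    """Recover the quantized features a downstream model would condition on."""
    return codebook[tokens]

patches = torch.randn(16, 64)                    # e.g., 16 patch features from one frame
tokens = tokenize(patches)                       # discrete sequence for the transformer
print(tokens.shape, detokenize(tokens).shape)    # torch.Size([16]) torch.Size([16, 64])
```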
arXiv Detail & Related papers (2024-05-24T05:29:12Z)
- VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding [63.075626670943116]
We introduce a cutting-edge framework, VaQuitA, designed to refine the synergy between video and textual information.
At the data level, instead of sampling frames uniformly, we implement a sampling method guided by CLIP-score rankings.
At the feature level, we integrate a trainable Video Perceiver alongside a Visual-Query Transformer.
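The CLIP-score-guided sampling mentioned above could look roughly like the sketch below: score every candidate frame against the text query with an off-the-shelf CLIP model and keep the top-ranked frames rather than a uniform subset. The model checkpoint and top-k policy are assumptions, not VaQuitA's exact recipe.

```python
# Sketch of CLIP-score-ranked frame selection (checkpoint and k are illustrative).
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def select_frames(frames: list[Image.Image], query: str, k: int = 8) -> list[int]:
    """Return indices of the k frames most relevant to the text query."""
    inputs = processor(text=[query], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        scores = model(**inputs).logits_per_image.squeeze(1)  # one score per frame
    return torch.topk(scores, k=min(k, len(frames))).indices.tolist()

# Dummy "video": 32 random frames; a real pipeline would decode them from a clip.
frames = [Image.fromarray(np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8))
          for _ in range(32)]
print(select_frames(frames, "a person talking to the camera", k=4))
```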
arXiv Detail & Related papers (2023-12-04T19:48:02Z)
- Delving into Multimodal Prompting for Fine-grained Visual Classification [57.12570556836394]
Fine-grained visual classification (FGVC) involves categorizing fine subdivisions within a broader category.
Recent advancements in pre-trained vision-language models have demonstrated remarkable performance in various high-level vision tasks.
We propose a novel multimodal prompting solution, denoted as MP-FGVC, based on the Contrastive Language-Image Pre-training (CLIP) model.
arXiv Detail & Related papers (2023-09-16T07:30:52Z)
- Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning [50.60891619269651]
Control-A-Video is a controllable T2V diffusion model that can generate videos conditioned on text prompts and reference control maps like edge and depth maps.
We propose novel strategies to incorporate content prior and motion prior into the diffusion-based generation process.
Our framework generates higher-quality, more consistent videos compared to existing state-of-the-art methods in controllable text-to-video generation.
arXiv Detail & Related papers (2023-05-23T09:03:19Z)
- Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method [69.868145936998]
Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs.
The great diversity of distortion types in the spatial and temporal domains, ranging from traditional hybrid coding frameworks to generative models, presents grand challenges in compressed face video quality assessment (VQA).
We introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos.
arXiv Detail & Related papers (2023-04-14T11:26:09Z)
- Interactive Face Video Coding: A Generative Compression Framework [18.26476468644723]
We propose a novel framework for Interactive Face Video Coding (IFVC), which allows humans to interact with the intrinsic visual representations instead of the signals.
The proposed solution enjoys several distinct advantages, including ultra-compact representation, low delay interaction, and vivid expression and headpose animation.
arXiv Detail & Related papers (2023-02-20T11:24:23Z)
- CANF-VC: Conditional Augmented Normalizing Flows for Video Compression [81.41594331948843]
CANF-VC is an end-to-end learned video compression system built on conditional augmented normalizing flows (ANF).
arXiv Detail & Related papers (2022-07-12T04:53:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.