Interactive Face Video Coding: A Generative Compression Framework
- URL: http://arxiv.org/abs/2302.09919v1
- Date: Mon, 20 Feb 2023 11:24:23 GMT
- Title: Interactive Face Video Coding: A Generative Compression Framework
- Authors: Bolin Chen, Zhao Wang, Binzhe Li, Shurun Wang, Shiqi Wang, Yan Ye
- Abstract summary: We propose a novel framework for Interactive Face Video Coding (IFVC), which allows humans to interact with the intrinsic visual representations instead of the signals.
The proposed solution enjoys several distinct advantages, including ultra-compact representation, low delay interaction, and vivid expression and headpose animation.
- Score: 18.26476468644723
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel framework for Interactive Face Video Coding
(IFVC), which allows humans to interact with the intrinsic visual
representations instead of the signals. The proposed solution enjoys several
distinct advantages, including ultra-compact representation, low delay
interaction, and vivid expression and headpose animation. In particular, we
propose the Internal Dimension Increase (IDI) based representation, greatly
enhancing the fidelity and flexibility in rendering the appearance while
maintaining reasonable representation cost. By leveraging strong statistical
regularities, the visual signals can be effectively projected into controllable
semantics in the three dimensional space (e.g., mouth motion, eye blinking,
head rotation and head translation), which are compressed and transmitted. The
editable bitstream, which naturally supports the interactivity at the semantic
level, can synthesize the face frames via the strong inference ability of the
deep generative model. Experimental results have demonstrated the performance
superiority and application prospects of our proposed IFVC scheme. In
particular, the proposed scheme not only outperforms the state-of-the-art video
coding standard Versatile Video Coding (VVC) and the latest generative
compression schemes in terms of rate-distortion performance for face videos,
but also enables interactive coding without introducing additional
manipulation processes. Furthermore, the proposed framework is expected to
shed light on the future design of digital human communication in the
metaverse.
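The coding pipeline described above can be sketched in a few lines: project each frame onto a handful of controllable semantic parameters, quantize and transmit them, optionally edit them in the bitstream, and hand them to a generative renderer. The parameter names and bit depths below are illustrative assumptions, not the paper's actual IDI representation, and the deep generative model that renders the face is elided.

```python
import numpy as np

# Hypothetical per-frame semantic parameters (names are illustrative, not
# the paper's actual feature set): 3 head-rotation angles, 3 translation
# components, one mouth-motion scalar, one eye-blink scalar.
PARAM_NAMES = ["rot_x", "rot_y", "rot_z", "trans_x", "trans_y", "trans_z",
               "mouth", "blink"]
PARAM_RANGE = (-1.0, 1.0)  # assume parameters are normalized to [-1, 1]

def encode_frame(params, bits=8):
    """Uniformly quantize normalized semantic parameters to `bits` each."""
    lo, hi = PARAM_RANGE
    levels = 2 ** bits - 1
    q = np.round((np.clip(params, lo, hi) - lo) / (hi - lo) * levels)
    return q.astype(np.uint16)

def decode_frame(q, bits=8):
    """Dequantize back to floats; a generative model would render from these."""
    lo, hi = PARAM_RANGE
    levels = 2 ** bits - 1
    return q.astype(np.float64) / levels * (hi - lo) + lo

# Encode one frame's worth of semantics.
params = np.array([0.10, -0.25, 0.05, 0.0, 0.02, -0.01, 0.7, 0.0])
code = encode_frame(params)

# Semantic-level interaction: edit the bitstream directly (here, turn the
# head) with no pixel-domain manipulation step.
edited = code.copy()
edited[PARAM_NAMES.index("rot_y")] = encode_frame(np.array([0.5]))[0]
recon = decode_frame(edited)

# Ultra-compact representation: 8 params x 8 bits = 64 bits per frame,
# versus 256*256*3*8 = ~1.57 Mbit for one raw RGB frame.
bits_per_frame = len(params) * 8
print(bits_per_frame)  # 64
```

The bit count is only meant to convey the order-of-magnitude gap between a semantic representation and raw pixels; a real codec would also entropy-code the parameters and spend bits on a keyframe for appearance.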
Related papers
- Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization [13.341123726068652]
We propose a novel Multi-granularity Temporal Trajectory Factorization framework for generative human video compression.
Experimental results show that the proposed method outperforms the latest generative models and the state-of-the-art video coding standard Versatile Video Coding (VVC).
arXiv Detail & Related papers (2024-10-14T05:34:32Z)
- Beyond GFVC: A Progressive Face Video Compression Framework with Adaptive Visual Tokens [28.03183316628635]
This paper proposes a novel Progressive Face Video Compression framework, namely PFVC, that utilizes adaptive visual tokens to realize exceptional trade-offs between reconstruction quality and bandwidth.
Experimental results demonstrate that the proposed PFVC framework can achieve better coding flexibility and superior rate-distortion performance in comparison with the latest Versatile Video Coding (VVC) and the state-of-the-art Generative Face Video Compression (GFVC) algorithms.
arXiv Detail & Related papers (2024-10-11T03:24:21Z)
- Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs [67.27840327499625]
We present a multimodal learning-based method to simultaneously synthesize co-speech facial expressions and upper-body gestures for digital characters.
Our approach learns from sparse face landmarks and upper-body joints, estimated directly from video data, to generate plausible emotive character motions.
arXiv Detail & Related papers (2024-06-26T04:53:11Z)
- Image Translation as Diffusion Visual Programmers [52.09889190442439]
Diffusion Visual Programmer (DVP) is a neuro-symbolic image translation framework.
Our framework seamlessly embeds a condition-flexible diffusion model within the GPT architecture.
Extensive experiments demonstrate DVP's remarkable performance, surpassing concurrent arts.
arXiv Detail & Related papers (2024-01-18T05:50:09Z)
- Semantic Face Compression for Metaverse: A Compact 3D Descriptor Based Approach [15.838410034900138]
We envision a new metaverse communication paradigm for virtual avatar faces, and develop the semantic face compression with compact 3D facial descriptors.
The proposed scheme is expected to enable numerous applications, such as digital human communication based on machine analysis.
arXiv Detail & Related papers (2023-09-24T13:39:50Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
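The core idea of analyzing coded representations without decoding to pixels can be illustrated with a toy linear stand-in (this is a generic sketch, not VNVC's actual networks): a compact code serves reconstruction when a human needs to view the video, while a machine-vision task consumes the code directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a learned codec: a linear "encoder" maps a 64-dim
# signal to an 8-dim compact code; its pseudo-inverse acts as "decoder".
ENC = rng.standard_normal((8, 64)) / 8.0
DEC = np.linalg.pinv(ENC)

def encode(x):
    return ENC @ x            # compact coded representation

def reconstruct(z):
    return DEC @ z            # pixel-domain reconstruction (for viewing)

def analyze(z):
    # The downstream task consumes the code directly -- no pixel
    # reconstruction needed. Here: a toy linear "classifier" score.
    w = np.ones(8) / 8.0
    return float(w @ z)

x = rng.standard_normal(64)
z = encode(x)

score_direct = analyze(z)                  # cheap path: analyze the code
x_hat = reconstruct(z)                     # costly path: only for human viewing
score_via_pixels = analyze(encode(x_hat))  # round-trip through pixels
```

Because `ENC @ DEC` is the identity for this full-row-rank linear pair, both paths give the same score, while the direct path skips the reconstruction step entirely; that compute saving is what motivates analyzing coded representations.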
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- Text-driven Video Prediction [83.04845684117835]
We propose a new task called Text-driven Video Prediction (TVP).
Taking the first frame and text caption as inputs, this task aims to synthesize the following frames.
To investigate the capability of text in causal inference for progressive motion information, our TVP framework contains a Text Inference Module (TIM).
arXiv Detail & Related papers (2022-10-06T12:43:07Z)
- Towards Modality Transferable Visual Information Representation with Optimal Model Compression [67.89885998586995]
We propose a new scheme for visual signal representation that leverages the philosophy of transferable modality.
The proposed framework is implemented on the state-of-the-art video coding standard.
arXiv Detail & Related papers (2020-08-13T01:52:40Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern.
By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.