Semantic Face Compression for Metaverse: A Compact 3D Descriptor Based Approach
- URL: http://arxiv.org/abs/2311.12817v1
- Date: Sun, 24 Sep 2023 13:39:50 GMT
- Title: Semantic Face Compression for Metaverse: A Compact 3D Descriptor Based Approach
- Authors: Binzhe Li, Bolin Chen, Zhao Wang, Shiqi Wang, Yan Ye
- Abstract summary: We envision a new metaverse communication paradigm for virtual avatar faces, and develop the semantic face compression with compact 3D facial descriptors.
The proposed scheme is expected to enable numerous applications, such as digital human communication based on machine analysis.
- Score: 15.838410034900138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this letter, we envision a new metaverse communication paradigm for
virtual avatar faces, and develop the semantic face compression with compact 3D
facial descriptors. The fundamental principle is that the communication of
virtual avatar faces primarily emphasizes the conveyance of semantic
information. In light of this, the proposed scheme offers the advantages of
being highly flexible, efficient and semantically meaningful. The semantic face
compression, which allows the communication of the descriptors for artificial
intelligence based understanding, could facilitate numerous applications
without the involvement of humans in metaverse. The promise of the proposed
paradigm is also demonstrated by performance comparisons with the
state-of-the-art video coding standard, Versatile Video Coding. A significant
improvement in terms of rate-accuracy performance has been achieved. The
proposed scheme is expected to enable numerous applications, such as digital
human communication based on machine analysis, and to form the cornerstone of
interaction and communication in the metaverse.
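As a rough illustration of the paradigm, the sketch below transmits a per-frame avatar face as a small vector of 3D facial descriptors rather than as pixels. The identity/expression/pose split, the dimensionalities, and the uniform quantizer are illustrative assumptions, not the authors' exact coding pipeline.

```python
# Illustrative sketch only: a per-frame avatar face sent as a compact 3D
# descriptor instead of pixel data. The descriptor layout and quantizer are
# assumptions for exposition, not the proposed scheme's actual design.
import numpy as np

STEP = 0.02  # hypothetical quantization step size

def encode(descriptor: np.ndarray) -> np.ndarray:
    """Quantize descriptor coefficients to integer symbols for transmission."""
    return np.round(descriptor / STEP).astype(np.int16)

def decode(symbols: np.ndarray) -> np.ndarray:
    """Reconstruct approximate descriptor values at the receiver."""
    return symbols.astype(np.float32) * STEP

rng = np.random.default_rng(0)
# Hypothetical per-frame descriptor: 80 identity + 64 expression + 6 pose dims.
frame_descriptor = rng.normal(scale=0.3, size=150).astype(np.float32)

symbols = encode(frame_descriptor)
reconstructed = decode(symbols)

payload_bits = symbols.size * 16  # naive fixed-length coding, no entropy coder
print(f"payload: {payload_bits // 8} bytes per frame")
print(f"max coefficient error: {np.abs(reconstructed - frame_descriptor).max():.4f}")
```

Even without an entropy coder, this toy version shows why a descriptor-domain payload can be far smaller than a pixel-domain bitstream, which is the basis of the rate-accuracy comparison against Versatile Video Coding.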
Related papers
- 3D Vision-Language Gaussian Splatting [29.047044145499036]
Multi-modal 3D scene understanding has vital applications in robotics, autonomous driving, and virtual/augmented reality.
We propose a solution that adequately handles the distinct visual and semantic modalities.
We also employ a camera-view blending technique to improve semantic consistency between existing views.
arXiv Detail & Related papers (2024-10-10T03:28:29Z)
- Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation [9.67450435520651]
This paper introduces MetaFace, a novel methodology crafted for speaking style adaptation.
It is composed of several key components: the Robust Meta Initialization Stage (RMIS) for fundamental speaking style adaptation, the Dynamic Relation Mining Neural Process (NDRM) for forging connections between observed and unobserved speaking styles, and the Low-rank Matrix Memory Reduction Approach to enhance the efficiency of model optimization.
arXiv Detail & Related papers (2024-08-18T04:42:43Z)
- Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs [67.27840327499625]
We present a multimodal learning-based method to simultaneously synthesize co-speech facial expressions and upper-body gestures for digital characters.
Our approach learns from sparse face landmarks and upper-body joints, estimated directly from video data, to generate plausible emotive character motions.
arXiv Detail & Related papers (2024-06-26T04:53:11Z)
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z)
- Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision [39.50768518548343]
We investigate how hierarchical representations derived from the advanced generative prior facilitate constructing an efficient scalable coding paradigm for human-machine collaborative vision.
Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics, which are elaborately designed into the basic, middle, and enhanced layers.
Based on the multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized to achieve optimal machine analysis performance, human perception experience, and compression ratio.
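A minimal sketch of what such a multi-task scalable rate-distortion objective could look like follows; the specific loss terms and weights are assumptions, not the paper's formulation.

```python
# Hedged sketch of a multi-task scalable rate-distortion objective:
# total loss = rate + weighted machine-analysis distortion + weighted
# human-perception distortion. Terms and weights are illustrative only.
def scalable_rd_objective(rate_bits: float,
                          machine_task_loss: float,
                          perceptual_loss: float,
                          lambda_machine: float = 1.0,
                          lambda_human: float = 0.1) -> float:
    """Combine bitrate and both distortion terms into one training loss."""
    return rate_bits + lambda_machine * machine_task_loss + lambda_human * perceptual_loss

# Example: a basic layer tuned mainly for machine analysis, with enhancement
# layers adding perceptual quality at extra rate.
print(scalable_rd_objective(rate_bits=0.05, machine_task_loss=0.8, perceptual_loss=2.3))
```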
arXiv Detail & Related papers (2023-12-25T05:57:23Z)
- GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance [83.43852715997596]
GSmoothFace is a novel two-stage generalized talking face generation model guided by a fine-grained 3D face model.
It can synthesize smooth lip dynamics while preserving the speaker's identity.
Both quantitative and qualitative experiments confirm the superiority of our method in terms of realism, lip synchronization, and visual quality.
arXiv Detail & Related papers (2023-12-12T16:00:55Z)
- Parametric Implicit Face Representation for Audio-Driven Facial Reenactment [52.33618333954383]
We propose a novel audio-driven facial reenactment framework that is both controllable and can generate high-quality talking heads.
Specifically, our parametric implicit representation parameterizes the implicit representation with interpretable parameters of 3D face models.
Our method can generate more realistic results than previous methods with greater fidelity to the identities and talking styles of speakers.
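To illustrate the general idea of parameterizing an implicit representation with interpretable 3D face model coefficients, here is a toy conditioned implicit function; the network shape and conditioning scheme are assumptions, not the paper's architecture.

```python
# Toy illustration: an implicit function f(x, p) -> (rgb, density) conditioned
# on interpretable 3D face model parameters p (e.g., expression/pose).
# Architecture and conditioning are assumptions, not the paper's design.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(3 + 64, 128))   # input: 3D point + 64 face params
W2 = rng.normal(scale=0.1, size=(128, 4))        # output: RGB + density

def implicit_face(point_xyz: np.ndarray, face_params: np.ndarray) -> np.ndarray:
    """Query the implicit field at a 3D point, conditioned on face parameters."""
    h = np.maximum(np.concatenate([point_xyz, face_params]) @ W1, 0.0)  # one ReLU MLP layer
    return h @ W2  # first three values: color, last value: density

sample = implicit_face(np.array([0.1, -0.2, 0.5]), rng.normal(size=64))
print("rgb:", sample[:3], "density:", sample[3])
```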
arXiv Detail & Related papers (2023-06-13T07:08:22Z)
- Interactive Face Video Coding: A Generative Compression Framework [18.26476468644723]
We propose a novel framework for Interactive Face Video Coding (IFVC), which allows humans to interact with the intrinsic visual representations instead of the signals.
The proposed solution enjoys several distinct advantages, including ultra-compact representation, low delay interaction, and vivid expression and headpose animation.
arXiv Detail & Related papers (2023-02-20T11:24:23Z)
- VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction [50.986371459817256]
We propose a novel Virtual InteRacTion mechanism, termed VIRT, to enable full and deep interaction modeling in representation-based models.
VIRT asks representation-based encoders to conduct virtual interactions that mimic the behaviors of interaction-based models.
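A rough sketch of the virtual-interaction idea: two sentences are encoded independently, a cross-attention map is computed post hoc from those independent encodings, and it is aligned with the attention of an interaction-based teacher. The shapes, the softmax attention, and the MSE alignment loss below are assumptions, not VIRT's exact objective.

```python
# Rough sketch of virtual interaction: align a cross-attention map computed
# from independently encoded sentences with that of an interaction-based
# teacher. Shapes, attention form, and the MSE loss are assumptions.
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Token-to-token attention between two independently encoded sequences."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

rng = np.random.default_rng(0)
sent_a = rng.normal(size=(7, 32))    # tokens of sentence A, encoded independently
sent_b = rng.normal(size=(9, 32))    # tokens of sentence B, encoded independently
teacher_attn = softmax(rng.normal(size=(7, 9)))  # stand-in for an interaction-based teacher

student_attn = cross_attention(sent_a, sent_b)   # the "virtual" interaction
alignment_loss = np.mean((student_attn - teacher_attn) ** 2)
print(f"virtual-interaction alignment loss: {alignment_loss:.4f}")
```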
arXiv Detail & Related papers (2021-12-08T09:49:28Z)
- Towards Modality Transferable Visual Information Representation with Optimal Model Compression [67.89885998586995]
We propose a new scheme for visual signal representation that leverages the philosophy of transferable modality.
The proposed framework is implemented on the state-of-the-art video coding standard.
arXiv Detail & Related papers (2020-08-13T01:52:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.