Discrete Facial Encoding: A Framework for Data-driven Facial Display Discovery
- URL: http://arxiv.org/abs/2510.01662v1
- Date: Thu, 02 Oct 2025 04:44:45 GMT
- Title: Discrete Facial Encoding: A Framework for Data-driven Facial Display Discovery
- Authors: Minh Tran, Maksim Siniukov, Zhangyu Jin, Mohammad Soleymani
- Abstract summary: We introduce Discrete Facial Encoding, an unsupervised, data-driven alternative to FACS that learns a compact and interpretable dictionary of facial expressions. Our system consistently outperforms both FACS-based pipelines and strong image and video representation learning models. Our representation covers a wider variety of facial displays, highlighting its potential as a scalable and effective alternative to FACS for psychological and affective computing applications.
- Score: 6.096726247356906
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Facial expression analysis is central to understanding human behavior, yet existing coding systems such as the Facial Action Coding System (FACS) are constrained by limited coverage and costly manual annotation. In this work, we introduce Discrete Facial Encoding (DFE), an unsupervised, data-driven alternative that learns a compact and interpretable dictionary of facial expressions from 3D mesh sequences through a Residual Vector Quantized Variational Autoencoder (RVQ-VAE). Our approach first extracts identity-invariant expression features from images using a 3D Morphable Model (3DMM), effectively disentangling factors such as head pose and facial geometry. We then encode these features using an RVQ-VAE, producing a sequence of discrete tokens from a shared codebook, where each token captures a specific, reusable facial deformation pattern that contributes to the overall expression. Through extensive experiments, we demonstrate that Discrete Facial Encoding captures more precise facial behaviors than FACS and other facial encoding alternatives. We evaluate the utility of our representation across three high-level psychological tasks: stress detection, personality prediction, and depression detection. Using a simple Bag-of-Words model built on top of the learned tokens, our system consistently outperforms both FACS-based pipelines and strong image and video representation learning models such as Masked Autoencoders. Further analysis reveals that our representation covers a wider variety of facial displays, highlighting its potential as a scalable and effective alternative to FACS for psychological and affective computing applications.
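The two core steps the abstract describes, residual vector quantization of per-frame expression features and a Bag-of-Words histogram over the resulting tokens, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the codebook sizes, the two-stage depth, and the feature dimensionality below are illustrative assumptions, and a real RVQ-VAE would learn its codebooks jointly with an encoder and decoder.

```python
import numpy as np

def residual_vector_quantize(x, codebooks):
    """RVQ sketch: each stage quantizes the residual left by the previous
    stage, so later tokens capture progressively finer deformation detail."""
    residual = np.asarray(x, dtype=float)
    tokens = []
    for codebook in codebooks:                       # one codebook per stage
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))                  # nearest code = token
        tokens.append(idx)
        residual = residual - codebook[idx]          # pass the residual onward
    return tokens

def bag_of_words(tokens, vocab_size):
    """Normalized histogram of token counts: the downstream BoW feature."""
    hist = np.zeros(vocab_size)
    for t in tokens:
        hist[t] += 1
    return hist / max(hist.sum(), 1)

# Toy example: 2-D "expression features", two quantization stages.
codebooks = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),    # coarse stage
    np.array([[0.1, 0.0], [0.0, 0.1]]),    # fine (residual) stage
]
tokens = residual_vector_quantize([1.05, 0.12], codebooks)
feature = bag_of_words(tokens, vocab_size=2)
```

The Bag-of-Words vector produced this way is what a simple classifier would consume for tasks like stress or depression detection, discarding token order but keeping which facial deformation patterns occurred and how often.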
Related papers
- PaCo-FR: Patch-Pixel Aligned End-to-End Codebook Learning for Facial Representation Pre-training [32.52750192639004]
PaCo-FR is an unsupervised framework that combines masked image modeling with patch-pixel alignment. PaCo-FR achieves state-of-the-art performance across several facial analysis tasks with just 2 million unlabeled images for pre-training.
arXiv Detail & Related papers (2025-08-13T10:37:41Z) - From Large Angles to Consistent Faces: Identity-Preserving Video Generation via Mixture of Facial Experts [69.44297222099175]
We introduce a Mixture of Facial Experts (MoFE) that captures distinct but mutually reinforcing aspects of facial attributes. To mitigate dataset limitations, we have tailored a data processing pipeline centered on two key aspects: Face Constraints and Identity Consistency. We have curated and refined a Large Face Angles (LFA) dataset from existing open-source human video datasets.
arXiv Detail & Related papers (2025-08-13T04:10:16Z) - Beyond FACS: Data-driven Facial Expression Dictionaries, with Application to Predicting Autism [3.0274846041592864]
The Facial Action Coding System (FACS) has been used by numerous studies to investigate the links between facial behavior and mental health. Despite intense efforts spanning three decades, the detection accuracy for many Action Units is considered to be below the threshold needed for behavioral research. This paper proposes a new coding system that mimics the key properties of FACS.
arXiv Detail & Related papers (2025-05-30T15:06:01Z) - Learning Knowledge-based Prompts for Robust 3D Mask Presentation Attack Detection [71.60120616284246]
We propose a novel knowledge-based prompt learning framework to explore the strong generalization capability of vision-language models for 3D mask presentation attack detection. Experimental results demonstrate that the proposed method achieves state-of-the-art intra- and cross-scenario detection performance.
arXiv Detail & Related papers (2025-05-06T15:09:23Z) - OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration. We propose OSDFace, a novel one-step diffusion model for face restoration. Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z) - G2Face: High-Fidelity Reversible Face Anonymization via Generative and Geometric Priors [71.69161292330504]
Reversible face anonymization seeks to replace sensitive identity information in facial images with synthesized alternatives.
This paper introduces G²Face, which leverages both generative and geometric priors to enhance identity manipulation.
Our method outperforms existing state-of-the-art techniques in face anonymization and recovery, while preserving high data utility.
arXiv Detail & Related papers (2024-08-18T12:36:47Z) - Unsupervised learning of Data-driven Facial Expression Coding System (DFECS) using keypoint tracking [3.0605062268685868]
We propose an unsupervised learning of an automated facial coding system by leveraging computer-vision-based facial keypoint tracking.
Results show that DFECS AUs estimated from the DISFA dataset can account for an average variance of up to 91.29 percent in test datasets.
87.5 percent of DFECS AUs are interpretable, i.e., align with the direction of facial muscle movements.
arXiv Detail & Related papers (2024-06-08T10:45:38Z) - GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images [55.431697263581626]
We introduce a novel Geometry-aware Facial Expression Translation framework, which is based on parametric 3D facial representations and can stably decouple expression.
We achieve higher-quality and more accurate facial expression transfer results compared to state-of-the-art methods, and demonstrate applicability to various poses and complex textures.
arXiv Detail & Related papers (2023-08-07T09:03:35Z) - SARGAN: Spatial Attention-based Residuals for Facial Expression Manipulation [1.7056768055368383]
We present a novel method named SARGAN that addresses these limitations from three perspectives.
We exploited a symmetric encoder-decoder network to attend to facial features at multiple scales.
Our proposed model performs significantly better than state-of-the-art methods.
arXiv Detail & Related papers (2023-03-30T08:15:18Z) - Coding Facial Expressions with Gabor Wavelets (IVC Special Issue) [0.0]
We present a method for extracting information about facial expressions from digital images.
A similarity space derived from this code is compared with one derived from semantic ratings of the images by human observers.
arXiv Detail & Related papers (2020-09-13T07:01:16Z) - DotFAN: A Domain-transferred Face Augmentation Network for Pose and Illumination Invariant Face Recognition [94.96686189033869]
We propose a 3D model-assisted domain-transferred face augmentation network (DotFAN).
DotFAN can generate a series of variants of an input face based on the knowledge distilled from existing rich face datasets collected from other domains.
Experiments show that DotFAN is beneficial for augmenting small face datasets to improve their within-class diversity.
arXiv Detail & Related papers (2020-02-23T08:16:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.