Beyond FACS: Data-driven Facial Expression Dictionaries, with Application to Predicting Autism
- URL: http://arxiv.org/abs/2505.24679v1
- Date: Fri, 30 May 2025 15:06:01 GMT
- Title: Beyond FACS: Data-driven Facial Expression Dictionaries, with Application to Predicting Autism
- Authors: Evangelos Sariyanidi, Lisa Yankowitz, Robert T. Schultz, John D. Herrington, Birkan Tunc, Jeffrey Cohn
- Abstract summary: The Facial Action Coding System (FACS) has been used by numerous studies to investigate the links between facial behavior and mental health. Despite intense efforts spanning three decades, the detection accuracy for many Action Units is considered to be below the threshold needed for behavioral research. This paper proposes a new coding system that mimics the key properties of FACS.
- Score: 3.0274846041592864
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The Facial Action Coding System (FACS) has been used by numerous studies to investigate the links between facial behavior and mental health. The laborious and costly process of FACS coding has motivated the development of machine learning frameworks for Action Unit (AU) detection. Despite intense efforts spanning three decades, the detection accuracy for many AUs is considered to be below the threshold needed for behavioral research. Also, many AUs are excluded altogether, making it impossible to fulfill the ultimate goal of FACS-the representation of any facial expression in its entirety. This paper considers an alternative approach. Instead of creating automated tools that mimic FACS experts, we propose to use a new coding system that mimics the key properties of FACS. Specifically, we construct a data-driven coding system called the Facial Basis, which contains units that correspond to localized and interpretable 3D facial movements, and overcomes three structural limitations of automated FACS coding. First, the proposed method is completely unsupervised, bypassing costly, laborious and variable manual annotation. Second, Facial Basis reconstructs all observable movement, rather than relying on a limited repertoire of recognizable movements (as in automated FACS). Finally, the Facial Basis units are additive, whereas AUs may fail detection when they appear in a non-additive combination. The proposed method outperforms the most frequently used AU detector in predicting autism diagnosis from in-person and remote conversations, highlighting the importance of encoding facial behavior comprehensively. To our knowledge, Facial Basis is the first alternative to FACS for deconstructing facial expressions in videos into localized movements. We provide an open source implementation of the method at github.com/sariyanidi/FacialBasis.
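The additive, unsupervised decomposition the abstract describes can be illustrated with a small sketch. This is not the authors' implementation (which is at the linked repository); it uses scikit-learn's `DictionaryLearning` on synthetic displacement data, with all shapes, parameters, and the notion of "frames as flattened 3D displacement fields" assumed purely for illustration:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Hypothetical data: 200 frames of facial movement, each a flattened 3D
# displacement field over 100 mesh vertices (100 * 3 = 300 dimensions).
rng = np.random.default_rng(0)
n_frames, n_dims, n_units = 200, 300, 8

# Synthesize frames as sparse, additive combinations of localized "units".
true_basis = rng.normal(size=(n_units, n_dims))
codes = rng.random((n_frames, n_units)) * (rng.random((n_frames, n_units)) < 0.3)
X = codes @ true_basis + 0.01 * rng.normal(size=(n_frames, n_dims))

# Learn a dictionary of movement units without any manual annotation;
# sparse codes mean each frame is explained by a few additive units,
# loosely analogous to the Facial Basis idea.
dl = DictionaryLearning(
    n_components=n_units, alpha=1.0, max_iter=100,
    transform_algorithm="omp", transform_n_nonzero_coefs=4, random_state=0,
)
learned_codes = dl.fit_transform(X)

# Unlike a fixed AU repertoire, the reconstruction accounts for all
# observed movement, up to residual error.
X_hat = learned_codes @ dl.components_
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.3f}")
```

Because the codes are additive, co-occurring units do not interfere with one another the way non-additive AU combinations can confuse a detector.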
Related papers
- Discrete Facial Encoding: A Framework for Data-driven Facial Display Discovery [6.096726247356906]
We introduce Discrete Facial Encoding, an unsupervised, data-driven alternative to FACS: a compact and interpretable dictionary of facial expressions. Our system consistently outperforms both FACS-based pipelines and strong image and video representation learning models. Our representation covers a wider variety of facial displays, highlighting its potential as a scalable and effective alternative to FACS for psychological and affective computing applications.
arXiv Detail & Related papers (2025-10-02T04:44:45Z) - Rethinking Occlusion in FER: A Semantic-Aware Perspective and Go Beyond [10.015531203047598]
We present ORSANet, which introduces auxiliary multi-modal semantic guidance to disambiguate facial occlusion. We also introduce facial landmarks as a sparse geometric prior to mitigate intrinsic noises in FER, such as identity and gender biases. Our proposed ORSANet achieves SOTA recognition performance.
arXiv Detail & Related papers (2025-07-21T09:04:29Z) - Learning Knowledge-based Prompts for Robust 3D Mask Presentation Attack Detection [84.21257150497254]
We propose a novel knowledge-based prompt learning framework to explore the strong generalization capability of vision-language models for 3D mask presentation attack detection. Experimental results demonstrate that the proposed method achieves state-of-the-art intra- and cross-scenario detection performance.
arXiv Detail & Related papers (2025-05-06T15:09:23Z) - Semantics-Oriented Multitask Learning for DeepFake Detection: A Joint Embedding Approach [77.65459419417533]
We propose an automated dataset expansion technique to support semantics-oriented DeepFake detection tasks. We also resort to the joint embedding of face images and labels (depicted by text descriptions) for prediction. Our method improves the generalizability of DeepFake detection and renders some degree of model interpretation by providing human-understandable explanations.
arXiv Detail & Related papers (2024-08-29T07:11:50Z) - A PCA based Keypoint Tracking Approach to Automated Facial Expressions Encoding [3.0605062268685868]
This paper explores the use of automated techniques to generate Action Units (AUs) for studying facial expressions.
We propose an unsupervised approach based on Principal Component Analysis (PCA) and facial keypoint tracking to generate data-driven AUs.
arXiv Detail & Related papers (2024-06-13T11:40:26Z) - Unsupervised learning of Data-driven Facial Expression Coding System (DFECS) using keypoint tracking [3.0605062268685868]
We propose an unsupervised learning of an automated facial coding system by leveraging computer-vision-based facial keypoint tracking.
Results show that DFECS AUs estimated from the DISFA dataset can account for an average variance of up to 91.29 percent in test datasets.
87.5 percent of DFECS AUs are interpretable, i.e., align with the direction of facial muscle movements.
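A minimal sketch of the DFECS-style recipe, deriving data-driven AUs as principal directions of tracked keypoint displacements and reporting variance explained. The data here is synthetic, and the keypoint count, component count, and displacement representation are all assumptions for illustration, not details from the paper:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical input: displacements of 68 2D keypoints from a neutral
# frame, tracked over 500 frames and flattened to 136 dimensions.
rng = np.random.default_rng(1)
n_frames, n_keypoints = 500, 68
displacements = rng.normal(size=(n_frames, n_keypoints * 2))

# Each principal component is a candidate data-driven "AU": a direction
# of correlated keypoint movement. Per-frame scores act as activations.
pca = PCA(n_components=10)
activations = pca.fit_transform(displacements)
data_driven_aus = pca.components_

# Analogue of the paper's variance-accounted-for metric.
variance_covered = pca.explained_variance_ratio_.sum()
print(f"variance explained by 10 units: {variance_covered:.2%}")
```

Interpretability would then be assessed by checking whether each component's dominant keypoint motions align with known facial muscle directions, as the entry above describes.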
arXiv Detail & Related papers (2024-06-08T10:45:38Z) - MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition [94.56755080185732]
We propose a Motion-Aware masked autoencoder with Semantic Alignment (MASA) that integrates rich motion cues and global semantic information.
Our framework can simultaneously learn local motion cues and global semantic features for comprehensive sign language representation.
arXiv Detail & Related papers (2024-05-31T08:06:05Z) - Customizable Avatars with Dynamic Facial Action Coded Expressions (CADyFACE) for Improved User Engagement [0.5358896402695404]
3D avatar-based facial expression stimuli may improve user engagement in behavioral biomarker discovery.
There is a lack of customizable avatar-based stimuli with Facial Action Coding System (FACS) action unit (AU) labels.
This study focuses on (1) FACS-labeled, customizable avatar-based expression stimuli for maintaining subjects' engagement, (2) learning-based measurements that quantify subjects' facial responses to such stimuli, and (3) validation of constructs represented by stimulus-measurement pairs.
arXiv Detail & Related papers (2024-03-12T05:00:38Z) - Contrastive Learning of Person-independent Representations for Facial Action Unit Detection [70.60587475492065]
We formulate the self-supervised AU representation learning signals in two-fold.
We contrast learn the AU representation within a video clip and devise a cross-identity reconstruction mechanism to learn the person-independent representations.
Our method outperforms other contrastive learning methods and significantly closes the performance gap between the self-supervised and supervised AU detection approaches.
arXiv Detail & Related papers (2024-03-06T01:49:28Z) - Boosting Facial Action Unit Detection Through Jointly Learning Facial Landmark Detection and Domain Separation and Reconstruction [4.4150617622399055]
We propose a new AU detection framework where multi-task learning is introduced to jointly learn AU domain separation and reconstruction and facial landmark detection.
We also propose a new feature alignment scheme based on contrastive learning by simple projectors and an improved contrastive loss.
arXiv Detail & Related papers (2023-10-08T15:49:26Z) - Adaptive Local-Global Relational Network for Facial Action Units Recognition and Facial Paralysis Estimation [22.85506776477092]
We propose a novel Adaptive Local-Global Network (ALGRNet) for facial AU recognition and apply it to facial paralysis estimation.
ALGRNet consists of three novel structures, i.e., an adaptive region learning module which learns the adaptive muscle regions based on detected landmarks.
Experiments on the BP4D and DISFA AU datasets show that the proposed approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-03-03T16:14:49Z) - AU-Expression Knowledge Constrained Representation Learning for Facial Expression Recognition [79.8779790682205]
We propose an AU-Expression Knowledge Constrained Representation Learning (AUE-CRL) framework to learn the AU representations without AU annotations and adaptively use representations to facilitate facial expression recognition.
We conduct experiments on the challenging uncontrolled datasets to demonstrate the superiority of the proposed framework over current state-of-the-art methods.
arXiv Detail & Related papers (2020-12-29T03:42:04Z) - JÂA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention [57.51255553918323]
We propose a novel end-to-end deep learning framework for joint AU detection and face alignment.
Our framework significantly outperforms the state-of-the-art AU detection methods on the challenging BP4D, DISFA, GFT and BP4D+ benchmarks.
arXiv Detail & Related papers (2020-03-18T12:50:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.