Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age
- URL: http://arxiv.org/abs/2410.24148v1
- Date: Thu, 31 Oct 2024 17:09:19 GMT
- Title: Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age
- Authors: Nouar AlDahoul, Myles Joshua Toledo Tan, Harishwar Reddy Kasireddy, Yasir Zaki
- Abstract summary: Analyzing demographic characteristics and facial expressions from images is challenging due to the complexity of human facial attributes.
Traditional approaches have employed CNNs and various other deep learning techniques, trained on extensive collections of labeled images.
We propose using vision language models (VLMs) such as generative pre-trained transformer (GPT), GEMINI, large language and vision assistant (LLAVA), PaliGemma, and Microsoft Florence2 to recognize facial attributes.
- Abstract: Technologies for recognizing facial attributes like race, gender, age, and emotion have several applications, such as surveillance, advertising content, sentiment analysis, and the study of demographic trends and social behaviors. Analyzing demographic characteristics and facial expressions from images is challenging due to the complexity of human facial attributes. Traditional approaches have employed CNNs and various other deep learning techniques, trained on extensive collections of labeled images. While these methods have demonstrated effective performance, there remains potential for further enhancement. In this paper, we propose to utilize vision language models (VLMs) such as generative pre-trained transformer (GPT), GEMINI, large language and vision assistant (LLAVA), PaliGemma, and Microsoft Florence2 to recognize facial attributes such as race, gender, age, and emotion from images with human faces. Datasets such as FairFace, AffectNet, and UTKFace are used to evaluate the solutions. The results show that VLMs are competitive with, if not superior to, traditional techniques. Additionally, we propose "FaceScanPaliGemma", a fine-tuned PaliGemma model, for race, gender, age, and emotion recognition. The results show an accuracy of 81.1%, 95.8%, 80%, and 59.4% for race, gender, age group, and emotion classification, respectively, outperforming the pre-trained version of PaliGemma, other VLMs, and SotA methods. Finally, we propose "FaceScanGPT", a GPT-4o-based model that recognizes the above attributes when several individuals are present in the image, using a prompt engineered to target a person with specific facial and/or physical attributes. The results underscore the superior multitasking capability of FaceScanGPT to detect attributes such as haircut, clothing color, and posture, using only a prompt to drive the detection and recognition tasks.
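The prompt-driven setup behind FaceScanGPT can be illustrated with a short Python sketch that sends one image and an attribute prompt to GPT-4o through the OpenAI API. This is a minimal sketch under stated assumptions, not the authors' implementation: the prompt wording, the expected output format, and the response handling are assumptions, and an OPENAI_API_KEY environment variable is assumed to be set.

```python
import base64
from openai import OpenAI  # official OpenAI Python SDK (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt; the paper's exact prompt engineering is not reproduced here.
PROMPT = (
    "For the most prominent face in this image, answer with four "
    "comma-separated fields: race, gender, age group, emotion."
)

def recognize_attributes(image_path: str) -> str:
    """Send one image plus the attribute prompt to GPT-4o; return the raw answer."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(recognize_attributes("face.jpg"))  # hypothetical output: "White, Male, 20-29, happy"
```

Extending this to the multi-person case described in the abstract amounts to changing the prompt, e.g. asking for the attributes of "the person wearing a red shirt".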
Related papers
- Robustness Disparities in Face Detection [64.71318433419636]
We present a first-of-its-kind detailed benchmark of face detection systems, specifically examining the robustness to noise of commercial and academic models.
Across all the datasets and systems, we generally find that photos of individuals who are *masculine presenting*, *older*, of *darker skin type*, or in *dim lighting* are more susceptible to errors than their counterparts in other identities.
arXiv Detail & Related papers (2022-11-29T05:22:47Z)
- Face Emotion Recognization Using Dataset Augmentation Based on Neural Network [0.0]
Facial expression is one of the most visible outward indications of a person's feelings and emotions.
It plays an important role in coordinating interpersonal relationships.
As a branch of sentiment analysis, facial expression recognition offers broad application prospects.
arXiv Detail & Related papers (2022-10-23T10:21:45Z)
- CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptation (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO improves facial expression recognition performance over six different datasets with distinctive affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
- Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers [57.1091606948826]
We propose a novel FER model, named Poker Face Vision Transformer (PF-ViT), to address these challenges.
PF-ViT aims to separate and recognize the disturbance-agnostic emotion from a static facial image via generating its corresponding poker face.
PF-ViT utilizes vanilla Vision Transformers, and its components are pre-trained as Masked Autoencoders on a large facial expression dataset.
arXiv Detail & Related papers (2022-07-22T13:39:06Z)
- Real-time Emotion and Gender Classification using Ensemble CNN [0.0]
This paper implements an ensemble CNN for building a real-time system that can detect the emotion and gender of a person.
Our work can predict emotion and gender on single face images as well as multiple face images.
arXiv Detail & Related papers (2021-11-15T13:51:35Z)
- Facial Emotion Recognition: A multi-task approach using deep learning [0.0]
We propose a multi-task learning algorithm in which a single CNN detects the gender, age, and race of the subject along with their emotion (a minimal sketch of such a multi-task head appears after this list).
Results show that this approach is significantly better than current state-of-the-art algorithms for this task.
arXiv Detail & Related papers (2021-10-28T11:23:00Z)
- I Only Have Eyes for You: The Impact of Masks On Convolutional-Based Facial Expression Recognition [78.07239208222599]
We evaluate how the recently proposed FaceChannel adapts to recognizing facial expressions from persons wearing masks.
We also perform specific feature-level visualizations to demonstrate how the FaceChannel's inherent ability to learn and combine facial features changes in a constrained social interaction scenario.
arXiv Detail & Related papers (2021-04-16T20:03:30Z)
- A Multi-resolution Approach to Expression Recognition in the Wild [9.118706387430883]
We propose a multi-resolution approach to solve the Facial Expression Recognition task.
We ground our intuition in the observation that face images are often acquired at different resolutions.
To this aim, we use a ResNet-like architecture equipped with Squeeze-and-Excitation blocks, trained on the Affect-in-the-Wild 2 dataset.
arXiv Detail & Related papers (2021-03-09T21:21:02Z)
- Learning Emotional-Blinded Face Representations [77.7653702071127]
We propose two face representations that are blind to the facial expressions associated with emotional responses.
This work is motivated by new international regulations for personal data protection.
arXiv Detail & Related papers (2020-09-18T09:24:10Z)
- Real-time Facial Expression Recognition "In The Wild" by Disentangling 3D Expression from Identity [6.974241731162878]
This paper proposes a novel method for human emotion recognition from a single RGB image.
We construct a large-scale dataset of facial videos, rich in facial dynamics, identities, expressions, appearance and 3D pose variations.
Our proposed framework runs at 50 frames per second and is capable of robustly estimating parameters of 3D expression variation.
arXiv Detail & Related papers (2020-05-12T01:32:55Z)
- Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition [98.83578105374535]
We present a novel Fine-grained Facial Expression Database, F2ED.
It includes more than 200k images with 54 facial expressions from 119 persons.
Since uneven data distribution and a lack of samples are common in real-world scenarios, we evaluate several few-shot expression learning tasks.
We propose a unified task-driven framework, Compositional Generative Adversarial Network (Comp-GAN), that learns to synthesize facial images.
arXiv Detail & Related papers (2020-01-17T03:26:32Z)
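As referenced in the multi-task entry above, a single CNN predicting gender, age, race, and emotion can be sketched in a few lines of PyTorch. This is a hypothetical illustration, not the cited paper's exact architecture: the ResNet-18 backbone, the class counts, and the equal loss weighting are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiTaskFaceNet(nn.Module):
    """One shared CNN backbone with four task-specific classification heads."""

    def __init__(self, n_race=7, n_gender=2, n_age=9, n_emotion=7):
        super().__init__()
        backbone = models.resnet18(weights=None)  # any CNN backbone would do
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # keep only the shared feature extractor
        self.backbone = backbone
        self.heads = nn.ModuleDict({
            "race": nn.Linear(feat_dim, n_race),
            "gender": nn.Linear(feat_dim, n_gender),
            "age": nn.Linear(feat_dim, n_age),
            "emotion": nn.Linear(feat_dim, n_emotion),
        })

    def forward(self, x):
        feats = self.backbone(x)  # shared features for all four tasks
        return {task: head(feats) for task, head in self.heads.items()}

# Training sums one cross-entropy loss per task (equal weights assumed here).
model = MultiTaskFaceNet()
logits = model(torch.randn(2, 3, 224, 224))
targets = {"race": torch.tensor([0, 3]), "gender": torch.tensor([1, 0]),
           "age": torch.tensor([2, 5]), "emotion": torch.tensor([4, 1])}
loss = sum(nn.functional.cross_entropy(logits[t], targets[t]) for t in logits)
loss.backward()
```

Sharing the backbone is what lets the four attribute tasks regularize each other, which is the usual argument for multi-task training over four separate classifiers.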
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.