Facial Action Units Detection Aided by Global-Local Expression Embedding
- URL: http://arxiv.org/abs/2210.13718v1
- Date: Tue, 25 Oct 2022 02:35:32 GMT
- Title: Facial Action Units Detection Aided by Global-Local Expression Embedding
- Authors: Zhipeng Hu, Wei Zhang, Lincheng Li, Yu Ding, Wei Chen, Zhigang Deng,
Xin Yu
- Abstract summary: We develop a novel AU detection framework aided by the Global-Local facial Expressions Embedding, dubbed GLEE-Net.
Our GLEE-Net consists of three branches to extract identity-independent expression features for AU detection.
Our method significantly outperforms the state-of-the-art on the widely-used DISFA, BP4D and BP4D+ datasets.
- Score: 36.78982474775454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since Facial Action Unit (AU) annotations require domain expertise, common AU
datasets only contain a limited number of subjects. As a result, a crucial
challenge for AU detection is addressing identity overfitting. We find that AUs
and facial expressions are highly associated, and existing facial expression
datasets often contain a large number of identities. In this paper, we aim to
utilize the expression datasets without AU labels to facilitate AU detection.
Specifically, we develop a novel AU detection framework aided by the
Global-Local facial Expressions Embedding, dubbed GLEE-Net. Our GLEE-Net
consists of three branches to extract identity-independent expression features
for AU detection. We introduce a global branch for modeling the overall facial
expression while eliminating the impacts of identities. We also design a local
branch focusing on specific local face regions. The combined output of global
and local branches is first pre-trained on an expression dataset as an
identity-independent expression embedding, and then fine-tuned on AU datasets.
Therefore, we significantly alleviate the issue of limited identities.
Furthermore, we introduce a 3D global branch that extracts expression
coefficients through 3D face reconstruction to consolidate 2D AU descriptions.
Finally, a Transformer-based multi-label classifier is employed to fuse all the
representations for AU detection. Extensive experiments demonstrate that our
method significantly outperforms the state-of-the-art on the widely-used DISFA,
BP4D and BP4D+ datasets.
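To make the described pipeline concrete, below is a minimal PyTorch-style sketch of a three-branch detector in the spirit of GLEE-Net: a global 2D expression branch, a local region branch, a 3D branch fed by expression coefficients, and a Transformer encoder that fuses the branch tokens before a multi-label AU head. All module names, backbones, and dimensions (e.g., `GLEENetSketch`, `coeff_proj`) are illustrative assumptions rather than the paper's released implementation; in particular, the 3D expression coefficients are treated as a precomputed input instead of being regressed by a reconstruction network.
```python
# Hedged sketch only: assumed shapes and modules, not the authors' code.
import torch
import torch.nn as nn


class GLEENetSketch(nn.Module):
    """Toy three-branch AU detector: global, local, and 3D expression
    features fused by a Transformer encoder for multi-label AU logits."""

    def __init__(self, num_aus=12, dim=256, num_3d_coeffs=64):
        super().__init__()
        # Global 2D branch: models the overall expression (identity effects
        # are assumed to be removed during expression-dataset pre-training).
        self.global_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))
        # Local branch: one shared encoder applied to cropped face regions.
        self.local_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))
        # 3D branch: projects expression coefficients assumed to come from an
        # off-the-shelf 3D face reconstruction step (not implemented here).
        self.coeff_proj = nn.Linear(num_3d_coeffs, dim)
        # Transformer encoder fuses the per-branch tokens.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.au_head = nn.Linear(dim, num_aus)  # one logit per AU

    def forward(self, face, regions, expr_coeffs):
        # face: (B, 3, H, W); regions: (B, R, 3, h, w); expr_coeffs: (B, C)
        b, r = regions.shape[:2]
        g_tok = self.global_branch(face).unsqueeze(1)                     # (B, 1, D)
        l_tok = self.local_branch(regions.flatten(0, 1)).view(b, r, -1)   # (B, R, D)
        c_tok = self.coeff_proj(expr_coeffs).unsqueeze(1)                 # (B, 1, D)
        tokens = torch.cat([g_tok, l_tok, c_tok], dim=1)
        fused = self.fusion(tokens).mean(dim=1)
        return self.au_head(fused)


model = GLEENetSketch()
logits = model(torch.randn(2, 3, 112, 112),     # full faces
               torch.randn(2, 4, 3, 56, 56),    # 4 local crops per face
               torch.randn(2, 64))              # 3D expression coefficients
# Multi-label AU detection is naturally trained with a per-AU BCE loss.
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (2, 12)).float())
```
In the two-stage recipe from the abstract, the global and local branches would first be pre-trained on an expression dataset to learn an identity-independent embedding, and the whole model then fine-tuned on AU datasets with the BCE objective above.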
Related papers
- Attribute-Text Guided Forgetting Compensation for Lifelong Person Re-Identification [8.841311088024584]
Lifelong person re-identification (LReID) aims to continuously learn from non-stationary data to match individuals in different environments.
Current LReID methods focus on task-specific knowledge and ignore intrinsic task-shared representations within domain gaps.
We propose a novel attribute-text guided forgetting compensation model, which explores text-driven global representations and attribute-related local representations.
arXiv Detail & Related papers (2024-09-30T05:19:09Z)
- ID$^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition [60.15830516741776]
Synthetic face recognition (SFR) aims to generate datasets that mimic the distribution of real face data.
We introduce a diffusion-fueled SFR model termed ID$^3$.
ID$^3$ employs an ID-preserving loss to generate diverse yet identity-consistent facial appearances.
arXiv Detail & Related papers (2024-06-17T15:26:22Z)
- AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection [72.41427550339296]
We introduce AnyMaker, a framework capable of generating general objects with high ID fidelity and flexible text editability.
The efficacy of AnyMaker stems from its novel general ID extraction, dual-level ID injection, and ID-aware decoupling.
To validate our approach and boost the research of general object customization, we create the first large-scale general ID dataset.
arXiv Detail & Related papers (2024-06-17T15:26:22Z)
- Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification [63.147482497821166]
We first explore the influence of global and local features of ViT and then propose a novel Global-Local Transformer (GLTrans) for high-performance object Re-ID.
Our proposed method achieves superior performance on four object Re-ID benchmarks.
arXiv Detail & Related papers (2024-04-23T12:42:07Z)
- Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification [78.08536797239893]
We propose a novel Multi-Stage Spatial-Temporal Aggregation Transformer (MSTAT) with two novel designed proxy embedding modules.
MSTAT consists of three stages to encode the attribute-associated, the identity-associated, and the attribute-identity-associated information from the video clips.
We show that MSTAT can achieve state-of-the-art accuracies on various standard benchmarks.
arXiv Detail & Related papers (2023-01-02T05:17:31Z)
- Global-to-local Expression-aware Embeddings for Facial Action Unit Detection [18.629509376315752]
We propose a novel fine-grained Global Expression representation to capture subtle and continuous facial movements.
It consists of an AU feature map extractor and a corresponding AU mask extractor.
Our method consistently outperforms previous works and achieves state-of-the-art performance on widely-used face datasets.
arXiv Detail & Related papers (2022-10-27T04:00:04Z)
- Adaptive Local-Global Relational Network for Facial Action Units Recognition and Facial Paralysis Estimation [22.85506776477092]
We propose a novel Adaptive Local-Global Network (ALGRNet) for facial AU recognition and apply it to facial paralysis estimation.
ALGRNet consists of three novel structures, including an adaptive region learning module that learns adaptive muscle regions based on detected landmarks.
Experiments on the BP4D and DISFA AU datasets show that the proposed approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-03-03T16:14:49Z)
- AU-Guided Unsupervised Domain Adaptive Facial Expression Recognition [21.126514122636966]
This paper proposes an AU-guided unsupervised Domain Adaptive FER framework to relieve the annotation bias between different FER datasets.
To achieve domain-invariant compact features, we utilize AU-guided triplet training, which randomly collects anchor-positive-negative triplets on both domains based on their AUs.
arXiv Detail & Related papers (2020-12-18T07:17:30Z)
- J$\hat{\text{A}}$A-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention [57.51255553918323]
We propose a novel end-to-end deep learning framework for joint AU detection and face alignment.
Our framework significantly outperforms the state-of-the-art AU detection methods on the challenging BP4D, DISFA, GFT and BP4D+ benchmarks.
arXiv Detail & Related papers (2020-03-18T12:50:19Z)