FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features
- URL: http://arxiv.org/abs/2308.12380v1
- Date: Wed, 23 Aug 2023 18:51:11 GMT
- Title: FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features
- Authors: Yufeng Yin, Di Chang, Guoxian Song, Shen Sang, Tiancheng Zhi, Jing Liu, Linjie Luo, Mohammad Soleymani
- Abstract summary: Previous AU detection methods tend to overfit the dataset, resulting in a significant performance loss when evaluated across corpora.
We propose FG-Net for generalizable facial action unit detection.
Specifically, FG-Net extracts feature maps from a StyleGAN2 model pre-trained on a large and diverse face image dataset.
- Score: 13.176011491885664
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic detection of facial Action Units (AUs) allows for objective facial
expression analysis. Due to the high cost of AU labeling and the limited size
of existing benchmarks, previous AU detection methods tend to overfit the
dataset, resulting in a significant performance loss when evaluated across
corpora. To address this problem, we propose FG-Net for generalizable facial
action unit detection. Specifically, FG-Net extracts feature maps from a
StyleGAN2 model pre-trained on a large and diverse face image dataset. Then,
these features are used to detect AUs with a Pyramid CNN Interpreter, making
the training efficient and capturing essential local features. The proposed
FG-Net achieves a strong generalization ability for heatmap-based AU detection
thanks to the generalizable and semantic-rich features extracted from the
pre-trained generative model. Extensive experiments are conducted to evaluate
within- and cross-corpus AU detection with the widely-used DISFA and BP4D
datasets. Compared with the state-of-the-art, the proposed method achieves
superior cross-domain performance while maintaining competitive within-domain
performance. In addition, FG-Net is data-efficient and achieves competitive
performance even when trained on 1000 samples. Our code will be released at
https://github.com/ihp-lab/FG-Net.
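The abstract describes the method only at a high level. As a reading aid, here is a minimal PyTorch sketch of the general idea: project multi-resolution feature maps (such as the intermediate activations a frozen, pre-trained StyleGAN2 would produce after the face is inverted into its latent space) into a small pyramid that emits one spatial heatmap per AU. The channel counts, pyramid depth, number of AUs, and the random stand-in features are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidInterpreter(nn.Module):
    """Fuse multi-resolution feature maps into per-AU heatmaps."""

    def __init__(self, in_channels=(512, 256, 128), num_aus=12, mid=64):
        super().__init__()
        # 1x1 convs project every pyramid level to a common width.
        self.lateral = nn.ModuleList(nn.Conv2d(c, mid, 1) for c in in_channels)
        # Head predicts one spatial heatmap per action unit.
        self.head = nn.Conv2d(mid, num_aus, 3, padding=1)

    def forward(self, feats):
        # feats: list of maps ordered coarse (small) to fine (large).
        fused = self.lateral[0](feats[0])
        for lateral, f in zip(self.lateral[1:], feats[1:]):
            fused = F.interpolate(fused, size=f.shape[-2:], mode="bilinear",
                                  align_corners=False) + lateral(f)
        return torch.sigmoid(self.head(fused))  # (B, num_aus, H, W)

# Random stand-ins for frozen-generator activations at three resolutions.
feats = [torch.randn(1, 512, 16, 16),
         torch.randn(1, 256, 32, 32),
         torch.randn(1, 128, 64, 64)]
print(PyramidInterpreter()(feats).shape)  # torch.Size([1, 12, 64, 64])
```

Because only the small interpreter is trained while the feature extractor stays frozen, training stays cheap, which is consistent with the data-efficiency claim above.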
Related papers
- Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend SAM to Few-shot Semantic Segmentation (FSS).
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions (see the PCA sketch after this list).
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds [3.3888257250564364]
This paper presents a framework for semantic segmentation on sparse sequential point clouds of millimeter-wave radar.
The sparsity of mmWave data and the difficulty of capturing its temporal-topological features remain open problems.
We introduce graph structure and topological features to the point cloud and propose a semantic segmentation framework.
Our model achieves a mean accuracy of 82.31% on a custom dataset and outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2023-04-27T12:28:06Z)
- Long Range Object-Level Monocular Depth Estimation for UAVs [0.0]
We propose several novel extensions to state-of-the-art methods for monocular object detection from images at long range.
First, we propose Sigmoid and ReLU-like encodings when modeling depth estimation as a regression task.
Second, we frame depth estimation as a classification problem and introduce a Soft-Argmax function into the training loss (see the Soft-Argmax sketch after this list).
arXiv Detail & Related papers (2023-02-17T15:26:04Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed MD-CSDNetwork, that combines features from the spatial and frequency domains to mine a shared discriminative representation (see the cross-stitch sketch after this list).
arXiv Detail & Related papers (2021-09-15T14:11:53Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings (see the feature-perturbation sketch after this list).
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Structure-Consistent Weakly Supervised Salient Object Detection with Local Saliency Coherence [14.79639149658596]
We propose a one-round end-to-end training approach for weakly supervised salient object detection via scribble annotations.
Our method achieves a new state-of-the-art performance on six benchmarks.
arXiv Detail & Related papers (2020-12-08T12:49:40Z)
- End-to-End Object Detection with Fully Convolutional Network [71.56728221604158]
We introduce a Prediction-aware One-To-One (POTO) label assignment for classification to enable end-to-end detection.
A simple 3D Max Filtering (3DMF) is proposed to utilize the multi-scale features and improve the discriminability of convolutions in the local region.
Our end-to-end framework achieves competitive performance against many state-of-the-art detectors with NMS on COCO and CrowdHuman datasets.
arXiv Detail & Related papers (2020-12-07T09:14:55Z)
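PCA sketch. The Weakly-supervised Contrastive Learning entry above localizes object regions with PCA over learned features. Below is a minimal sketch of the standard recipe (project per-pixel features onto the first principal component, then threshold); the shapes and the sign-orientation heuristic are assumptions, not necessarily that paper's exact procedure.

```python
import torch

def pca_foreground_mask(feat):
    """feat: (C, H, W) feature map -> boolean (H, W) foreground mask."""
    C, H, W = feat.shape
    x = feat.reshape(C, -1).T                    # one row of features per pixel
    x = x - x.mean(dim=0, keepdim=True)          # center before PCA
    _, _, v = torch.pca_lowrank(x, q=1, center=False)
    proj = x @ v[:, 0]                           # score along 1st principal axis
    # The sign of a principal axis is arbitrary; orient it so the smaller
    # region (usually the object) gets positive scores.
    if (proj > 0).float().mean() > 0.5:
        proj = -proj
    return (proj > 0).reshape(H, W)

mask = pca_foreground_mask(torch.randn(64, 28, 28))
print(mask.shape, mask.float().mean().item())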
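Soft-Argmax sketch. The UAV depth-estimation entry above casts depth as classification over discrete bins and applies a Soft-Argmax in the training loss. A generic sketch follows; the bin range, bin count, and L1 loss are assumptions, and the paper's exact binning may differ.

```python
import torch
import torch.nn.functional as F

def soft_argmax_depth(logits, bin_centers):
    """logits: (B, K) per-bin scores; bin_centers: (K,) depths in meters.

    Softmax turns class scores into a distribution over depth bins; its
    expectation is a differentiable ("soft") argmax, so a regression loss
    can be applied on top of a classification head.
    """
    probs = F.softmax(logits, dim=-1)
    return probs @ bin_centers          # (B,) expected depth

# Illustrative setup: 64 bins spanning 1-200 m, hypothetical 50 m targets.
bins = torch.linspace(1.0, 200.0, 64)
logits = torch.randn(8, 64, requires_grad=True)
pred = soft_argmax_depth(logits, bins)
loss = F.l1_loss(pred, torch.full((8,), 50.0))
loss.backward()
print(pred.shape, loss.item())
```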
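Cross-stitch sketch. The MD-CSDNetwork entry above mixes a spatial and a frequency stream. A minimal sketch using a classic cross-stitch unit (a learned 2x2 mixing matrix between two streams); the FFT-magnitude frequency view and the layer sizes are assumptions, not that paper's architecture.

```python
import torch
import torch.nn as nn

class CrossStitch(nn.Module):
    """Learned 2x2 mixing of two feature streams (classic cross-stitch unit)."""

    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.eye(2))  # identity init: no mixing

    def forward(self, a, b):
        out_a = self.alpha[0, 0] * a + self.alpha[0, 1] * b
        out_b = self.alpha[1, 0] * a + self.alpha[1, 1] * b
        return out_a, out_b

# Two toy streams: spatial features from pixels, frequency features from
# the log FFT magnitude of the same image.
spatial_net = nn.Conv2d(3, 16, 3, padding=1)
freq_net = nn.Conv2d(3, 16, 3, padding=1)
stitch = CrossStitch()

img = torch.randn(2, 3, 64, 64)
freq = torch.fft.fft2(img).abs().log1p()   # simple frequency-domain view
a, b = stitch(spatial_net(img), freq_net(freq))
print(a.shape, b.shape)  # torch.Size([2, 16, 64, 64]) each
```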
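Feature-perturbation sketch. The Adversarial Feature Augmentation entry above perturbs intermediate embeddings rather than input pixels. A minimal FGSM-style sketch; the encoder/classifier split, step size, and loss combination are assumptions, not that paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy encoder/classifier split; the real architecture and step size differ.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
classifier = nn.Linear(128, 10)

def adversarial_feature_loss(x, y, eps=0.5):
    """FGSM-style perturbation applied to the embedding instead of the pixels."""
    z = encoder(x)
    z_adv = z.detach().requires_grad_(True)
    adv_loss = F.cross_entropy(classifier(z_adv), y)
    # Gradient w.r.t. the embedding only; parameter grads stay untouched.
    g = torch.autograd.grad(adv_loss, z_adv)[0]
    z_adv = (z_adv + eps * g.sign()).detach()
    # Train on the clean embedding plus its adversarially shifted copy.
    return F.cross_entropy(classifier(z), y) + F.cross_entropy(classifier(z_adv), y)

x, y = torch.randn(16, 1, 28, 28), torch.randint(0, 10, (16,))
loss = adversarial_feature_loss(x, y)
loss.backward()
print(float(loss))
```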
This list is automatically generated from the titles and abstracts of the papers on this site.