FAN-Trans: Online Knowledge Distillation for Facial Action Unit
Detection
- URL: http://arxiv.org/abs/2211.06143v1
- Date: Fri, 11 Nov 2022 11:35:33 GMT
- Title: FAN-Trans: Online Knowledge Distillation for Facial Action Unit
Detection
- Authors: Jing Yang, Jie Shen, Yiming Lin, Yordan Hristov, Maja Pantic
- Abstract summary: Leveraging the online knowledge distillation framework, we propose the "FANTrans" method for AU detection.
Our model consists of a hybrid network of convolution and transformer blocks to learn per-AU features and to model AU co-occurrences.
- Score: 45.688712067285536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to its importance in facial behaviour analysis, facial action unit (AU)
detection has attracted increasing attention from the research community.
Leveraging the online knowledge distillation framework, we propose the
"FANTrans" method for AU detection. Our model consists of a hybrid network of
convolution and transformer blocks to learn per-AU features and to model AU
co-occurrences. The model uses a pre-trained face alignment network as the
feature extractor. After further transformation by a small learnable add-on
convolutional subnet, the per-AU features are fed into transformer blocks to
enhance their representation. As multiple AUs often appear together, we propose
a learnable attention drop mechanism in the transformer block to learn the
correlation between the features for different AUs. We also design a classifier
that predicts AU presence by considering all AUs' features, to explicitly
capture label dependencies. Finally, we adapt online knowledge distillation in
the training stage for this task, further improving the model's performance.
Experiments on the BP4D and DISFA datasets demonstrate the effectiveness of the
proposed method.
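To make the pipeline concrete, below is a minimal PyTorch sketch of the components the abstract describes: a small add-on conv subnet producing per-AU features on top of a (frozen) face-alignment backbone, a transformer block with a learnable attention-drop gate, a classifier that predicts every AU from all AU features, and an online (mutual) distillation loss. All shapes (`NUM_AUS`, `DIM`), the gating form, and the loss weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_AUS, DIM = 12, 128   # hypothetical sizes (e.g. 12 AUs on BP4D)

class PerAUFeatures(nn.Module):
    """Learnable add-on conv subnet: maps frozen backbone features to one
    feature token per AU (shapes here are illustrative)."""
    def __init__(self, in_ch=256):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, NUM_AUS * DIM, kernel_size=1)

    def forward(self, fmap):                    # fmap: (B, in_ch, H, W)
        x = self.conv(fmap).mean(dim=(2, 3))    # global average pool
        return x.view(-1, NUM_AUS, DIM)         # (B, NUM_AUS, DIM) AU tokens

class AttnDropBlock(nn.Module):
    """Transformer block with a learnable gate on the AU-to-AU attention map;
    a guess at the paper's 'learnable attention drop' mechanism."""
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(DIM, num_heads=4, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(NUM_AUS, NUM_AUS))
        self.ff = nn.Sequential(nn.Linear(DIM, 4 * DIM), nn.GELU(),
                                nn.Linear(4 * DIM, DIM))
        self.n1, self.n2 = nn.LayerNorm(DIM), nn.LayerNorm(DIM)

    def forward(self, x):
        # additive attention mask: strongly negative entries "drop" AU pairs
        mask = F.logsigmoid(self.gate)
        h, _ = self.attn(self.n1(x), self.n1(x), self.n1(x), attn_mask=mask)
        x = x + h
        return x + self.ff(self.n2(x))

class AllAUClassifier(nn.Module):
    """Predicts every AU from all AU tokens jointly, so label dependencies
    are captured explicitly rather than per-AU."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(NUM_AUS * DIM, NUM_AUS)

    def forward(self, tokens):                  # tokens: (B, NUM_AUS, DIM)
        return self.fc(tokens.flatten(1))       # (B, NUM_AUS) logits

def online_kd_loss(logits_a, logits_b, labels, alpha=0.5):
    """Online (mutual) distillation for multi-label AU detection: two peer
    branches fit the labels and match each other's detached predictions."""
    bce = F.binary_cross_entropy_with_logits
    sup = bce(logits_a, labels) + bce(logits_b, labels)
    kd = bce(logits_a, torch.sigmoid(logits_b).detach()) + \
         bce(logits_b, torch.sigmoid(logits_a).detach())
    return sup + alpha * kd
```

In this sketch the two branches distill into each other on the fly, which is what distinguishes online distillation from the usual frozen-teacher setup.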
Related papers
- AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors [31.547624650827395]
Existing methods suffer from overfitting because they rely on a large number of learnable parameters.
Parameter-Efficient Transfer Learning (PETL) provides a promising paradigm to address these challenges.
We propose a novel Mixture-of-Knowledge Expert (MoKE) collaboration mechanism.
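As an illustration of the PETL paradigm this summary refers to, here is a hedged sketch in which a frozen backbone is augmented with small trainable expert adapters mixed by a soft gate. The `ExpertAdapter`/`MoKELayer` names and the gating design are a generic mixture-of-experts guess, not AUFormer's actual MoKE mechanism.

```python
import torch
import torch.nn as nn

class ExpertAdapter(nn.Module):
    """Tiny bottleneck adapter: the only trainable weights for one expert,
    while the backbone itself stays frozen (the PETL idea)."""
    def __init__(self, dim=768, hidden=32):
        super().__init__()
        self.down, self.up = nn.Linear(dim, hidden), nn.Linear(hidden, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class MoKELayer(nn.Module):
    """Mixture of expert adapters; a learned gate mixes their outputs."""
    def __init__(self, dim=768, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(ExpertAdapter(dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x):                               # x: (B, tokens, dim)
        w = torch.softmax(self.gate(x.mean(1)), -1)     # (B, num_experts)
        outs = torch.stack([e(x) for e in self.experts], 1)  # (B, E, T, D)
        return (w[:, :, None, None] * outs).sum(1)      # gated expert mixture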
arXiv Detail & Related papers (2024-03-07T17:46:50Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
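For intuition about the AAT module described above, here is a minimal sketch of a learnable affine-warping module, assuming a spatial-transformer-style design (the paper's actual architecture may differ).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAffine(nn.Module):
    """Predicts a 2x3 affine matrix per image and warps the image with it."""
    def __init__(self):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(3, 16, 7, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 6))
        # initialize to the identity transform so training starts from no warp
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0, 0, 0, 1, 0]))

    def forward(self, img):                     # img: (B, 3, H, W)
        theta = self.loc(img).view(-1, 2, 3)    # per-image affine parameters
        grid = F.affine_grid(theta, img.size(), align_corners=False)
        return F.grid_sample(img, grid, align_corners=False)
```

Because the warp is differentiable, the localization network can be trained end-to-end with whatever detection loss follows it.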
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- ExpPoint-MAE: Better interpretability and performance for self-supervised point cloud transformers [7.725095281624494]
We evaluate the effectiveness of Masked Autoencoding as a pretraining scheme, and explore Momentum Contrast as an alternative.
We observe that the transformer learns to attend to semantically meaningful regions, indicating that pretraining leads to a better understanding of the underlying geometry.
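The masked-autoencoding pretraining scheme evaluated here follows a simple recipe: hide a large fraction of input tokens and train the model to reconstruct them. Below is a compact sketch; the patch/token shapes are illustrative assumptions, and positional embeddings are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def random_mask(tokens, ratio=0.6):
    """Split patch tokens into visible and masked subsets at random."""
    B, T, D = tokens.shape
    n_keep = int(T * (1 - ratio))
    idx = torch.rand(B, T).argsort(dim=1)       # random permutation per sample
    gather = lambda i: tokens.gather(1, i[..., None].expand(-1, -1, D))
    return gather(idx[:, :n_keep]), gather(idx[:, n_keep:])

dim = 128
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=3)
decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=1)
mask_token = nn.Parameter(torch.zeros(1, 1, dim))

tokens = torch.randn(8, 64, dim)                # (batch, point patches, dim)
visible, target = random_mask(tokens)           # encoder never sees masked patches
latent = encoder(visible)
queries = mask_token.expand(8, target.shape[1], dim)
recon = decoder(torch.cat([latent, queries], dim=1))[:, latent.shape[1]:]
loss = F.mse_loss(recon, target)                # reconstruct the hidden patches
```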
arXiv Detail & Related papers (2023-06-19T09:38:21Z)
- Local Region Perception and Relationship Learning Combined with Feature Fusion for Facial Action Unit Detection [12.677143408225167]
We introduce our submission to the CVPR 2023 Competition on Affective Behavior Analysis in-the-wild (ABAW).
We propose a single-stage trained AU detection framework. Specifically, to effectively extract facial local-region features relevant to AU detection, we use a local region perception module.
We also use a graph neural network-based relational learning module to capture the relationship between AUs.
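A graph-based AU relation module like the one described above can be sketched as follows: per-AU feature vectors are graph nodes, and a learned adjacency mixes messages between them. The adjacency parameterization is an assumption for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class AURelationGNN(nn.Module):
    def __init__(self, num_aus=12, dim=128):
        super().__init__()
        self.adj_logits = nn.Parameter(torch.zeros(num_aus, num_aus))  # learned AU graph
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                              # x: (B, num_aus, dim)
        adj = torch.softmax(self.adj_logits, dim=-1)   # row-normalized edge weights
        msg = torch.einsum("ij,bjd->bid", adj, self.proj(x))  # aggregate neighbors
        return torch.relu(x + msg)                     # residual update per AU node
```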
arXiv Detail & Related papers (2023-03-15T11:59:24Z)
- SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into transformer for enhancing discriminative representation learning.
The two proposed modules are lightweight, can be plugged into any transformer network, and are easily trained end-to-end.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z)
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, the ability of Transformers to dynamically incorporate contextual information when extracting features has been neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
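A rough sketch of a block that mixes a CNN branch with a Transformer branch, in the spirit of the CT Block named above, is shown below; the fusion scheme (a 1x1 conv over concatenated branches) is a guess, not CFIN's exact design.

```python
import torch
import torch.nn as nn

class CTBlock(nn.Module):
    def __init__(self, ch=64, heads=4):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * ch, ch, 1)    # 1x1 conv merges the two branches

    def forward(self, x):                       # x: (B, C, H, W)
        B, C, H, W = x.shape
        local = self.conv(x)                    # local detail via convolution
        seq = x.flatten(2).transpose(1, 2)      # (B, H*W, C) tokens for attention
        ctx, _ = self.attn(seq, seq, seq)       # global context via self-attention
        ctx = ctx.transpose(1, 2).view(B, C, H, W)
        return x + self.fuse(torch.cat([local, ctx], dim=1))
```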
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- Federated Adversarial Training with Transformers [16.149924042225106]
Federated learning (FL) has emerged to enable global model training over distributed clients' data while preserving their privacy.
This paper investigates the feasibility of federated adversarial training with different model aggregation methods and different vision transformer models with varying tokenization and classification head techniques.
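For reference, federated averaging (FedAvg) is the canonical aggregation method such comparisons start from. A minimal sketch, with the local training loop omitted:

```python
import copy
import torch

def fed_avg(client_states, client_sizes):
    """Weighted average of client model state_dicts by local dataset size."""
    total = sum(client_sizes)
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(s[key].float() * (n / total)
                       for s, n in zip(client_states, client_sizes))
    return avg
```

Each round, clients train locally (adversarially, in this paper's setting), send their state dicts to the server, and receive the averaged model back.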
arXiv Detail & Related papers (2022-06-05T09:07:09Z)
- Cross-subject Action Unit Detection with Meta Learning and Transformer-based Relation Modeling [7.395396464857193]
The paper proposes a meta-learning-based cross-subject AU detection model to eliminate identity-caused differences.
A transformer-based relation learning module is introduced to learn the latent relations of multiple AUs.
Our results show that on the two public datasets BP4D and DISFA, our method outperforms the state of the art.
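One way to picture subject-level meta-learning is an episodic inner/outer loop where each "task" is one subject, so the model learns updates that transfer across identities. The first-order MAML sketch below is an assumption for illustration; the paper may use a different meta-learner.

```python
import copy
import torch

def meta_step(model, loss_fn, subject_tasks, meta_opt, inner_lr=1e-2):
    """First-order MAML over subjects: adapt a copy on each subject's support
    set, then accumulate that copy's query-set gradients into the base model."""
    meta_opt.zero_grad()
    for (s_x, s_y), (q_x, q_y) in subject_tasks:   # one episode per subject
        fast = copy.deepcopy(model)                # temporary per-subject copy
        grads = torch.autograd.grad(loss_fn(fast(s_x), s_y), fast.parameters())
        with torch.no_grad():                      # one inner-loop SGD step
            for p, g in zip(fast.parameters(), grads):
                p -= inner_lr * g
        loss_fn(fast(q_x), q_y).backward()         # query loss grads land on the copy
        for p, fp in zip(model.parameters(), fast.parameters()):
            if fp.grad is not None:
                p.grad = fp.grad if p.grad is None else p.grad + fp.grad
    meta_opt.step()                                # meta-update of the base model
```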
arXiv Detail & Related papers (2022-05-18T08:17:59Z)
- Hybrid Routing Transformer for Zero-Shot Learning [83.64532548391]
This paper presents a novel transformer encoder-decoder model, called the hybrid routing transformer (HRT).
We embed an active attention, constructed from both bottom-up and top-down dynamic routing pathways, to generate the attribute-aligned visual features.
In the HRT decoder, we use static routing to calculate the correlation among the attribute-aligned visual features, the corresponding attribute semantics, and the class attribute vectors to generate the final class label predictions.
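The final prediction step can be illustrated with a generic zero-shot scoring function: attribute-aligned visual features are matched against each class's attribute vector, so unseen classes are recognized purely from their attribute signatures. This sketch omits the routing machinery and is not HRT's exact decoder.

```python
import torch
import torch.nn.functional as F

def zsl_logits(attr_visual, class_attr):
    """attr_visual: (B, A) predicted attribute strengths for each image.
    class_attr:  (C, A) attribute signature of each (possibly unseen) class.
    Returns (B, C) compatibility logits via cosine similarity."""
    v = F.normalize(attr_visual, dim=-1)
    c = F.normalize(class_attr, dim=-1)
    return v @ c.t()

# usage: classify among 10 unseen classes described by 85 attributes each
logits = zsl_logits(torch.randn(4, 85), torch.randn(10, 85))
pred = logits.argmax(dim=1)
```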
arXiv Detail & Related papers (2022-03-29T07:55:08Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
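Spectrogram augmentation is commonly done SpecAugment-style, by masking random frequency bands and time spans; a small sketch follows (the paper's exact augmentation recipe may differ, and the mask widths here are arbitrary).

```python
import torch

def augment_spectrogram(spec, max_f=12, max_t=20):
    """spec: (freq_bins, time_steps) log-mel spectrogram. Zeroes out one
    random frequency band and one random time span."""
    spec = spec.clone()
    n_freq, n_time = spec.shape
    f = torch.randint(0, max_f + 1, ()).item()         # band height
    f0 = torch.randint(0, n_freq - f + 1, ()).item()   # band start
    spec[f0:f0 + f, :] = 0.0                           # frequency mask
    t = torch.randint(0, max_t + 1, ()).item()         # span width
    t0 = torch.randint(0, n_time - t + 1, ()).item()   # span start
    spec[:, t0:t0 + t] = 0.0                           # time mask
    return spec
```

Masking acts as a cheap regularizer on scarce SER data, forcing the model not to rely on any single band or frame.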
arXiv Detail & Related papers (2021-08-05T10:39:39Z)