Spatial-temporal Transformer for Affective Behavior Analysis
- URL: http://arxiv.org/abs/2303.10561v1
- Date: Sun, 19 Mar 2023 04:34:17 GMT
- Title: Spatial-temporal Transformer for Affective Behavior Analysis
- Authors: Peng Zou, Rui Wang, Kehua Wen, Yasi Peng and Xiao Sun
- Abstract summary: We propose a Transformer Encoder with Multi-Head Attention framework to learn the distribution of both the spatial and temporal features.
The results fully demonstrate the effectiveness of our proposed model based on the Aff-Wild2 dataset.
- Score: 11.10521339384583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-the-wild affective behavior analysis has been an important area of study. In
this paper, we submit our solutions for the 5th Workshop and Competition on
Affective Behavior Analysis in-the-wild (ABAW), which includes V-A Estimation,
Facial Expression Classification and AU Detection Sub-challenges. We propose a
Transformer Encoder with Multi-Head Attention framework to learn the
distribution of both the spatial and temporal features. In addition, various
effective data augmentation strategies are employed to alleviate the problem
of sample imbalance during model training. The results fully
demonstrate the effectiveness of our proposed model based on the Aff-Wild2
dataset.
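A minimal, illustrative sketch of the kind of architecture the abstract describes: per-frame spatial features passed through a temporal Transformer encoder with multi-head attention. All module names, dimensions, and hyperparameters below are assumptions for illustration, not the authors' code.

```python
# Illustrative spatial-temporal Transformer encoder for frame-level affect
# features; hyperparameters and the features-to-output mapping are assumed.
import torch
import torch.nn as nn

class SpatialTemporalEncoder(nn.Module):
    def __init__(self, feat_dim=512, d_model=256, nhead=4, num_layers=4,
                 num_outputs=2):  # e.g. 2 outputs for valence/arousal
        super().__init__()
        # Project per-frame spatial features (e.g. from a CNN backbone).
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        # Multi-head self-attention across the temporal (frame) axis.
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_outputs)

    def forward(self, x):      # x: (batch, frames, feat_dim)
        h = self.proj(x)       # (batch, frames, d_model)
        h = self.temporal(h)   # attend across frames
        return self.head(h)    # per-frame predictions

model = SpatialTemporalEncoder()
clip = torch.randn(8, 32, 512)  # 8 clips, 32 frames, 512-d features each
out = model(clip)               # shape: (8, 32, 2)
```

For the sample-imbalance issue the abstract mentions, one common (assumed, not confirmed here) complement to augmentation is class-weighted sampling, e.g. via torch.utils.data.WeightedRandomSampler.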
Related papers
- Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an average AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z) - Most Influential Subset Selection: Challenges, Promises, and Beyond [9.479235005673683]
We study the Most Influential Subset Selection (MISS) problem, which aims to identify a subset of training samples with the greatest collective influence.
We conduct a comprehensive analysis of the prevailing approaches in MISS, elucidating their strengths and weaknesses.
We demonstrate that an adaptive version of these approaches, which applies them iteratively, can effectively capture the interactions among samples.
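As a generic illustration of such adaptive, iterative selection (not the paper's algorithm; influence_fn is a hypothetical stand-in for any marginal-influence estimator):

```python
# Generic sketch of iterative (greedy) influential-subset selection.
# influence_fn(i, selected) is a hypothetical scorer returning candidate
# i's marginal influence given the already-selected subset.
def greedy_subset_selection(candidates, influence_fn, k):
    selected, remaining = [], set(candidates)
    for _ in range(k):
        # Re-score at every step so interactions with already-selected
        # samples are captured (the "adaptive" part).
        best = max(remaining, key=lambda i: influence_fn(i, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```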
arXiv Detail & Related papers (2024-09-25T20:00:23Z) - Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models [36.05242956018461]
In this paper, we establish a bridge between identifying detrimental training samples via influence functions and outlier gradient detection.
We first validate the hypothesis of our proposed outlier gradient analysis approach on synthetic datasets.
We then demonstrate its effectiveness in detecting mislabeled samples in vision models and selecting data samples for improving performance of natural language processing transformer models.
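One plausible reading of the outlier-gradient idea, sketched below purely for illustration (the paper's actual procedure may differ), is to flag samples whose per-sample loss-gradient norms deviate strongly from the rest:

```python
# Sketch: flag training samples whose loss-gradient norms are outliers.
# Loop-based and slow; only meant to illustrate the general idea.
import torch

def per_sample_grad_norms(model, loss_fn, xs, ys):
    params = [p for p in model.parameters() if p.requires_grad]
    norms = []
    for x, y in zip(xs, ys):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norms.append(torch.sqrt(sum(g.pow(2).sum() for g in grads)))
    return torch.stack(norms)

def flag_outliers(norms, z=3.0):
    # Simple z-score rule: far from the mean gradient norm = suspect.
    return (norms - norms.mean()).abs() > z * norms.std()
```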
arXiv Detail & Related papers (2024-05-06T21:34:46Z) - SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z) - Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes [72.75421975804132]
Learning Active Learning (LAL) suggests learning the active learning strategy itself, allowing it to adapt to the given setting.
We propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem.
Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives.
arXiv Detail & Related papers (2023-09-11T14:16:37Z) - On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes with a balance on the intra-/inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z) - Domain Adaptation with Adversarial Training on Penultimate Activations [82.9977759320565]
Enhancing model prediction confidence on unlabeled target data is an important objective in Unsupervised Domain Adaptation (UDA).
We show that this strategy is more efficient and better correlated with the objective of boosting prediction confidence than adversarial training on input images or intermediate features.
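A rough sketch of that idea, under assumptions not taken from the paper: the network is split into a features backbone and a classifier head, and prediction entropy serves as the confidence objective on unlabeled target data.

```python
# Sketch: adversarial perturbation on penultimate activations rather than
# on input images. The features/classifier split, entropy objective, and
# step size eps are illustrative assumptions.
import torch
import torch.nn.functional as F

def entropy(logits):
    p = F.softmax(logits, dim=1)
    return -(p * p.clamp_min(1e-8).log()).sum(dim=1).mean()

def adv_penultimate_loss(features, classifier, x, eps=1e-2):
    z = features(x)                                  # penultimate activations
    z_adv = z.detach().clone().requires_grad_(True)
    grad = torch.autograd.grad(entropy(classifier(z_adv)), z_adv)[0]
    # Perturb activations in the direction that most increases entropy,
    # then train the model to remain confident under that perturbation.
    z_pert = z + eps * F.normalize(grad, dim=1)
    return entropy(classifier(z_pert))
```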
arXiv Detail & Related papers (2022-08-26T19:50:46Z) - A Multi-task Mean Teacher for Semi-supervised Facial Affective Behavior Analysis [15.95010869939508]
Existing affective behavior analysis methods such as TSAV suffer from the challenge of incompletely labeled datasets.
This paper presents a multi-task mean teacher model for semi-supervised Affective Behavior Analysis to learn from missing labels.
Experimental results on validation datasets show that our method achieves better performance than the TSAV model.
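In the standard mean-teacher recipe (generic, not this paper's exact multi-task variant), the teacher's weights track an exponential moving average (EMA) of the student's, and the teacher's predictions supervise the student on samples with missing labels through a consistency loss:

```python
# Generic mean-teacher EMA update; the decay value is illustrative.
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)
```

The student is then optimized with the supervised task losses where labels exist, plus a consistency term (e.g., MSE between student and teacher outputs) where they are missing.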
arXiv Detail & Related papers (2021-07-09T05:48:22Z) - On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them [57.957466608543676]
We analyze the influence of adversarial training on the loss landscape of machine learning models.
We show that the adversarial loss landscape is less favorable to optimization, due to increased curvature and more scattered gradients.
arXiv Detail & Related papers (2020-06-15T13:50:23Z) - Affective Expression Analysis in-the-wild using Multi-Task Temporal Statistical Deep Learning Model [6.024865915538501]
We present an affective expression analysis model that deals with the above challenges.
We experimented on the Aff-Wild2 dataset, a large-scale dataset for the ABAW Challenge.
arXiv Detail & Related papers (2020-02-21T04:06:03Z) - Adversarial-based neural networks for affect estimations in the wild [3.3335236123901995]
In this work, we explore the use of latent features through our proposed adversarial-based networks for recognition in the wild.
Specifically, our models aggregate several modalities into the discriminator, which is further conditioned on the latent features extracted by the generator.
Our experiments on the recently released SEWA dataset show progressive improvements in our results.
arXiv Detail & Related papers (2020-02-03T16:52:49Z)