Affective Behavior Analysis using Task-adaptive and AU-assisted Graph Network
- URL: http://arxiv.org/abs/2407.11663v1
- Date: Tue, 16 Jul 2024 12:33:22 GMT
- Title: Affective Behavior Analysis using Task-adaptive and AU-assisted Graph Network
- Authors: Xiaodong Li, Wenchao Du, Hongyu Yang
- Abstract summary: We present our solution and experimental results for the Multi-Task Learning Challenge of the 7th Affective Behavior Analysis in-the-wild (ABAW7) Competition.
This challenge consists of three tasks: action unit detection, facial expression recognition, and valence-arousal estimation.
- Score: 18.304164382834617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present our solution and experimental results for the Multi-Task Learning Challenge of the 7th Affective Behavior Analysis in-the-wild (ABAW7) Competition. This challenge consists of three tasks: action unit detection, facial expression recognition, and valence-arousal estimation. We address the research problems of this challenge from three aspects: 1) For learning robust visual feature representations, we introduce the pre-trained large model DINOv2. 2) To adaptively extract the features required by each task, we design a task-adaptive block that performs cross-attention between a set of learnable query vectors and pre-extracted features. 3) By proposing the AU-assisted Graph Convolutional Network (AU-GCN), we make full use of the correlation information between AUs to assist in solving the EXPR and VA tasks. Finally, we achieve an evaluation measure of 1.2542 on the validation set provided by the organizers.
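The following is a minimal PyTorch sketch of the two method components named in the abstract: the task-adaptive cross-attention block and a single AU-GCN propagation step. It is an illustration under stated assumptions, not the authors' released code; the module names, the 768-dimensional features (matching a DINOv2 ViT-B backbone), the query count, and the identity-initialized AU adjacency are all hypothetical placeholders.

```python
import torch
import torch.nn as nn


class TaskAdaptiveBlock(nn.Module):
    """Cross-attention between a set of learnable query vectors and
    pre-extracted backbone features, yielding task-specific features."""

    def __init__(self, dim: int = 768, num_queries: int = 8, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, dim) patch tokens from a frozen DINOv2 backbone.
        q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        out, _ = self.attn(q, feats, feats)  # queries attend over the tokens
        return self.norm(out)  # (B, num_queries, dim) task-specific features


class AUGCN(nn.Module):
    """One graph-convolution step over AU nodes, H' = ReLU(A H W), where the
    adjacency A encodes pairwise AU correlations."""

    def __init__(self, num_aus: int = 12, dim: int = 768):
        super().__init__()
        # Placeholder adjacency; in practice it would be estimated from AU
        # co-occurrence statistics on the training data and normalized.
        self.register_buffer("adj", torch.eye(num_aus))
        self.proj = nn.Linear(dim, dim)

    def forward(self, au_feats: torch.Tensor) -> torch.Tensor:
        # au_feats: (B, num_aus, dim) per-AU features.
        return torch.relu(self.adj @ self.proj(au_feats))


# Toy usage: route shared backbone features through a per-task block.
feats = torch.randn(2, 197, 768)                     # e.g., 196 patches + CLS
au_feats = TaskAdaptiveBlock(num_queries=12)(feats)  # one query per AU
refined = AUGCN(num_aus=12)(au_feats)                # (2, 12, 768) AU-aware features
```

In a full system, separate task-adaptive blocks would presumably feed the AU, EXPR, and VA heads, with the AU-GCN output supplying correlation-aware features to the latter two tasks.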
Related papers
- Affective Behaviour Analysis via Progressive Learning [23.455163723584427]
We present our methods and experimental results for the two competition tracks.
We train a Masked Autoencoder in a self-supervised manner to attain high-quality facial features.
We utilize curriculum learning to transition the model from recognizing single expressions to recognizing compound expressions.
arXiv Detail & Related papers (2024-07-24T02:24:21Z)
- QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View [2.3982875575861677]
We present our solutions for a spectrum of automation tasks in life-saving intervention procedures within the Trauma THOMPSON (T3) Challenge.
For action recognition and anticipation, we propose a pre-processing strategy that samples and stitches multiple inputs into a single image.
For training, we present an action dictionary-guided design, which consistently yields the most favorable results.
arXiv Detail & Related papers (2024-07-18T06:55:26Z)
- Composite Learning for Robust and Effective Dense Predictions [81.2055761433725]
Multi-task learning promises better model generalization on a target task by jointly optimizing it with an auxiliary task.
We find that jointly training a dense prediction (target) task with a self-supervised (auxiliary) task can consistently improve the performance of the target task, while eliminating the need for labeling auxiliary tasks.
arXiv Detail & Related papers (2022-10-13T17:59:16Z)
- Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z)
- Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding [34.75145779372538]
We propose a generalized notion of multi-task learning by incorporating both auxiliary tasks that the model should perform well on and adversarial tasks that the model should not perform well on.
Our proposed framework, Adversarial Multi-Task Neural Networks (AMT), penalizes adversarial tasks, which NCA determines to be scene recognition.
We show that our approach improves accuracy by 3% and encourages the model to attend to action features instead of correlation-biasing scene features.
arXiv Detail & Related papers (2022-08-22T06:26:11Z)
- Multi-task Cross Attention Network in Facial Behavior Analysis [7.910908058662372]
We present our solution for the Multi-Task Learning challenge of the Affective Behavior Analysis in-the-wild competition.
The challenge is a combination of three tasks: action unit detection, facial expression recognition, and valence-arousal estimation.
We introduce a cross-attentive module to improve multi-task learning performance.
arXiv Detail & Related papers (2022-07-21T04:07:07Z)
- On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification [66.58450185833479]
In this paper, we exploit Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework.
By jointly training these two tasks in a mutually beneficial manner, our model learns higher quality modality-shared and ID-related features.
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2022-01-11T09:44:00Z)
- Achieving Human Parity on Visual Question Answering [67.22500027651509]
The Visual Question Answering (VQA) task utilizes both visual image and language analysis to answer a textual question with respect to an image.
This paper describes our recent research on AliceMind-MMU, which obtains similar or even slightly better results than human beings do on VQA.
This is achieved by systematically improving the VQA pipeline, including: (1) pre-training with comprehensive visual and textual feature representation; (2) effective cross-modal interaction with learning to attend; and (3) a novel knowledge mining framework with specialized expert modules for the complex VQA task.
arXiv Detail & Related papers (2021-11-17T04:25:11Z)
- Winning the ICCV'2021 VALUE Challenge: Task-aware Ensemble and Transfer Learning with Visual Concepts [20.412239939287886]
The VALUE (Video-And-Language Understanding Evaluation) benchmark is newly introduced to evaluate and analyze multi-modal representation learning algorithms.
The main objective of the VALUE challenge is to train a task-agnostic model that is simultaneously applicable for various tasks with different characteristics.
This technical report describes our winning strategies for the VALUE challenge: 1) single model optimization, 2) transfer learning with visual concepts, and 3) task-aware ensemble.
arXiv Detail & Related papers (2021-10-13T03:50:07Z)
- Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions.
Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain.
arXiv Detail & Related papers (2021-05-17T13:42:09Z)
- Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation [75.1682163844354]
We address the issue of missing modalities in the Visual Question Answer-Difference prediction task.
We introduce a model, the "Big" Teacher, that takes the image/question/answer triplet as its input and outperforms the baseline.
arXiv Detail & Related papers (2021-04-13T06:41:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.