TAL EmotioNet Challenge 2020 Rethinking the Model Chosen Problem in
Multi-Task Learning
- URL: http://arxiv.org/abs/2004.09862v1
- Date: Tue, 21 Apr 2020 09:39:38 GMT
- Title: TAL EmotioNet Challenge 2020 Rethinking the Model Chosen Problem in
Multi-Task Learning
- Authors: Pengcheng Wang, Zihao Wang, Zhilong Ji, Xiao Liu, Songfan Yang and
Zhongqin Wu
- Abstract summary: We pose the AU recognition problem as a multi-task learning problem.
The co-occurrence of the expression features and the head pose features is explored.
By choosing the optimal checkpoint for each AU, the recognition results are improved.
- Score: 24.365090805937083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces our approach to the EmotioNet Challenge 2020. We pose
the AU recognition problem as a multi-task learning problem, where the
non-rigid facial muscle motion (mainly the first 17 AUs) and the rigid head
motion (the last 6 AUs) are modeled separately. The co-occurrence of the
expression features and the head pose features is explored. We observe that
different AUs converge at different speeds. By choosing the optimal checkpoint
for each AU, the recognition results are improved. We obtain a final score of
0.746 on the validation set and 0.7306 on the test set of the challenge.
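The per-AU checkpoint selection is the most concrete part of the method. Below is a minimal sketch, assuming each saved checkpoint has already been evaluated on the validation set to produce one F1 score per AU; all names and numbers are illustrative, not taken from the paper.

```python
# Minimal sketch of per-AU checkpoint selection (illustrative names only).
# Different AUs converge at different speeds, so the best checkpoint is
# chosen independently for each AU from its validation F1 score.

from typing import Dict, List

def select_per_au_checkpoints(
    val_f1: Dict[str, List[float]],  # checkpoint name -> F1 score per AU
    num_aus: int = 23,               # 17 expression AUs + 6 head-pose AUs
) -> Dict[int, str]:
    """Return, for each AU index, the checkpoint with the highest val F1."""
    best = {}
    for au in range(num_aus):
        best[au] = max(val_f1, key=lambda ckpt: val_f1[ckpt][au])
    return best

# Example: three checkpoints, scores for the first 3 AUs shown for brevity.
scores = {
    "epoch_05.pt": [0.61, 0.48, 0.55],
    "epoch_10.pt": [0.66, 0.52, 0.51],
    "epoch_15.pt": [0.63, 0.57, 0.49],
}
print(select_per_au_checkpoints(scores, num_aus=3))
# {0: 'epoch_10.pt', 1: 'epoch_15.pt', 2: 'epoch_05.pt'}
```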
Related papers
- Representation Learning and Identity Adversarial Training for Facial Behavior Understanding [3.350769246260559]
We show that subject identity provides a learning shortcut for the model and leads to sub-optimal solutions for AU prediction.
We propose Identity Adversarial Training (IAT) and demonstrate that a strong IAT regularization is necessary to learn identity-invariant features.
Our proposed methods, Facial Masked Autoencoder (FMAE) and IAT, are simple, generic and effective.
arXiv Detail & Related papers (2024-07-15T21:13:28Z)
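The summary above does not spell out how IAT is implemented; a common realization of identity-adversarial feature learning is a gradient reversal layer in front of an identity classifier. The PyTorch sketch below illustrates that pattern under this assumption; it is not necessarily the authors' exact architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lambda on backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class IdentityAdversarialHead(nn.Module):
    """Identity classifier behind a gradient reversal layer: the backbone is
    pushed to *remove* identity information while the AU head still fits AUs."""
    def __init__(self, feat_dim: int, num_identities: int, lam: float = 1.0):
        super().__init__()
        self.lam = lam
        self.classifier = nn.Linear(feat_dim, num_identities)

    def forward(self, features):
        reversed_feat = GradReverse.apply(features, self.lam)
        return self.classifier(reversed_feat)

# Usage (assumed training loop, not the paper's code):
# feat = backbone(images); id_logits = adv_head(feat)
# loss = au_loss + id_weight * nn.functional.cross_entropy(id_logits, id_labels)
```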
- INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition [5.303788012608604]
We revisit the INTERSPEECH 2009 Emotion Challenge -- the first ever speech emotion recognition (SER) challenge.
We evaluate a series of deep learning models that are representative of the major advances in SER research.
arXiv Detail & Related papers (2024-06-10T15:55:06Z)
- Exploring Question Decomposition for Zero-Shot VQA [99.32466439254821]
We investigate a question decomposition strategy for visual question answering.
We show that naive application of model-written decompositions can hurt performance.
We introduce a model-driven selective decomposition approach for second-guessing predictions and correcting errors.
arXiv Detail & Related papers (2023-10-25T23:23:57Z)
- Solution for SMART-101 Challenge of ICCV Multi-modal Algorithmic Reasoning Task 2023 [13.326745559876558]
We present our solution to the Multi-modal Algorithmic Reasoning Task: the SMART-101 Challenge.
This challenge evaluates the abstraction, deduction, and generalization abilities of neural networks in solving visuolinguistic puzzles.
Under the puzzle splits configuration, we achieved an accuracy score of 26.5 on the validation set and 24.30 on the private test set.
arXiv Detail & Related papers (2023-10-10T09:12:27Z)
- SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation [60.94239810407917]
This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation based on a single Swin Transformer.
To address the conflicts among multiple tasks, a Multi-Level Channel Attention (MLCA) module is integrated into each task-specific analysis.
Experiments show that the proposed model has a better understanding of the face and achieves excellent performance for all tasks.
arXiv Detail & Related papers (2023-08-22T15:38:39Z)
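The MLCA module itself is not described in this summary; as an illustrative stand-in, the sketch below shows a standard squeeze-and-excitation channel attention block of the kind a task-specific branch could use to reweight shared backbone features. All design details here are assumptions, not SwinFace's actual module.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: a stand-in for the
    kind of per-task channel gating an MLCA-like module could apply."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # per-channel gate in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # reweight shared features

# Each task head can own its own gate over shared backbone features:
# feat = backbone(images)                 # B x C x H x W
# age_feat = ChannelAttention(feat.shape[1])(feat)
```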
- Fine-Grained Hard Negative Mining: Generalizing Mitosis Detection with a Fifth of the MIDOG 2022 Dataset [1.2183405753834562]
We describe a candidate deep learning solution for the Mitosis Domain Generalization Challenge 2022 (MIDOG).
Our approach consists of training a rotation-invariant deep learning model with aggressive data augmentation.
Our model ensemble achieved an F1-score of 0.697 on the final test set after automated evaluation.
arXiv Detail & Related papers (2023-01-03T13:06:44Z)
- NTIRE 2022 Challenge on Perceptual Image Quality Assessment [90.04931572825859]
This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment (IQA).
The challenge was held to address the emerging problem of assessing the quality of images produced by perceptual image processing algorithms.
The winning method demonstrates state-of-the-art performance.
arXiv Detail & Related papers (2022-06-23T13:36:49Z)
- Facial Action Unit Recognition With Multi-models Ensembling [0.0]
We present our method for the Affective Behavior Analysis in-the-wild (ABAW) 2022 Competition.
We use an improved IResNet100 as the backbone. We then train on the AU dataset in Aff-Wild2 using three models pretrained on our private AU and expression datasets and on Glint360K, respectively.
arXiv Detail & Related papers (2022-03-24T12:50:02Z)
- NTIRE 2021 Multi-modal Aerial View Object Classification Challenge [88.89190054948325]
We introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR.
This challenge is composed of two different tracks using EO and SAR imagery.
We discuss the top methods submitted for this competition and evaluate their results on our blind test set.
arXiv Detail & Related papers (2021-07-02T16:55:08Z)
- DeepMark++: Real-time Clothing Detection at the Edge [55.41644538483948]
We propose a single-stage approach to deliver rapid clothing detection and keypoint estimation.
Our solution is based on the multi-target network CenterNet, and we introduce several powerful post-processing techniques to enhance performance.
Our most accurate model achieves results comparable to state-of-the-art solutions on the DeepFashion2 dataset.
arXiv Detail & Related papers (2020-06-01T04:36:57Z)
- Counterfactual Samples Synthesizing for Robust Visual Question Answering [104.72828511083519]
We propose a model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme.
CSS generates numerous counterfactual training samples by masking critical objects in images or words in questions.
We achieve a record-breaking performance of 58.95% on VQA-CP v2, with a 6.5% gain.
arXiv Detail & Related papers (2020-03-14T08:34:31Z)
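The CSS idea of masking critical content is easy to illustrate for the language side: given per-token importance scores (e.g. from a gradient-based attribution, an assumption here, not necessarily the paper's criterion), mask the top-scoring words to create a counterfactual question. The sketch below shows that one step, not the full training scheme.

```python
from typing import List

MASK_TOKEN = "[MASK]"

def synthesize_counterfactual_question(
    tokens: List[str],
    importance: List[float],   # assumed per-token attribution scores
    top_k: int = 1,
) -> List[str]:
    """Mask the most 'critical' words so the original answer no longer
    follows, yielding a counterfactual sample (simplified CSS-style step)."""
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    critical = set(ranked[:top_k])
    return [MASK_TOKEN if i in critical else t for i, t in enumerate(tokens)]

print(synthesize_counterfactual_question(
    ["what", "color", "is", "the", "banana"],
    [0.05, 0.40, 0.02, 0.03, 0.50],
))
# ['what', 'color', 'is', 'the', '[MASK]']
```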
This list is automatically generated from the titles and abstracts of the papers on this site.