TAL EmotioNet Challenge 2020 Rethinking the Model Chosen Problem in
Multi-Task Learning
- URL: http://arxiv.org/abs/2004.09862v1
- Date: Tue, 21 Apr 2020 09:39:38 GMT
- Title: TAL EmotioNet Challenge 2020 Rethinking the Model Chosen Problem in
Multi-Task Learning
- Authors: Pengcheng Wang, Zihao Wang, Zhilong Ji, Xiao Liu, Songfan Yang and
Zhongqin Wu
- Abstract summary: We pose the AU recognition problem as a multi-task learning problem.
The co-occurrence of the expression features and the head pose features is explored.
By choosing the optimal checkpoint for each AU, the recognition results are improved.
- Score: 24.365090805937083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces our approach to the EmotioNet Challenge 2020. We pose
the AU recognition problem as a multi-task learning problem, where the
non-rigid facial muscle motion (mainly the first 17 AUs) and the rigid head
motion (the last 6 AUs) are modeled separately. The co-occurrence of the
expression features and the head pose features is explored. We observe that
different AUs converge at different speeds. By choosing the optimal checkpoint
for each AU, the recognition results are improved. We obtain a final score of
0.746 on the validation set and 0.7306 on the test set of the challenge.
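The per-AU checkpoint selection is the most concrete part of the method. Below is a minimal sketch, assuming each saved checkpoint has already been evaluated on the validation set to produce one F1 score per AU; all names and numbers are illustrative, not taken from the paper.

```python
# Minimal sketch of per-AU checkpoint selection (illustrative names only).
# Different AUs converge at different speeds, so the best checkpoint is
# chosen independently for each AU from its validation F1 score.

from typing import Dict, List

def select_per_au_checkpoints(
    val_f1: Dict[str, List[float]],  # checkpoint name -> F1 score per AU
    num_aus: int = 23,               # 17 expression AUs + 6 head-pose AUs
) -> Dict[int, str]:
    """Return, for each AU index, the checkpoint with the highest val F1."""
    best = {}
    for au in range(num_aus):
        best[au] = max(val_f1, key=lambda ckpt: val_f1[ckpt][au])
    return best

# Example: three checkpoints, scores for the first 3 AUs shown for brevity.
scores = {
    "epoch_05.pt": [0.61, 0.48, 0.55],
    "epoch_10.pt": [0.66, 0.52, 0.51],
    "epoch_15.pt": [0.63, 0.57, 0.49],
}
print(select_per_au_checkpoints(scores, num_aus=3))
# {0: 'epoch_10.pt', 1: 'epoch_15.pt', 2: 'epoch_05.pt'}
```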
Related papers
- Representation Learning and Identity Adversarial Training for Facial Behavior Understanding [3.350769246260559]
We show that subject identity provides a learning shortcut for the model and leads to sub-optimal solutions for AU prediction.
We propose Identity Adversarial Training (IAT) and demonstrate that a strong IAT regularization is necessary to learn identity-invariant features.
Our proposed methods, Facial Masked Autoencoder (FMAE) and IAT, are simple, generic and effective.
arXiv Detail & Related papers (2024-07-15T21:13:28Z)
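The summary above does not spell out how IAT is implemented; a common realization of identity-adversarial feature learning is a gradient reversal layer in front of an identity classifier. The PyTorch sketch below illustrates that pattern under this assumption; it is not necessarily the authors' exact architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lambda on backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class IdentityAdversarialHead(nn.Module):
    """Identity classifier behind a gradient reversal layer: the backbone is
    pushed to *remove* identity information while the AU head still fits AUs."""
    def __init__(self, feat_dim: int, num_identities: int, lam: float = 1.0):
        super().__init__()
        self.lam = lam
        self.classifier = nn.Linear(feat_dim, num_identities)

    def forward(self, features):
        reversed_feat = GradReverse.apply(features, self.lam)
        return self.classifier(reversed_feat)

# Usage (assumed training loop, not the paper's code):
# feat = backbone(images); id_logits = adv_head(feat)
# loss = au_loss + id_weight * nn.functional.cross_entropy(id_logits, id_labels)
```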
- INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition [5.303788012608604]
We revisit the INTERSPEECH 2009 Emotion Challenge -- the first ever speech emotion recognition (SER) challenge.
We evaluate a series of deep learning models that are representative of the major advances in SER research.
arXiv Detail & Related papers (2024-06-10T15:55:06Z)
- Exploring Question Decomposition for Zero-Shot VQA [99.32466439254821]
We investigate a question decomposition strategy for visual question answering.
We show that naive application of model-written decompositions can hurt performance.
We introduce a model-driven selective decomposition approach for second-guessing predictions and correcting errors.
arXiv Detail & Related papers (2023-10-25T23:23:57Z)
- Solution for SMART-101 Challenge of ICCV Multi-modal Algorithmic Reasoning Task 2023 [13.326745559876558]
We present our solution to the Multi-modal Algorithmic Reasoning Task: the SMART-101 Challenge.
This challenge evaluates the abstraction, deduction, and generalization abilities of neural networks in solving visuolinguistic puzzles.
Under the puzzle splits configuration, we achieved an accuracy score of 26.5 on the validation set and 24.30 on the private test set.
arXiv Detail & Related papers (2023-10-10T09:12:27Z)
- SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation [60.94239810407917]
This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation based on a single Swin Transformer.
To address the conflicts among multiple tasks, a Multi-Level Channel Attention (MLCA) module is integrated into each task-specific analysis.
Experiments show that the proposed model has a better understanding of the face and achieves excellent performance for all tasks.
arXiv Detail & Related papers (2023-08-22T15:38:39Z)
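The MLCA module itself is not described in this summary; as an illustrative stand-in, the sketch below shows a standard squeeze-and-excitation channel attention block of the kind a task-specific branch could use to reweight shared backbone features. All design details here are assumptions, not SwinFace's actual module.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: a stand-in for the
    kind of per-task channel gating an MLCA-like module could apply."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # per-channel gate in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # reweight shared features

# Each task head can own its own gate over shared backbone features:
# feat = backbone(images)                 # B x C x H x W
# age_feat = ChannelAttention(feat.shape[1])(feat)
```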
- Fine-Grained Hard Negative Mining: Generalizing Mitosis Detection with a Fifth of the MIDOG 2022 Dataset [1.2183405753834562]
We describe a candidate deep learning solution for the Mitosis Domain Generalization Challenge 2022 (MIDOG).
Our approach consists of training a rotation-invariant deep learning model with aggressive data augmentation.
Our model ensemble achieved an F1-score of 0.697 on the final test set after automated evaluation.
arXiv Detail & Related papers (2023-01-03T13:06:44Z)
- NTIRE 2022 Challenge on Perceptual Image Quality Assessment [90.04931572825859]
This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment (IQA).
The challenge was held to address the emerging problem of assessing the quality of images produced by perceptual image processing algorithms.
The winning method demonstrates state-of-the-art performance.
arXiv Detail & Related papers (2022-06-23T13:36:49Z)
- Facial Action Unit Recognition With Multi-models Ensembling [0.0]
We present our method for the Affective Behavior Analysis in-the-wild (ABAW) 2022 Competition.
We use an improved IResNet100 as the backbone. We then train on the AU dataset in Aff-Wild2 using three models pretrained on our private AU and expression datasets and on Glint360K, respectively.
arXiv Detail & Related papers (2022-03-24T12:50:02Z)
- NTIRE 2021 Multi-modal Aerial View Object Classification Challenge [88.89190054948325]
We introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR.
This challenge is composed of two different tracks using EO and SAR imagery.
We discuss the top methods submitted for this competition and evaluate their results on our blind test set.
arXiv Detail & Related papers (2021-07-02T16:55:08Z)
- DeepMark++: Real-time Clothing Detection at the Edge [55.41644538483948]
We propose a single-stage approach to deliver rapid clothing detection and keypoint estimation.
Our solution is based on the multi-target network CenterNet, and we introduce several powerful post-processing techniques to enhance performance.
Our most accurate model achieves results comparable to state-of-the-art solutions on the DeepFashion2 dataset.
arXiv Detail & Related papers (2020-06-01T04:36:57Z)
- Counterfactual Samples Synthesizing for Robust Visual Question Answering [104.72828511083519]
We propose a model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme.
CSS generates numerous counterfactual training samples by masking critical objects in images or words in questions.
We achieve a record-breaking performance of 58.95% on VQA-CP v2, with a 6.5% gain.
arXiv Detail & Related papers (2020-03-14T08:34:31Z)
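The CSS idea of masking critical content is easy to illustrate for the language side: given per-token importance scores (e.g. from a gradient-based attribution, an assumption here, not necessarily the paper's criterion), mask the top-scoring words to create a counterfactual question. The sketch below shows that one step, not the full training scheme.

```python
from typing import List

MASK_TOKEN = "[MASK]"

def synthesize_counterfactual_question(
    tokens: List[str],
    importance: List[float],   # assumed per-token attribution scores
    top_k: int = 1,
) -> List[str]:
    """Mask the most 'critical' words so the original answer no longer
    follows, yielding a counterfactual sample (simplified CSS-style step)."""
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    critical = set(ranked[:top_k])
    return [MASK_TOKEN if i in critical else t for i, t in enumerate(tokens)]

print(synthesize_counterfactual_question(
    ["what", "color", "is", "the", "banana"],
    [0.05, 0.40, 0.02, 0.03, 0.50],
))
# ['what', 'color', 'is', 'the', '[MASK]']
```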
This list is automatically generated from the titles and abstracts of the papers on this site.