Multi-modal Expression Recognition with Ensemble Method
- URL: http://arxiv.org/abs/2303.10033v1
- Date: Fri, 17 Mar 2023 15:03:58 GMT
- Title: Multi-modal Expression Recognition with Ensemble Method
- Authors: Chuanhe Liu, Xinjie Zhang, Xiaolong Liu, Tenggan Zhang, Liyu Meng,
Yuchen Liu, Yuanyuan Deng, Wenqiang Jiang
- Abstract summary: Multimodal feature combinations extracted by several different pre-trained models are applied to capture more effective emotional information.
For these combinations of visual and audio modal features, we utilize two temporal encoders to explore the temporal contextual information in the data.
Our system achieves an average F1 score of 0.45774 on the validation set.
- Score: 9.880739481276835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents our submission to the Expression Classification Challenge
of the fifth Affective Behavior Analysis in-the-wild (ABAW) Competition. In our
method, multimodal feature combinations extracted by several different
pre-trained models are applied to capture more effective emotional information.
For these combinations of visual and audio modal features, we utilize two
temporal encoders to explore the temporal contextual information in the data.
In addition, we employ several ensemble strategies for different experimental
settings to obtain the most accurate expression recognition results. Our system
achieves an average F1 score of 0.45774 on the validation set.
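
The abstract describes a pipeline of pre-extracted visual and audio features, temporal encoders over the fused features, and an ensemble over several feature combinations. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the feature dimensions, the bidirectional GRU encoder, the eight expression classes, and softmax averaging as the ensemble strategy are all assumptions for illustration.

```python
# Minimal sketch (assumptions, not the paper's code): per-frame visual and audio
# features are concatenated, passed through a temporal encoder, and several such
# models are ensembled by averaging their softmax probabilities.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalExpressionClassifier(nn.Module):
    def __init__(self, vis_dim=512, aud_dim=128, hidden=256, num_classes=8):
        super().__init__()
        self.proj = nn.Linear(vis_dim + aud_dim, hidden)          # fuse modalities per frame
        self.encoder = nn.GRU(hidden, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)  # temporal context
        self.head = nn.Linear(2 * hidden, num_classes)            # per-frame expression logits

    def forward(self, vis_feats, aud_feats):
        # vis_feats: (B, T, vis_dim), aud_feats: (B, T, aud_dim)
        x = torch.cat([vis_feats, aud_feats], dim=-1)
        x = torch.relu(self.proj(x))
        x, _ = self.encoder(x)
        return self.head(x)                                       # (B, T, num_classes)


def ensemble_predict(models, vis_feats, aud_feats):
    """Average softmax probabilities over several feature-combination models."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(vis_feats, aud_feats), dim=-1)
                             for m in models]).mean(dim=0)
    return probs.argmax(dim=-1)                                   # (B, T) predicted labels


if __name__ == "__main__":
    vis = torch.randn(2, 100, 512)   # dummy clip batch: 2 videos, 100 frames each
    aud = torch.randn(2, 100, 128)
    models = [TemporalExpressionClassifier().eval() for _ in range(3)]
    print(ensemble_predict(models, vis, aud).shape)  # torch.Size([2, 100])
```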
Related papers
- Multimodal Clinical Trial Outcome Prediction with Large Language Models [30.201189349890267]
We propose a multimodal mixture-of-experts (LIFTED) approach for clinical trial outcome prediction.
LIFTED unifies different modality data by transforming them into natural language descriptions.
Then, LIFTED constructs unified noise-resilient encoders to extract information from modal-specific language descriptions.
arXiv Detail & Related papers (2024-02-09T16:18:38Z) - One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z) - Convolutional autoencoder-based multimodal one-class classification [80.52334952912808]
One-class classification refers to approaches that learn using data from a single class only.
We propose a deep learning one-class classification method suitable for multimodal data.
arXiv Detail & Related papers (2023-09-25T12:31:18Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - On Robustness in Multimodal Learning [75.03719000820388]
Multimodal learning is defined as learning over multiple input modalities such as video, audio, and text.
We present a multimodal robustness framework to provide a systematic analysis of common multimodal representation learning methods.
arXiv Detail & Related papers (2023-04-10T05:02:07Z) - An Ensemble Approach for Multiple Emotion Descriptors Estimation Using
Multi-task Learning [12.589338141771385]
This paper presents our submission to the fourth Affective Behavior Analysis in-the-wild (ABAW) Competition.
Instead of using only face information, we exploit the full information in the provided dataset, which contains both the face and the context around it.
The proposed system achieves a performance of 0.917 on the MTL Challenge validation dataset.
arXiv Detail & Related papers (2022-07-22T04:57:56Z) - Emotion Recognition based on Multi-Task Learning Framework in the ABAW4
Challenge [12.662242704351563]
This paper presents our submission to the Multi-Task Learning (MTL) Challenge of the 4th Affective Behavior Analysis in-the-wild (ABAW) competition.
Based on visual feature representations, we utilize three types of temporal encoders to capture the temporal context information in the videos.
Our system achieves a performance of 1.742 on the MTL Challenge validation dataset.
arXiv Detail & Related papers (2022-07-19T16:18:53Z) - Multi-modal Emotion Estimation for in-the-wild Videos [40.292523976091964]
We introduce our submission to the Valence-Arousal Estimation Challenge of the 3rd Affective Behavior Analysis in-the-wild (ABAW) competition.
Our method utilizes multi-modal information, i.e., visual and audio information, and employs a temporal encoder to model the temporal context in the videos.
arXiv Detail & Related papers (2022-03-24T12:23:07Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z) - MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal
Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as masked text prediction (a minimal text-only sketch of this prompt idea appears after this list).
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z) - A Multi-modal and Multi-task Learning Method for Action Unit and
Expression Recognition [18.478011167414223]
We introduce a multi-modal and multi-task learning method by using both visual and audio information.
We achieve an AU score of 0.712 and an expression score of 0.477 on the validation set.
arXiv Detail & Related papers (2021-07-09T03:28:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.