Compound Expression Recognition via Multi Model Ensemble for the ABAW7 Challenge
- URL: http://arxiv.org/abs/2407.12257v2
- Date: Fri, 26 Jul 2024 08:46:26 GMT
- Title: Compound Expression Recognition via Multi Model Ensemble for the ABAW7 Challenge
- Authors: Xuxiong Liu, Kang Shen, Jun Yao, Boyan Wang, Minrui Liu, Liuwei An, Zishun Cui, Weijie Feng, Xiao Sun
- Abstract summary: Compound Expression Recognition (CER) is vital for effective interpersonal interactions.
In this paper, we propose an ensemble learning-based solution to address this complexity.
Our method demonstrates high accuracy on the RAF-DB dataset and is capable of recognizing expressions in certain portions of C-EXPR-DB through zero-shot learning.
- Score: 6.26485278174662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compound Expression Recognition (CER) is vital for effective interpersonal interactions. Human emotional expressions are inherently complex due to the presence of compound expressions, requiring the consideration of both local and global facial cues for accurate judgment. In this paper, we propose an ensemble learning-based solution to address this complexity. Our approach involves training three distinct expression classification models using convolutional networks, Vision Transformers, and multiscale local attention networks. By employing late fusion for model ensemble, we combine the outputs of these models to predict the final results. Our method demonstrates high accuracy on the RAF-DB dataset and is capable of recognizing expressions in certain portions of C-EXPR-DB through zero-shot learning.
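The late-fusion step described in the abstract can be sketched in a few lines. The snippet below is an illustrative sketch, not the authors' released code: it assumes three already-trained classifiers (a CNN, a Vision Transformer, and a multiscale local-attention network) that each return logits over the same expression classes; the function name and optional per-model weights are placeholders.

```python
import torch
import torch.nn.functional as F

def late_fusion_predict(models, images, weights=None):
    """Average (optionally weighted) softmax probabilities from several trained models.

    models  - iterable of nn.Module classifiers that map a batch of face crops
              to logits over the same set of expression classes (assumed here).
    images  - tensor of shape (batch, 3, H, W).
    weights - optional per-model fusion weights; defaults to a uniform average.
    """
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    fused = None
    with torch.no_grad():
        for model, w in zip(models, weights):
            model.eval()
            probs = F.softmax(model(images), dim=-1) * w  # per-model class probabilities
            fused = probs if fused is None else fused + probs
    return fused.argmax(dim=-1)  # fused class prediction per image
```

Weighted averaging of class probabilities is only one common late-fusion rule; majority voting over the per-model argmax predictions is an equally simple alternative when the models are roughly equally reliable.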
Related papers
- Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models [65.82564074712836]
We introduce DIFfusionHOI, a new HOI detector shedding light on text-to-image diffusion models.
We first devise an inversion-based strategy to learn the expression of relation patterns between humans and objects in embedding space.
These learned relation embeddings then serve as textual prompts to steer diffusion models to generate images that depict specific interactions.
arXiv Detail & Related papers (2024-10-26T12:00:33Z)
- Compound Expression Recognition via Multi Model Ensemble [8.529105068848828]
Compound Expression Recognition plays a crucial role in interpersonal interactions.
We propose a solution based on ensemble learning methods for Compound Expression Recognition.
Our method achieves high accuracy on RAF-DB and is able to recognize expressions through zero-shot learning on certain portions of C-EXPR-DB.
arXiv Detail & Related papers (2024-03-19T09:30:56Z)
- Zero-shot Compound Expression Recognition with Visual Language Model at the 6th ABAW Challenge [11.49671335206114]
We propose a zero-shot approach for recognizing compound expressions by leveraging a pretrained visual language model integrated with some traditional CNN networks.
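As an illustration of the zero-shot recipe summarized above, the following sketch scores a face image against text prompts with an off-the-shelf CLIP checkpoint from Hugging Face Transformers. The prompt template, checkpoint, file path, and label set (the seven compound classes commonly used in the ABAW compound-expression track) are assumptions for illustration, not this paper's exact setup.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed compound-expression label set (ABAW-style compound classes).
labels = ["fearfully surprised", "happily surprised", "sadly surprised",
          "disgustedly surprised", "angrily surprised", "sadly fearful", "sadly angry"]
prompts = [f"a photo of a {lbl} face" for lbl in labels]  # hypothetical prompt template

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("face_crop.jpg")  # hypothetical cropped face image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
logits_per_image = model(**inputs).logits_per_image  # image-text similarity scores
predicted = labels[logits_per_image.softmax(dim=-1).argmax().item()]
print(predicted)
```

In practice such similarity scores would be combined with CNN-based predictions, as the summary above suggests, rather than used alone.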
arXiv Detail & Related papers (2024-03-18T03:59:24Z)
- Video Relationship Detection Using Mixture of Experts [1.6574413179773761]
We introduce MoE-VRD, a novel approach to visual relationship detection utilizing a mixture of experts.
MoE-VRD identifies language triplets in the form of <subject, predicate, object> tuples to extract relationships from visual processing.
Our experimental results demonstrate that the conditional computation capabilities and scalability of the mixture-of-experts approach lead to superior performance in visual relationship detection compared to state-of-the-art methods.
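For readers unfamiliar with conditional computation, the generic sparse-gating sketch below shows the mixture-of-experts mechanism the entry above refers to; the expert type, layer sizes, and top-k routing are illustrative assumptions, not the MoE-VRD implementation.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Generic sparse mixture-of-experts layer: a gate routes each input to k experts."""
    def __init__(self, dim, num_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, x):                          # x: (batch, dim)
        scores = self.gate(x)                      # gating scores, one per expert
        weights, idx = scores.topk(self.k, dim=-1) # keep only the k best experts per input
        weights = weights.softmax(dim=-1)          # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # only the chosen experts are evaluated
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```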
arXiv Detail & Related papers (2024-03-06T19:08:34Z)
- Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take the source image, user guidance, and the previously predicted mask as input.
We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv Detail & Related papers (2023-08-07T12:26:34Z)
- Language-free Compositional Action Generation via Decoupling Refinement [67.50452446686725]
We introduce a novel framework to generate compositional actions without reliance on language auxiliaries.
Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement.
arXiv Detail & Related papers (2023-07-07T12:00:38Z)
- Learning Diversified Feature Representations for Facial Expression Recognition in the Wild [97.14064057840089]
We propose a mechanism to diversify the features extracted by CNN layers of state-of-the-art facial expression recognition architectures.
Experimental results on three well-known facial expression recognition in-the-wild datasets, AffectNet, FER+, and RAF-DB, show the effectiveness of our method.
arXiv Detail & Related papers (2022-10-17T19:25:28Z)
- Exploiting Emotional Dependencies with Graph Convolutional Networks for Facial Expression Recognition [31.40575057347465]
This paper proposes a novel multi-task learning framework to recognize facial expressions in-the-wild.
A shared feature representation is learned for both discrete and continuous recognition in an MTL setting.
The results of our experiments show that our method outperforms the current state-of-the-art methods on discrete FER.
arXiv Detail & Related papers (2021-06-07T10:20:05Z)
- Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition [80.17419621762866]
We propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition.
FDRL consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN).
arXiv Detail & Related papers (2021-04-12T02:22:45Z)
- A Multi-resolution Approach to Expression Recognition in the Wild [9.118706387430883]
We propose a multi-resolution approach to solve the Facial Expression Recognition task.
We ground our intuition on the observation that face images are often acquired at different resolutions.
To this end, we use a ResNet-like architecture, equipped with Squeeze-and-Excitation blocks, trained on the Affect-in-the-Wild 2 dataset.
arXiv Detail & Related papers (2021-03-09T21:21:02Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of the information provided and is not responsible for any consequences arising from its use.